Alrighty. Welcome. Just before lunch. So, telemetry. We're going to look at telemetry as a system: a system that looks at telemetry from an application perspective, from a network perspective, but also from a business perspective, and then see how telemetry is really realized as a system, because that requires a combination of a standardization effort and a tooling and open source implementation effort, and trying to bring these two worlds together. Today it's still relatively fragmented, but I think we're evolving to something that is more of a system and hence usable as a system. Taking a couple of steps back: if you're talking about monitoring and tracing, most people say that's kind of, yeah, the tail end of innovation. Is that really a target for radical innovation, where something is really changing? Well, historically maybe not so much, but I think we're seeing more and more change in that particular area. So let's start with the network. I'm a networking guy; that's kind of the foundation of many things. And if we look at the network and what we've been doing from a networking perspective when it comes to monitoring, in most cases that was command line interface, SNMP, and syslog for the last 25, 30 years. Hardly anything ever changed in that whole area. And yeah, well, it's kind of boring but required. And we've been evolving that a little bit, like, yeah, well, but it's still too slow. You hardly get all the state that you want from the device. It's very device specific. It's vendor specific. It's really hard to operationalize, because you ultimately need to go and parse output, and that output was really device specific. And we grew that mess. We grew that mess with an additional raft of protocols being added to the overall mix: NetFlow, IPFIX. You have active measurements with one-way and two-way active measurement protocols, the evolution of Ping and Traceroute.
Then you have all kinds of active protocols that extract state from the device and push per-subscriber state into the device, with all the AAA protocols like RADIUS, TACACS+, Diameter, you name it. And well, I'm also responsible for a part of growing that mess, by adding metadata, operational metadata, to every single customer packet out there with in-situ OAM (IOAM), a journey that we started in 2015 in the IETF, so that we have entire traceability of all traffic as it traverses the network. So, really hard to operationalize, but we have to get that right, because we need to automate this overall thing if we want to really use it as a system where we combine application, business and network layer telemetry. So something that we've done over the past three to four years as a networking industry is we started to free the data out of the individual devices and make it more accessible than it was before. More accessible than what you had with SNMP, CLI, syslog, you name it. So deliver it to the right people, and deliver it at volume to the right people, so that you can do further analysis and processing on the whole thing. So what is telemetry? I think telemetry was first coined as a term by NASA. But it is really an automated communication process by which measurements and other data are collected at a remote, typically inaccessible point and then transmitted to receiving equipment for monitoring. So it's both the collection part as well as the transfer part that matters from a telemetry perspective. And I think that's something to keep in mind. And by now, well, telemetry is a buzzword in the industry, because telemetry started to become sexy. As soon as we can start to combine all these sources, you can really munch on the data, you can run analytics on the data in a more sophisticated way than we were able to before. What is really new is how we can consume that data.
And thanks to model-driven telemetry, I think the data became consumable. What we've changed with telemetry over the course of the last three to four years is we're pushing data out. We're streaming data out. We're no longer polling like we did with SNMP. And we built tool chains around model-driven telemetry so that the whole thing became far more consumable. And that model-driven nature also made the overall thing subject to a far greater level of automation. So that's really what changed. Let's look at what the problem was like, and what it still is like to some extent. Just imagine you want to get the state of an interface on a particular device, like a router. One way is the CLI, show ip interface, and you see something like GigabitEthernet0 is up, and you would parse that. You can also look at this thing from what syslog is going to spit out. Then it's no longer the full interface name, but something like Gi4/1, Gi for GigabitEthernet, four slash one, which is some numbering scheme that some router would use. And you need to parse that and interpret that and put semantic meaning on the whole thing. You can go further: it's ifIndex if you query the whole thing over SNMP. There is also an interface index, if-index, if you have the same thing represented in YANG. Similar, but not the same. And you can go further. If you look at NetFlow and IPFIX, there is now even more context around this interface, because if you have a flow, you have an incoming and an egress interface. More context, and you see where I'm going, right? If you look at the whole thing from a TACACS+ perspective, it's a port. In RADIUS, another AAA protocol, it's a NAS-Port or a NAS-Port-Id, and it's a NAS-Port-Id in Diameter. So, big mess, but yeah, how do you solve that problem? What we typically ended up doing is you have a mediation system that understands all these different spellings of what an interface is.
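To make the mediation idea concrete, here is a minimal sketch of what such a system does at its core: fold the many spellings of the same interface onto one canonical name. The abbreviation table and canonical form here are illustrative, not any vendor's actual scheme.

```python
import re

# Map common interface-name abbreviations onto one canonical type name.
# Purely illustrative; real mediation systems carry far larger tables.
_ABBREVIATIONS = {
    "gi": "GigabitEthernet",
    "gigabitethernet": "GigabitEthernet",
    "te": "TenGigE",
    "tengige": "TenGigE",
    "eth": "Ethernet",
    "ethernet": "Ethernet",
}

def canonical_interface(name: str) -> str:
    """Normalize an interface spelling to '<Type><unit>' form."""
    m = re.match(r"\s*([A-Za-z]+)\s*([0-9/.]*)\s*$", name)
    if not m:
        raise ValueError(f"unrecognized interface spelling: {name!r}")
    prefix, unit = m.groups()
    full = _ABBREVIATIONS.get(prefix.lower())
    if full is None:
        raise ValueError(f"unknown interface type: {prefix!r}")
    return f"{full}{unit}"
```

With a table like this, "Gi4/1" from syslog and "GigabitEthernet4/1" from the CLI both normalize to the same key, which is exactly the correlation job the talk describes.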
And then, well, you updated that as another box came in. What would be much nicer, and that's what the industry started doing over the past couple of years, is to come up with one data model so that we have one single semantic representation of the data. So what really matters is the semantics. And along with the semantics, that's the what, you also need to understand the how and the when of retrieving the data. So you need to understand the point in time that you gathered the data, and you need to understand where you gathered the data from. Back to the original telemetry description, right? It's the data gathering process per se and the transfer. So you need to understand where it's coming from if you want to really reason about that data, because every single router will give you a piece of the picture, and you need to understand the location of that data to piece the bigger picture together about the end-to-end network state. So the data models define what you stream, but tools really determine how you consume the data. And that's something that, from a standards perspective, we ignored early on when we started to define YANG models, from an operational perspective or even from a config model perspective. People put out the standard, but nobody actually thought about, like, yeah, somebody wants to use that data. How do you make it easy to use? You need to build a tool chain around that. And that's what OpenConfig really brought to the table, right? Because they've built tooling along with the specification of the data. And if you look at the open source tooling, yeah, there's been a load of work around building tool chains for building and consuming YANG models. pyang is probably the most widely used, to verify YANG models and to represent YANG models. Here I'm showing just an instance of printing out the ietf-interfaces YANG model.
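The what/when/where point above can be sketched as a tiny data structure: a sample is only meaningful once it carries its semantic path, its collection time, and its origin. The field names are my own, chosen for illustration.

```python
from dataclasses import dataclass

# Sketch of the three things you need to reason about a telemetry sample:
# the what (a YANG-style path plus value), the when (collection timestamp)
# and the where (which device produced it).
@dataclass(frozen=True)
class TelemetrySample:
    source: str        # where: device that produced the sample
    path: str          # what: semantic path, e.g. a YANG data path
    value: object      # the measured value itself
    timestamp_ns: int  # when: collection time, nanoseconds since epoch

    def key(self) -> tuple:
        # Two samples describe the same object only if both the path
        # *and* the source match: every router gives you just a piece
        # of the end-to-end picture.
        return (self.source, self.path)
```

The `key()` method captures the talk's point that identical paths from different routers are different pieces of the puzzle, not the same data point.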
There is yangcatalog.org, which is pretty much the go-to place for every single YANG model that ever got created. If you want to search for a YANG model and try to understand whether there's already some existing art out there that you want to latch on to, as opposed to creating your own YANG model, the YANG catalog is really your friend. And it's there and it's being used. There are tools to consume operational data, well, put into YANG, so that you can more easily consume it as an operator. One example that I'm showing here is the Advanced Netconf Explorer, ANX, which allows you to subscribe to the telemetry stream of a router using gRPC, using gNMI, so you can retrieve that data and then, well, munch on it, show it, and you can even have subscription-based things that stream from the router in an ongoing way, so you understand what's going on. ANX is one example to retrieve the data. If you really want to build your own tool chains, historically you've used tools like Pipeline, but recently there is an open source plugin available for Telegraf, which means you can take streaming telemetry in, get it into InfluxDB, and visualize the whole thing in, say, Grafana or the likes. So we've enhanced existing tools to deal with streaming telemetry in an ongoing way. If you're coming from the old world and need to jump into a more consistent, semantically defined world with YANG, yeah, there are tools like the telemetry data mapper, where you take an OID and can find the similar thing on the YANG side. So an ifIndex is now, well, something under if/interfaces or whatever, so that you can create that mapping. And you can even search with regular expressions if you're trying to understand which YANG model might have the information that you're looking for; the telemetry data mapper is your friend. And it goes on, right? There are more tools out there.
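The Telegraf-to-InfluxDB hop mentioned above comes down to emitting InfluxDB line protocol. Here is a hedged sketch of that last step: turning an already-decoded telemetry sample into a line-protocol string. Measurement, tag and field names are made up for illustration.

```python
# Minimal InfluxDB line protocol writer, the format a Telegraf-style
# collector ultimately writes towards InfluxDB:
#   measurement,tag=value field=value timestamp

def _esc(s: str) -> str:
    # Commas and spaces are significant in line protocol and must be escaped.
    return s.replace(",", r"\,").replace(" ", r"\ ")

def _fmt_field(key: str, value) -> str:
    if isinstance(value, bool):
        return f"{_esc(key)}={'true' if value else 'false'}"
    if isinstance(value, int):
        return f"{_esc(key)}={value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return f"{_esc(key)}={value}"
    return f'{_esc(key)}="{value}"'     # everything else as a quoted string

def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    tag_part = ",".join(f"{_esc(k)}={_esc(str(v))}" for k, v in sorted(tags.items()))
    field_part = ",".join(_fmt_field(k, v) for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {ts_ns}"
```

For example, an interface counter sample from a router becomes a single line that InfluxDB can ingest directly, with the device name carried as a tag so the "where" survives the pipeline.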
Just a simple Google search will reveal far more tools that help build these tool chains. And so, if you look a little bit at a scorecard, we defined the specification quite well. There is tooling being built around that. There is OSS reference code. Some of the organizations still need to come along building these models, because there is a load of vendor specific models out there. OpenConfig tried to take a stab at harmonizing these things. The IETF is still coming up to speed with really building, say, operations models for all the things. They've been very strong on config; they're following up on the operational side of things. One layer up. So we looked at monitoring a network element. How about monitoring and understanding what your traffic is doing? One way to do that, and historically the only way to do that, was active OAM: inserting traffic into the network to understand what the network will likely do to your traffic. Ping and Trace are the most frequently used things out there. More recently, a couple of years ago, we started to define one-way and two-way active measurement protocols, OWAMP and TWAMP. The basic idea is you have a control channel between the two ends, and this control channel will pretty much set up a measurement connection, configure that measurement connection, conduct the measurement and then spit out the data. Has been done. The message I want to send here is that it's not only the specification that we've done; there are also open source implementations that correspond to that. So you can really quickly adapt an environment that might not even support OWAMP or TWAMP by, well, bringing that on. And I did that as an experiment, showcasing that in something like five minutes you can build a little container that runs your TWAMP, bring it onto a router or switch, boot it up, and you have that feature up and available, despite the fact that your vendor might not have supported it day one. Great. Now, we've done the observation part.
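The arithmetic behind a TWAMP-style two-way measurement is worth making explicit. The sender stamps T1 on transmit; the reflector stamps T2 on receipt and T3 on reflection; the sender stamps T4 when the reply arrives. Subtracting the reflector's processing time (T3 minus T2) removes it from the round-trip figure, and no clock synchronization between the two ends is needed for the RTT itself. A minimal sketch:

```python
# TWAMP-style round-trip time from the four test-packet timestamps.
# T1, T4 are on the sender's clock; T2, T3 are on the reflector's clock.
# The clock offset cancels out of the RTT computation.
def round_trip_time(t1: float, t2: float, t3: float, t4: float) -> float:
    return (t4 - t1) - (t3 - t2)
```

So even if the reflector's clock is hundreds of seconds off, the RTT comes out right; only one-way delay measurements (OWAMP) need synchronized clocks.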
We've done the active monitoring part with Ping, Trace, TWAMP, OWAMP, what have you. But how about live user traffic? Because probe traffic might not be handled the same way as the user traffic, right? Ping and Trace might be process switched while the user traffic goes through the fast path. So how do you handle that one? That is a problem that we're now addressing with what the IETF calls in-situ OAM, or IOAM: piggybacking OAM information onto the live customer traffic. I started that effort in 2015 at the IETF in Berlin in those days. So what does IOAM do? It collects information about the path that the packet traverses in the network, while it traverses the network. So it assembles information into the packet as metadata, like incoming interface, outgoing interface, node ID, timestamp, buffer occupancy. But also things like, you might want to put in pieces of a cryptographic key to prove that the packet actually visited a particular node, so that you have proof of transit there. We started that concept, the packet and the data fields themselves, or the data formats that have been defined, in a completely transport independent way. So there is one way to define the data and how you carry the data. And then there are operational models defined, again in YANG, there is export defined, and there is a raft of in-flight specifications on how you transport the metadata. Because every single protocol has a slightly different way to carry metadata. In IPv6 it's extension headers; in certain cases you need to slot in another header in sequence. So there are different ways to transport that data. And again, early on when we started the journey from a specification perspective, we put an open source reference implementation out there in FD.io, fast data I/O, an implementation in the Vector Packet Processor, VPP.
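To illustrate the IOAM idea, here is a toy simulation of per-hop metadata accumulation: each node a packet traverses appends a small record (node ID, ingress/egress interface, timestamp), and the receiver reconstructs the path the packet actually took. Real IOAM encodes this in protocol headers such as IPv6 extension headers; a list of dicts stands in for that here, and the field names are illustrative.

```python
import time

# Toy IOAM: each transit node pushes a metadata record onto the packet.
def ioam_push(packet: dict, node_id: str, ingress: str, egress: str) -> dict:
    packet.setdefault("ioam_trace", []).append({
        "node": node_id,
        "ingress": ingress,
        "egress": egress,
        "timestamp_ns": time.time_ns(),
    })
    return packet

def ioam_path(packet: dict) -> list:
    # The receiver reads the accumulated trace back out, in hop order.
    return [hop["node"] for hop in packet.get("ioam_trace", [])]
```

The key property this models: the metadata travels with the live customer packet itself, so the recorded path is the path that packet really took through the fast path, not the path a separate probe happened to take.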
More recently, guys at the University of Liège in Belgium came up with an implementation for the Linux kernel. So the 4.12 kernel has a very, very complete implementation of IOAM by now. It's a very recent one; it's actually more advanced than the VPP implementation that we have in FD.io VPP, because, well, the FD.io VPP implementation implements an earlier version of the draft, and we're trying to renovate that. There's a couple of people that also put that into silicon, so it's all evolving. And at the same time, there is an application in OpenDaylight to manage the overall thing. So again, the observation is: standards, open source and tooling go hand in hand. If you just do one, it's very unlikely that you're getting adoption. If you look at the overall scorecard, we're a little earlier in the overall game here, right? So we started off with cranking out the standards that are in flight, and standardization takes time, and it's supposed to take time, because we want to build something that lasts. At the same time, we created reference code so that people can start to play with that, and evaluate and understand what's working and what's not working. And every single hackathon we do at the IETF gives us feedback, like, this nuance doesn't really work that well, let's tweak the standard while it's still in flight. Moving up. So we've done the networking piece, but if you look at the application level piece, at application level tracing, the questions that application people are asking themselves are very, very similar to what networking people would ask themselves. Like, why is my request slow? Why is a particular database lookup slow? Where does it get stuck? Is it really that the database is slow? Is it the network that connects me to the database that is slow? Or is it the host stack that might be the bump in the wire here? We don't really know, right? We need to figure that one out.
We're trying to understand where the bottleneck is. And what you've historically done, from an enterprise perspective or an application developer perspective: you were given a tool suite that helps you trace your individual calls through the system. And that tool suite quite commonly was somewhat associated with either the programming language or the framework that you were using, so that you were marrying yourself to a particular set of tools from one or the other vendor. Manual instrumentation, and once you instrumented your code, you were married to that particular tool suite forever. Yeah, so, you as a developer typically had to manually instrument your code, and then you were using some library, and that library was relying on a runtime, and that runtime was exporting the data into some tracing back end so that you could see the result. And that was typically a vertically integrated stack, until recently. So the question was, can there be standards? Yes, there can be standards. And the journey started with two open source projects: OpenTracing, a CNCF project originally, and OpenCensus, another open source project. OpenTracing just tried to create an open API, a tracing API, so that I can instrument my code using the OpenTracing API, and then somebody else can implement that back end. And the back end could be, well, multiple people implement that back end. Jaeger implements it as an open source project, but also LightStep, well, not a surprise: Ben Sigelman of LightStep started or co-started OpenTracing. But anyhow, there are multiple frameworks that implement the back end. So you can switch the back end. You certainly have a degree of freedom, but it was mostly the API that OpenTracing specified. There was another open source project called OpenCensus that did very much the same thing, but they went one step further.
They not only specified the API, but they also specified the additional set of libraries that you need to instrument the overall thing, and how things get exported. So they implemented the back end along with the API. Two projects pretty much doing the same thing. And there's also, again, been a set of exporters available for the common frameworks. There is a third thing, an effort that Alois Reitbauer from Dynatrace is shepherding in the W3C, where it's specified how you represent tracing information on the wire. Now, you can argue: two projects doing similar things, does that make sense? Well, people understood that it does not make too much sense creating competition in a domain where you ultimately want to create something that is, well, usable for everybody. So OpenTracing and OpenCensus came together and said, well, we're going to really merge, i.e., we're going to build something new and we're going to deprecate the past. So it's not like they merge and then we have a third one and they all progress in parallel. No, right? It's not yet another standard. They really busily work towards deprecating the past once they have something out. They launched in May this year. They have something out in Java, and they are hoping to get more language bindings out over the year, so that by the end of the year you can readily start to deprecate both OpenTracing and OpenCensus and go forward with OpenTelemetry, because OpenTelemetry carries forward in almost all these dimensions. So you have the trace information, you have the trace API, and you're also implementing a back end. So it goes all the way, the entire stack that you ultimately want. They implement one particular tracing model, and that particular tracing model was, by the way, the same between OpenTracing and OpenCensus, and most of the vendors out there have a very similar tracing model.
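The on-the-wire representation the W3C effort standardizes is the Trace Context "traceparent" header: version, trace ID, parent span ID and flags, dash-separated lowercase hex. Here is a small sketch of generating and parsing that header; it covers the common version-00 case only.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context traceparent: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

def make_traceparent(trace_id: Optional[str] = None,
                     span_id: Optional[str] = None,
                     sampled: bool = True) -> str:
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str) -> dict:
    m = _TRACEPARENT_RE.match(header)
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = m.groups()
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "sampled": flags == "01"}
```

Because every participating proxy, library and back end agrees on this one format, the trace ID survives hops between otherwise unrelated tool chains, which is exactly the interoperability argument of the talk.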
So you have a trace, and the trace is composed out of a hierarchy of spans. The root span would be, say, a messages request, and then you have a couple of child spans: first authentication, then the request goes into a cache get. You don't find the data in the cache, so you have a remote RPC reaching out to the other side; you get the data, and then you populate the cache first. These individual spans have trace IDs and span IDs and parent-child dependencies, so you can follow a particular request through the system as it progresses. And a tool that does that for you is, for instance, Jaeger, open source and in the CNCF. So you can look at all the trace points as the request traverses the system. And you can get all the way down to: how long does my particular HTTP request from here to there take? So let's assume that that HTTP request takes, and that's what it shows here in that example, 65.72 milliseconds. Where do I spend that time? None of the application level tracing mechanisms that we have out there today tells you that, because your visibility is limited to layer five. That's it. Looking top-down, layer five is the end. The HTTP request, you can't go any more granular. But can we marry what we just learned about IOAM, which has visibility at layer three, with what we have further up? I believe we can. So the ultimate idea would be: once you launch the RPC, there are child spans, like you call the TCP stack, and then you traverse router one, two, three. You go into the destination, you have the host stack on the other side receiving the request, you do the MySQL query, and then you go back, and the request comes back over router four, and then you populate the results. So you would have an understanding of every single step through the system.
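The span hierarchy above can be modeled in a few lines, and doing so shows how a tracing back end answers the "where do the 65.72 milliseconds go" question: a span's "self time", its own duration minus its children's, is the time it spent that nothing below it accounts for. Names and numbers here are made up.

```python
from dataclasses import dataclass, field
from typing import List

# Toy trace model: a trace is a hierarchy of spans, each with a duration.
@dataclass
class Span:
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        # Time spent in this span that no child span accounts for.
        return self.duration_ms - sum(c.duration_ms for c in self.children)

def breakdown(span: Span, out=None) -> dict:
    # Flatten the tree into {span name: self time in ms}.
    out = {} if out is None else out
    out[span.name] = round(span.self_time_ms(), 2)
    for child in span.children:
        breakdown(child, out)
    return out
```

In the 65.72 ms example, if auth, the cache get and the RPC account for 55.5 ms between them, the remaining 10.22 ms of self time is exactly the part that current layer-five tracing cannot attribute, which is where IOAM's per-router data would slot in.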
And now I can tell you whether I need to blame the host stack or whether I need to blame router three, because router three was queuing the request for quite some time. There is a problem with that, and the main problem is: how do you correlate? The trace ID can be carried in the packet, and there are proposals for carrying the trace ID in the packet. But if your request gets encrypted, like what you have with HTTPS, right? I don't really have that visibility at the network layer anymore. So I can't just look at the traffic in order to start correlating. I need somebody to tell me. And I can't really tell the network by, say, a socket option when opening up the connection, because I'm at layer five, right? So we would need some mechanism to pass that information through the system, and that's a topic of open discussions. Because you want something that is not a novel new API that everybody needs to start using, because until everybody uses it, it's another 10 years. It needs to be something that is in there and that's transparently embedded. And I think that's an open question, an open thing; people are discussing how to get things on the wire in a more standardized way. What might be an escape path is what happens right now with all the service meshes or network service meshes, because that gives you an insertion point into the entire story, right? So you have the application talking unencrypted to the local proxy, like Envoy, and then Envoy can put that information in. It has the visibility. It still sees the unencrypted data, and it creates another connection, and it could put that particular information in, and insert that information at layer three further down, so that you can have, well, IOAM and the application layer tracing go hand in hand, so they can pass and correlate individual trace contexts.
So the bigger picture would be that we're slotting in a network portion and then really have layer five to layer three, ultimate visibility, as we progress through the system. Again, the observation here is, if you look at the application level tracing domain, what was there first was a bunch of tools. They started off as open source, and they started to be harmonized to become more of a standards effort. It was again the tool chain driving the standard, not the other way around. But at some point we want to have something that works as a stable base, as a standard. So from a scorecard perspective, OpenTelemetry does the job, as it succeeds as the merger of OpenCensus and OpenTracing. Again, one layer up. You can look at all that telemetry stuff, and it's all kind of networking style. If I'm talking to my business leaders, they say, yeah, this is operational information, kind of important stuff, but what I really worry about is: how do people use my equipment? Do they have the right licensing in place? Do they have the right kind of overall business settings in place? So their view on telemetry is that they're interested in a particular portion of what we would call operational telemetry, which we would typically not use that much in an operational environment, because it's mostly business reporting information. So that's something that we started calling business telemetry, because then suddenly you have a topic, although you're an operations guy, with your VP. So asset information, and specifically license information. Is the overall thing still under license or not? Is it outdated? Does it run the right license for the feature set that I'm using? All that particular information we start to stream upwards, and suddenly people that hadn't even called me before are now interested in the information that I'm going to produce for them.
Well, operational telemetry and business telemetry, for us, there's no difference. It's just another piece of information that I'm going to stream out. For them, it's a different approach, because they put it into a different semantic context. So, in summary, telemetry is really working towards a system, and we're enabling that system by making all the data suddenly available. Available in a semantically concise and interpretable way. Business data, operational data, configuration data, they all go hand in hand. Application data, network data, business layer data, they all go hand in hand. And at the same time, what we also have to observe is that we shepherd tooling, code, industry coordination and standards kind of in lockstep. So what I really would motivate people to do is: look to the other side of the fence. If you're building something out, don't build it out as a silo, because then we give rise again to yet another identifier with yet another slightly different semantic meaning. See whether we can use more tooling and more things in a standardized way than what we've done before, because it's to the benefit of all of us, because then we can really combine the application layer with the network layer and have ultimate visibility into the system, even all the way up to the business layer. Thank you so much. Any quick questions that I can help with? We have another two or three minutes. All right. Enjoy your lunch.