Hi, thanks for coming and sticking with it all day till 5 PM. Oh my gosh, you're amazing. Thanks for being here. My name is Jason. Whoa, went too far. I'm Jason, I'm the icon on the right there. I use he/him pronouns. I'm an engineer at Splunk, and I help out with OpenTelemetry Java and the instrumentation thereof. And I'm a co-maintainer with Cesar on the Android project.
Hey, good afternoon. I'm Cesar Muñoz, and I'm an engineer at Elastic. I'm also working as a co-maintainer of OpenTelemetry Android. Nice to meet you.
All right, so we'll do a brief overview of what we're going to be talking about and showing you today. In the client space, observability problems are quite different in certain ways from what you might experience on the server side. So we're going to describe a specifically challenging environment that we have to contend with in client instrumentation. Then we will walk through and show how we've implemented a solution to that challenge. We'll do a live demo, as promised, so cross your fingers on that. And then we'll wrap it up.
Awesome. So let's start with a use case example. Right there, we can see Bob. He really likes hiking, and in this slide he's getting ready to do so. You can see Bob's really happy about it. Let's see what happens on the next slide. Here, right in the middle of his trip, he doesn't have internet connectivity. But he's still happy. I mean, he's hiking. He doesn't need it right now. But we're going to see later why this might be a problem. By the end of the trip, Bob has done all of the hiking and he's happy about it, and he's ready to go home. He has internet connectivity again. It was a great day for Bob.
Now, there was a problem during Bob's trip: right in the middle of it, when he didn't have internet connectivity, there was no way for the application that Bob was using at the moment to send telemetry to the backend service.
So there's probably important data there that was lost because of that. This is actually one of the pretty common challenges that any mobile application has to deal with, which is why, when we started working on OpenTelemetry for Android applications, this was one of the first things we had to take care of.
To illustrate a little better what we've been working on regarding this buffering, this is roughly what a normal use case scenario looks like. We have a device that sends telemetry data over the internet to an observability backend service. This is pretty much the normal stuff. What we've been working on so far is adding this extra layer, if you will, where the Android application first stores all of the telemetry data in the device's internal storage.
The way we do so is that when we are setting up OpenTelemetry in the OTel Android project, we set an exporter for each of the signals sent by the agent. We call them the to-disk exporters, and that's what we configure at the beginning. So the tracers, the meters, and the loggers all have a to-disk exporter, which is the first exporter that the application sends data to. Here we see the exporters; each type of signal has its own. And we also see that each exporter stores the data in a specific folder that is just for that signal.
The exporters look like this right now in code. These are rather specific details of the exporters that we're setting up in OTel Android. However, if you're using OpenTelemetry Android, you don't have to deal with them. It's something that we do for you as part of our own initialization. So if you are using OpenTelemetry Android, this is the way that you enable this buffering, because right now it's disabled by default.
So you have to create the disk buffering config, enable it, and then you can set the maximum size that you want your telemetry data to take up in the device's disk space. Then you pass that configuration into the OtelRumConfig object, and you're done.
Now, let's go through some of the smaller details of this process. Right now, the data that is stored in the device looks like this. The OpenTelemetry Android project takes care of creating this directory, OpenTelemetry signals. Then, for each signal, a new folder is created just to hold files for that specific signal. We can see here the metrics, logs, and spans. And then we can see what the files look like, at least in the case of the spans directory here.
Now, let's move on with a few more details. The data right now is serialized using Protobuf. The reasons for that are performance, and also that it doesn't take up too much space on the device. Also, as you may have noticed in the previous slide, the names of the files are just a timestamp. This is because we're honoring a first-in, first-out type of queuing, so that we make sure the oldest data is the one that gets exported first.
A few more details, because we also have to take care of what happens when we're not on the happy path. If the device is full, or the write fails for whatever reason, we attempt to export the data immediately. This is not configurable right now. But again, this is a fairly new tool that we've been working on, so we're open to any kind of feedback.
So far, we've covered what happens when storing the data on disk. Now there's the second step, which is pulling that data from the disk and actually exporting it. This is step two. Essentially, we create another set of exporters. This time, they are called the from-disk exporters.
As with the to-disk exporters, there's one for each type of signal. Essentially, what they take care of is pulling the data from the device, parsing it, and then sending it to the actual exporter that will send the data to your backend service.
When it comes to reading the data, at the moment we're reading it periodically. If there's internet connectivity available, we send that data. Right now, the periodic work runs every 10 seconds, give or take. Once the data is successfully exported, it's removed. This means that if there's any kind of issue and the data cannot be exported, nothing happens to it. You will still keep your data there until it is successfully exported or until it becomes stale. So right now we have logic for removing any very, very old data that just didn't make it in time, so that we keep sending the latest telemetry data. I don't remember the timing exactly, but the configuration is, I think, 13 hours or something, maybe more.
Anyway, finally: when can we do the second step, which is reading the data and exporting it? The thing is that it's not as simple as "whenever you have internet connectivity." In Android applications, there are always different challenges, and one of them is internet connectivity. Another is that your application has to actually be running and not have been killed by the OS. You also have to be a good citizen on the device where your code is running. So you have to be conscious of battery life, and you also have to be conscious of not consuming too much mobile data, because that could incur fees for the user. So it's not a very simple question to answer, and right now it's still a work in progress, really. What we do is just try to export every 10 seconds or so while the application is running. So any kind of feedback, again, is welcome.
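[Editor's note] The two steps described above (write each batch to a timestamp-named file, then periodically read files oldest first, delete them on successful export, and drop stale ones) can be sketched roughly as follows. This is a minimal illustration in plain Java, not the actual code from the Java contrib repo: the class name `DiskBufferSketch`, its method names, and the `Predicate<byte[]>` standing in for the real exporter are all hypothetical, and the timestamps are passed in explicitly only to keep the example deterministic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of per-signal disk buffering; not the real implementation. */
final class DiskBufferSketch {
    private final Path signalDir;    // one folder per signal, e.g. a "spans" folder
    private final long maxAgeMillis; // files older than this are considered stale

    DiskBufferSketch(Path signalDir, long maxAgeMillis) throws IOException {
        this.signalDir = Files.createDirectories(signalDir);
        this.maxAgeMillis = maxAgeMillis;
    }

    /** "To disk" step: write one serialized batch to a file named after its timestamp. */
    Path store(byte[] serializedBatch, long nowMillis) throws IOException {
        // Simplification: two batches stored in the same millisecond would collide.
        return Files.write(signalDir.resolve(Long.toString(nowMillis)), serializedBatch);
    }

    /**
     * "From disk" step, meant to be called periodically. Files are visited oldest
     * first (FIFO). Stale files are dropped; the rest are handed to the exporter
     * and deleted only if the export succeeds. Returns how many were exported.
     */
    int exportPending(Predicate<byte[]> exporter, boolean online, long nowMillis)
            throws IOException {
        List<Path> oldestFirst;
        try (Stream<Path> files = Files.list(signalDir)) {
            oldestFirst = files
                    .sorted(Comparator.comparingLong(DiskBufferSketch::createdAt))
                    .collect(Collectors.toList());
        }
        int exported = 0;
        for (Path file : oldestFirst) {
            if (nowMillis - createdAt(file) > maxAgeMillis) {
                Files.delete(file); // too old to be worth sending
            } else if (online && exporter.test(Files.readAllBytes(file))) {
                Files.delete(file); // exported successfully, safe to remove
                exported++;
            } else {
                break; // offline or export failed: keep the data and retry next time
            }
        }
        return exported;
    }

    private static long createdAt(Path file) {
        return Long.parseLong(file.getFileName().toString());
    }
}
```

In this sketch, a scheduler would call `exportPending` on a fixed interval (the talk mentions roughly every 10 seconds), passing the current connectivity state. Stopping at the first failure is one simple way to preserve ordering; the real project may handle failures differently.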
And with that, I'll hand it over to Jason.
OK, as promised, we're going to attempt to do a demo here. So let's see if I can get this set up. All right, that's pretty good. Let's zoom in on this a little bit. Maybe that's too far. All right, so we have a sample application that we hacked up for the purposes of this talk, and it's called Backpacking Buddy. I don't think there's a link in our deck yet, but I'm going to add one, and then we'll re-upload it, and then you'll have access to it.
So what I'm going to do before I fire up this app is run Jaeger locally. Well, somewhere I'm running Jaeger. All right, we'll be running Jaeger if I can find my terminal. It's going to happen. There it is. OK. And you can't see that. Now you can see that. OK, so I've just started up Jaeger in a Docker container running locally, so we have some place to send the data. Normally this would be an observability vendor and their specific backend.
So back over to Android Studio. And now I can launch this app. This is, like I said, a contrived app that we've hacked up, and it tries to demonstrate the scenario that Cesar described earlier. Imagine this is something you would have on your device as you're hiking through the mountains, and it's tracking a couple of things: the distance that you've gone and maybe the elevation that you're at. Those two things change over time. And then you've also got some faces here that allow you to click a disposition, depending on how you're feeling, because when you're hiking a long way, you're not always happy about it. In any case, here's our simple sample app.
And if we look here in this spans directory, following the tree structure that Cesar described earlier, I'm going to do a synchronize. And we see that there are some files piling up in this spans directory.
And if we give it some time, because Jaeger is running, hopefully these are getting periodically written to disk and then periodically read off of disk and ingested. So if we do a refresh here again, we see these numbers change. Actually, one of the files went away entirely. And then hopefully we can go over to the Jaeger UI. There we go. Zoom back out so we can see that. We'll do a quick query for some traces and see what's in here. And we see some data. So this is the real data coming off of that mobile device, shown through the Jaeger interface.
Now what we're going to do is go back over to the device. I guess that's almost readable. And I'm going to go into airplane mode. This will effectively kill the device's ability to use any network, so it's a quick and dirty way of simulating a network outage or a loss of network. And if I stall a little bit by talking more with my human words and then click on synchronize, we will hopefully see some additional files begin backing up in here. So maybe I'll give a couple of dispositions. Like maybe I was neutral, and then I went happy again after a few more kilometers. So we're getting a backlog of span files building up in this directory.
All right. So let's be daring and open one of these. My IDE is saying, well, what do you want to do with that? And I'm going to say, yeah, sure, text, why not. And we can see that this is just the raw protobuf data. You might be able to read some of that, but it's not going to render well, because we're taking these data objects directly and writing them down onto disk as proto. That's great, because it means that when we read them back, we can just put them on the wire, already in OTLP, the OpenTelemetry protocol.
All right, so that's enough stalling. Let's see what we have here. Some more data is building up. Now, like we described, it's about a 10-second interval.
So I'm going to come back out of airplane mode. All the while, we've been generating more telemetry around the distance that we've hiked and the elevation that we've reached. And I will click a few more to generate some more disposition events. So we got a little bit more. And eventually, our sweeper, our reader, what's that class called, Cesar? The periodic, what's it called? Periodic and process. The periodic telemetry reader, I forget the name of it. It's going to come along and find these files. And look, it shrunk down.
So now we can go over to Jaeger again and refresh our UI. Maybe I'll boost this up to a few more events. And we can see that we had data coming in this whole time without gaps, right? Even though there was a gap in the time during which our application was actually able to send telemetry, there's no gap in the data. And then, just to prove it out in a little more detail, in our operations here we can see all kinds of things that happened. Like this network change event, which came from the built-in Android instrumentation. And the things that our contrived application was doing, the things that I was clicking on to generate these disposition events, show up as spans. If we dig in, we can see a little happy face, or, I can't actually see that. Maybe it's a sad face. No, it's a neutral face. OK, so that's just an example of one piece of telemetry coming out. But the main point of interest is that we have a continuum of all the data without loss.
All right, so this is a really young feature. The core implementation of this lives in the Java contrib repo. It's not part of Android. So conceivably, this could be used in other environments. I don't know that we have a terribly great use case for that yet outside of client-side or Android mobile support, but conceivably there could be some other use cases. If you know of one, come talk to us.
Other features that we're thinking about adding, because this is young, are things like adaptive cache support. So rather than just saying "use up to 10 megs," it might be: use up to 10 megs, but, OK, now we're short on disk space, what do we do? Maybe that can be an adaptive threshold. Another concern is that, potentially after generating lots of data on disk, you get back on the network. How do you prevent the device from using all available bandwidth to send that telemetry? So, doing a little bit of throttling on the way out, reducing that sort of thundering herd effect. And then, yeah, we have done zero performance optimization, and we've gotten nearly no feedback yet. So we're looking for people to try it out.
So your call to action is to go and try to use this thing. Like I said, I will post a link to the repo, and we will go from there. If you have any questions on how to set up the project or how to set up the sample app, we will both be in the Observability Observatory, the OTel Observatory. On Thursday, we have a session on the calendar. So please, just come by and talk to us anytime. And thank you.