Hello, everyone. Welcome. My name is Purvi. I work at Honeycomb. I've worked with browsers for a really large part of my career, and I'm really interested in getting front-end observability to a standard that's on par with the kind of observability we have for backend systems today. As I mentioned, I work at Honeycomb, where we're trying to solve this problem. I'm an approver on the OpenTelemetry.js project, with a special interest in web APIs, and I'm really excited to talk about OpenTelemetry and browser performance today.

I want to start by getting us to a common understanding of what I mean by web performance, because it can mean a lot of different things to a lot of different people. As devs, I think we sometimes try to make things as fast as possible, and honestly, that's really fun. It's a genuinely fun engineering problem to make things fast and performant, and we end up convincing the execs by saying something like, "improving our page load times will get us increased conversions." Execs like money, so it's a good argument, but it's honestly pretty hard to show that causation directly with data.

So when I'm talking about performance on the web today, I'm more interested in whether the users of a website think that website is performant. What users think is a much more accurate indicator of how your website is going to be used, and whether it's easy to use, than what your page load metrics might say. Because ultimately, and this is from the MDN docs, which are the main source of browser documentation: web performance is all about making websites fast, and that includes making slow processes seem fast to users. That's a really important distinction. Perceived performance matters enormously.

So what are our web performance goals? Before we get into what we measure and how we measure it, I think it helps to think about what makes up a performant web page. That definitely includes reducing overall load time, but it also includes making the site usable as soon as possible. Are people actually able to carry out the interactions they expect after the page has loaded, without getting frustrated? Is the site reliable and pleasant to use? Are things moving all over the page? Can I get a show of hands if you've ever used a website that annoyed you to no end? Yes. We're all users of the web, and when we're trying to carry out critical tasks, like checking into a flight or accessing government services, it's really important to have a reliable website.

Part of web performance is also responding to and reassuring users when they take action. You might not always be able to make your backend processes fast enough to give near-instantaneous feedback, but you can respond and reassure users as they act. Because using the web is no longer optional; especially since the pandemic, more and more of our critical services are accessed online, things like booking vaccine appointments or changing your address with government services. It's our job as developers to create reliable web experiences for our users.

So let's get into some common performance metrics. The number one thing that comes to mind is page load. But the problem with measuring page load times is that the way page load is measured can vary a lot, depending on who your provider is and what you're measuring, whether that's Google Analytics or a real user monitoring tool. Every tool might do it a little differently under the hood.
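To make that concrete, here's a minimal sketch of two of the moments tools commonly anchor "page load" on. These are just the standard browser events, nothing vendor-specific, and which one a tool picks can change your numbers significantly:

```js
// Two events that different tools treat as "page load":
window.addEventListener('DOMContentLoaded', () => {
  // HTML parsed and deferred scripts executed -- images and async
  // scripts may still be in flight.
  console.log('DOMContentLoaded at', performance.now().toFixed(0), 'ms');
});

window.addEventListener('load', () => {
  // All sub-resources (images, stylesheets, etc.) have finished loading.
  console.log('load at', performance.now().toFixed(0), 'ms');
});
```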
Some tools define page load as when the DOMContentLoaded event has fired. That event fires when the HTML document has been completely parsed and all deferred scripts have been downloaded and executed, but it doesn't account for things like images or async scripts, or anything that happens after that initial load. It's also pretty common to use the point when the load event has fired. We commonly hear that improving page load times improves conversions, but that's usually taken a little out of context. Even after the load event has fired, the page itself might have been painted, but other things determine whether it's usable. It's still a frustrating experience as a user to see a page that looks loaded, tap a button, and have nothing happen. That results in a lot of rage clicking. And none of these metrics really matter if users aren't happy: if you improve your page load speed but users are still frustrated, those numbers aren't that important.

Then we have Core Web Vitals. Core Web Vitals is an initiative by Google to provide unified guidance for quality signals that are important to delivering great user experiences on the web. The idea was that developers shouldn't have to be web performance experts, so Google came up with a common set of metrics it thinks encapsulate good user experience: load time with Largest Contentful Paint, interactivity with First Input Delay, and visual stability with Cumulative Layout Shift. We're going to get into all of these, but a quick note about First Input Delay: it's being deprecated and replaced by Interaction to Next Paint in May 2024. First Input Delay only accounts for a user's first interaction after a page loads, whereas Interaction to Next Paint is intended to be a cumulative score of how interactions behave as the user keeps using the page.

I want to get into each of these metrics, because I think it's important that how things are measured corresponds to the value of the insight you get from the measurement. Largest Contentful Paint is defined as the render time of the largest thing in the page's viewport, where the viewport is your browser or mobile browser screen. The largest thing might be an image. Here are a couple of examples. In the top example, the TechCrunch website, the browser is trying to figure out when to fire its largest contentful paint event: it fires when the largest thing appears in the viewport, which happens when the image on the right-hand side loads. But the more interesting example is the Instagram login page. There we see that the LCP actually fires after the logo has loaded, but before the login button has loaded. And I don't know about you, but I think that login button is pretty critical to considering the main content of that page loaded. So I'm actually getting a somewhat false metric by just seeing that the logo has loaded. Ideally, I'd like to be able to define what I think the main content of the page is.
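For context on what's happening under the hood, here's a minimal sketch of observing LCP yourself with the PerformanceObserver API (Chromium-only, since LCP isn't implemented everywhere):

```js
// Each entry is a new "largest" candidate; the final LCP is the last
// candidate reported before the user first interacts with the page.
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // entry.element is whatever the browser decided was largest --
    // note that you don't get a say in it.
    console.log('LCP candidate:', entry.element, 'at', entry.startTime, 'ms');
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });
```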
One issue we see popping up more and more with Largest Contentful Paint: ever since we've had a mandate, which is a good thing, to put cookie banners on our websites, a lot of folks are finding that the element Google considers your largest contentful paint isn't the thing you expect; it's the cookie banner. So you're getting a lot of misleading LCP values, because they measure when your cookie banner loaded. And cookie banners are often loaded from third-party origins, so your data might be really skewed.

I have a lot of beef with LCP, because beyond all of that, the thresholds defined as "good", "needs improvement", and "poor" are based on studies from the early 2000s, back when the web was solidly in its Neopets era; if you know, you know. The studies that are cited were conducted on really small sample sizes, like 100 or 200 undergrads who all went to the same university. That's really not relevant to the modern web. We run a lot more JavaScript: React apps are so bloated these days, we're running so much more, and none of these tools existed in the early 2000s. And most importantly, the main content of a page is usually more than any single element.

Moving on to Cumulative Layout Shift. Again, show of hands: who has used a recipe website and tried to get to the recipe, and yes, there's a bunch of blog content, but the ads are all popping in, and you're like, oh my gosh, it's Christmas and I just need to make this cranberry sauce, can I just get this done? That's what cumulative layout shift is trying to measure: essentially, how annoying is this website? How much are things shifting around and pushing content out of view? I actually really like this metric, but there's a bit of a problem with it. For a basic case, it does a good job: if you have some content that gets shifted down, it does some math, which we won't get into because it's really not that important, based on the percentage of the viewport affected, combined with some other factors, to give you a layout shift score. But take a more complicated example like this one, where several things shift: the dog gets pushed down one unit, the horse gets pushed down two, and the zebra gets pushed down four. Cumulative layout shift used to do an amazing thing here: it combined all of these shifts and reported them as one aggregated value. They changed that. Now it only reports the largest shift that happened. That leads to some pretty strange behavior: you might see a large layout shift score, fix that issue, and then still see a score almost as large, and have to keep fixing things one at a time. So what we're really talking about is the largest layout shift, and that word "cumulative", which was so powerful, no longer really describes what's being measured. When you're trying to fix a shift score, it can feel like playing whack-a-mole, because you don't know what's going to pop up next, and you can't see how to keep improving, because the score isn't really cumulative.
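If you want to see the raw shifts behind that score, here's a simplified sketch. Note that this naive running sum is closer to how CLS originally worked; the current definition groups shifts into windows and reports only the largest one, which is exactly the change I'm describing:

```js
// Watch raw layout shifts as they happen.
let shiftTotal = 0;
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // Shifts right after user input are expected, so they're excluded.
    if (!entry.hadRecentInput) {
      shiftTotal += entry.value;
      // entry.sources points at the elements that actually moved --
      // the real debugging gold.
      console.log('shift of', entry.value.toFixed(4), 'sources:', entry.sources);
    }
  }
}).observe({ type: 'layout-shift', buffered: true });
```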
As I mentioned, First Input Delay is being sunset, because it's possible to have a pretty good FID score even when responsiveness is poor. And when I talk about responsiveness, I'm not talking about viewport sizes; I'm talking about trying to interact with a button, having nothing happen, and getting really, really angry. FID can lead to misleading data about interactivity, because it only measures the delay on the first input, and you usually end up loading a lot of JavaScript later that can block the main thread too. So Interaction to Next Paint is exciting. I'm really excited about this metric, because it's an aggregate metric that keeps measuring how interactive your website is over time. It's available to use in beta now, and it will officially replace FID as a Core Web Vital in May 2024.

Okay, that was a lot of acronyms and a lot of values, so I'd like to take some time to reset our brains. This is a picture of my dog: she's the donkey on the left, with her two best friends, and they dressed up for Halloween. It was very cute. My partner and I like to joke about this picture that if they were the three pillars of observability, Misha would definitely be metrics, because she's kind of weirdly shaped and odd.

So, with our brains reset, let's get into what these metrics leave out. Often these are the only things we end up measuring on the web; we try to game these metrics as much as we can and use them to determine whether or not our site is performant. But they leave out a lot. They leave out how network requests affect page performance: websites don't exist by themselves, they call so many different APIs and services, and it's important to have insight into those too. They leave out anything specific to your particular use case: there are so many different types of apps and websites, and we try to hold them all to one set of metrics and standards, but they're all really different and we should treat them differently. And for me, most importantly as a developer, they leave out any clues about what to do next. It's hard to know what to do next when all you can see is that your metric "needs improvement", or even more frustratingly, that it's in the "poor" category, with no hints for how to improve it.

That brings us to the question: what should we measure? Spoiler alert, it depends. It depends a lot, and we'll get into it, because context is everything. The context of your website or application determines your web performance goals, and I think of it in two categories. The first is: what does performance mean for you? The first question to ask there is, do we care about search rankings? The answer might be yes. If you're an e-commerce site or a marketing website, or one of many other use cases where you care about search rankings, then you do care about the absolute thresholds Google has set, and those might be your baselines. But if that isn't you, consider setting different performance baselines; if you don't care about page rankings, those absolute thresholds can probably look a little different. There's research showing that users can be bucketed into familiar and unfamiliar users of your app.
For example, if you're building an app that people use for work, they come to it every single day, they're familiar with it, and they'll probably be a little more tolerant of load times. When you're building web applications, it's sometimes not realistic to hold yourself to a sub-two-second page load, but you can give users support and responsive feedback while they wait: hey, we see you, you're waiting, and the thing is coming. So familiar users might be a lot more patient than you think. Unfamiliar users, on the other hand, might be a lot more impatient than you think, so your web performance standards might need to be even higher.

Is your app heavily used by mobile web users? This is super important context, because performance on desktop and performance in mobile web browsers are two completely different experiences, and they should be treated as such. Mobile phones still don't have the browser processing speeds that desktops do, so when you have web performance data, it's important to be able to compare apples to apples and separate your desktop data from your mobile browser data. And what are the critical actions users take in your app or website? Are those performant? Figuring out whether users can carry out the critical things you really care about matters more than absolute page load times across your whole site. There are dozens of other questions; these are just some that came to mind. The point I'm trying to make is that it's important to ask yourself these questions to figure out what performance means for you.

The second piece of context is: what information do I need to add to my telemetry to be able to debug further? Seeing raw values can be really unnerving, and I want to go from having no idea to being able to dig in and debug, like this person on the slide, who, I found out while preparing this talk, is not Julia Roberts. Yeah, big surprise. So: things like your user ID and session ID, so when a particular user has been complaining about performance issues for ages, you can actually see data from their sessions. Element target information: which element actually caused my cumulative layout shift score? URLs, so I can segment by page: maybe I don't care so much about some other pages, but I really, really want my checkout page to be performant. Browser metadata, device data, whether your site is being accessed on mobile or desktop. And lots more. You know your system best, and you should be able to add whatever information you need to do the kind of intersectional analysis that takes you from "I have a problem" to "I can debug my problem".
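I'll come back to the instrumentation side of this later, but as a sketch of what that kind of attribution can look like in practice: the web-vitals package's attribution build hands you the culprit element along with the score, and you can attach your own context before sending it anywhere. The transport and helpers here are hypothetical placeholders for your own app:

```js
import { onCLS } from 'web-vitals/attribution';

onCLS(({ value, attribution }) => {
  sendToBackend({                 // hypothetical transport for your own app
    metric: 'CLS',
    value,
    // CSS selector of the element behind the largest shift
    element: attribution.largestShiftTarget,
    url: location.href,
    userId: getUserId(),          // hypothetical helpers
    sessionId: getSessionId(),
  });
});
```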
I also want to quickly talk about field data versus lab data. There are two ways to measure things on the web, and one is synthetically; that's known as lab data. That means running some sort of synthetic system: you can run Google Lighthouse against your website, where a bot goes and hits your pages and reports back a Lighthouse score, you can run that in CI, and there are automated synthetics tools as well. Synthetic data is still really useful, but these are usually single data points generated by bots, and they can be misleading: the bots might not be where your users are, and they might not be using the devices your users use. So you can get a skewed idea of your performance if you only use synthetics, which is why it's really important to also use field data, or real user data.

Part of Core Web Vitals is not just seeing that one of your users had a page load time under two seconds; the Core Web Vitals guidance is actually that you want a sub-two-second load time for 75% of your users. To get statistical significance, you need to be measuring with real users, because that actually represents what your users are experiencing. Then you can drill down and look at different percentiles as well: what's the slowest experience some of my users are having? What do they have in common? Are they in a particular location, or on particular devices? This is what enables that intersectional analysis, and what makes your website faster for real users.

One way to measure data from real users is with OpenTelemetry. OpenTelemetry is an open standard for collecting and sending telemetry data, and it has a browser SDK. Because it's an open standard, it's really flexible: you can send your data to the vendor of your choice, like Honeycomb, or to an open source backend like Jaeger. All you need to get started is a browser or website, the OpenTelemetry APIs, and, like I said, a backend that accepts OpenTelemetry data. Because I don't think vendors should be able to determine what you can measure. You should be in charge of the telemetry your system sends, and if you want to change it, you should have the power to do that rather than filing requests with vendors. And because it's open source, you can contribute the instrumentation you think is missing to measure your system better.

Getting started with OpenTelemetry: I really don't have a lot to say about these next two slides. You install some packages, you write the setup code. Don't feel like you have to take pictures of these; I have references at the end, along with an example GitHub project of this setup. But yes, it is as simple as installing some packages and adding a little boilerplate setup code, and you're ready to start sending data. This is what adding context looks like in OpenTelemetry: the things we talked about, like the user's region, the browser language, your user ID, your session ID, get added as resource attributes that are sent with every single span, which lets you do better intersectional analysis. There's also some auto-instrumentation that comes along with the setup code, and one piece of it is document load instrumentation. I know what you're thinking: I spent a long time talking about how page load information isn't that reliable. But for me, the most exciting part of the document load instrumentation is the insight it gives you into the resources being loaded onto your page.
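Roughly, that boilerplate plus context looks like the sketch below. Treat it as a sketch only: package names and setup details have shifted as the JS SDK evolves, so check the example repo in my references for a maintained version. The session helper is a hypothetical stand-in for your own app:

```js
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';

// Resource attributes ride along on every span -- this is where that
// debugging context (user ID, session ID, language, etc.) goes.
const provider = new WebTracerProvider({
  resource: new Resource({
    'service.name': 'my-web-app',
    'browser.language': navigator.language,
    'session.id': getSessionId(),   // hypothetical helper for your app
  }),
});

// Export to any backend that accepts OTLP: a vendor, or Jaeger.
provider.addSpanProcessor(
  new BatchSpanProcessor(new OTLPTraceExporter({ url: '/v1/traces' }))
);
provider.register();

// Document-load auto-instrumentation: a waterfall of every resource the
// page fetched, as spans.
registerInstrumentations({
  instrumentations: [new DocumentLoadInstrumentation()],
});
```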
So now you go from "okay, my page load is a little iffy" to actually being able to start debugging it, because it shows you the order your resources were loaded in and how long each of them took. Here you can see a trace from our documentation website, and there's a gif in our BubbleUp documentation that takes six seconds to load. It's probably a good idea for us to go compress that, or use a different asset, and you can see that at a glance. It's really exciting to be able to immediately start debugging what's going on.

The other piece of auto-instrumentation, and by auto-instrumentation I mean that along with your OpenTelemetry setup code it just starts sending these spans, is network auto-instrumentation. This picks up and sends spans for each of the network requests the browser makes. You can set it up to propagate what's called a traceparent ID, and if you're already instrumenting your backend with OpenTelemetry, it connects those traces, like in this trace here, where this HTTP GET span was made by the browser and connects to your backend traces. Front-end teams are often the first folks to have bugs land on them: someone says, "hey, your website is broken, there's this bug", and the first thing they want to figure out is whether it's the front end or something going on in the backend. We kind of jokingly call that proving your "mean time to innocence". Something like this proves your mean time to innocence pretty fast: you can see at a glance whether it's your backend that's taking a long time, or whether you actually have a front-end performance issue.

There's also some early Web Vitals instrumentation available; it's linked from that GitHub repo I mentioned, and it's based on the web-vitals npm package's attribution build. That attribution build is super, super important, because it doesn't just report your Web Vitals scores, it tells you which elements those values were attributed to. Here I have an example with some cumulative layout shift scores, and it actually tells you which target in the DOM contributed to the CLS score. Now you have a place to start debugging, which is pretty powerful. It can tell you things like which element was used as the basis of the LCP score, so you could, for example, discard any events where the cookie banner was used as the LCP element, and get much more meaningful LCP numbers. I think this is a key piece, because Web Vitals are often really hard to debug, and having a jumping-off point makes it a lot easier.

But in my opinion, even more exciting, and this is getting into manual instrumentation now, is the Element Timing API. It's currently experimental and only available in Chromium browsers, but I think it's still worth trying out on your website. It's a new API that lets you define what you think the main content of your page is. You put an elementtiming attribute on your HTML element, give it a name, and in JavaScript you can observe how long it took.
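A minimal sketch of that (the attribute name and element here are just examples):

```js
// In your markup, tag whatever *you* consider the main content:
//   <img src="/hero.png" elementtiming="hero-image">
// Works on images and text blocks; Chromium-only for now.
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // entry.identifier is the value of the elementtiming attribute.
    const renderedAt = entry.renderTime || entry.loadTime;
    console.log(entry.identifier, 'rendered at', renderedAt, 'ms');
  }
}).observe({ type: 'element', buffered: true });
```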
This means you go from something like an LCP value, where Google has chosen whichever element it thinks is biggest, which can be a little problematic at times, to you deciding what matters. You can go from LCP to "time to first tweet", if you're building Twitter, or X, or whatever it is today. It just means you get a much more meaningful metric.

And then there's more manual instrumentation. You can wrap your JavaScript functions in spans and send them off with OpenTelemetry. So if you've looked at long tasks in the browser and identified some places that are creating them, you can wrap those functions in OpenTelemetry spans, send them off, figure out which parts of your functions are taking longer than others, and break them up. And there's lots more. There's long task auto-instrumentation, which instruments what's blocking the main thread in your JavaScript. There's user interaction auto-instrumentation, which is really useful to link together with the rest: clicks, scrolls, basically anything you can think of as a web event can be tracked with it. There's a new Long Animation Frames API, which is more exciting than long task instrumentation because it takes animation frames into account, not just what's being executed in your JavaScript; it's super experimental right now, but I'm excited to play with it. And you can instrument whatever else you can think of. OpenTelemetry for the web is still a pretty new project, and we're trying to make it as rich an experience as possible, so if you're interested in getting involved, the OpenTelemetry client-side special interest group would love to have you.

So, to recap. How things are measured really matters to the insight you get from them: LCP values, for example, might not be that interesting to you if they're measuring a cookie banner rather than the main content of your website, so when you use these automatic measures, understand what's actually being measured and whether that's useful. Your context is the most important input for deciding what to measure. Measuring with real users matters, so that we can do intersectional analysis and figure out what issues are happening for which segments of your users. And adding detail to your telemetry is super, super important. That's all I've got. Thank you so much. I think we have some time for questions; if you have questions, please come to the mic.

You mentioned adding additional tracing to be able to correlate front-end events with back-end systems. Can you elaborate a little more about your experience? That's probably the biggest reason I'm at this conference, to learn how to do that.

Oh, sure. There are a couple of ways to do it. Like I said, you can use the OpenTelemetry network auto-instrumentation. There are two packages: one for front ends that send data over XMLHttpRequest, and one for front ends that use the fetch API, so use whichever your front end uses. And there's an option within those to tell it which URLs to propagate to: you give it a regex to match your backend API URLs, and it does context propagation by sending a header.
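A sketch of that configuration, assuming the fetch and XMLHttpRequest instrumentation packages (the URL pattern here is just an example):

```js
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { XMLHttpRequestInstrumentation } from '@opentelemetry/instrumentation-xml-http-request';

// Only URLs matching these patterns get the traceparent header -- you
// don't want to leak trace headers to third-party origins.
const backendUrls = [/^https:\/\/api\.example\.com/];

registerInstrumentations({
  instrumentations: [
    new FetchInstrumentation({ propagateTraceHeaderCorsUrls: backendUrls }),
    new XMLHttpRequestInstrumentation({ propagateTraceHeaderCorsUrls: backendUrls }),
  ],
});
```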
And if you've already instrumented your backend systems with OpenTelemetry as well, it connects that traceparent header to your backend telemetry. Yeah, absolutely.

Hey, how's it going? Great talk. I had two questions, actually. The first is around sampling: in situations where you have millions of users and tons of apps, you likely don't want tracing on by default, trying to capture everything. So what are your thoughts on sampling at the browser, and how to do it in a smarter manner than random sampling? And the second question is about updates to the client-side SIG for OpenTelemetry, because I don't know if it's gotten a lot of traction, but there seem to be a couple of different projects in this area, like Grafana Faro, and I think Boomerang is one, but there isn't a lot of standardization around that data yet, the schemas.

I'll start with the first question, about sampling, and that's a great question. Because all of this is still at a bit of an experimental stage, there hasn't been a lot of thought behind that yet. Right now, the sampling available through OTel is that you can set sample rates, and you can do tail sampling through the Collector. At Honeycomb we have a product called Refinery, where you can add dynamic sampling rules or do rule-based sampling as well. Sampling on the front end is a little different, because you're probably going to get so many more events, and there are two things to consider. There are your network requests, which get connected to your backend traces; you want to sample those together, because otherwise you'll have a bunch of disconnected traces. For other things that don't connect to your backend, some sort of rule-based sampling would probably be a good way to go.

The second question was about the client-side special interest group in OpenTelemetry. We do have a client-side SIG; we meet every Tuesday, and in my references there's a link to all of the special interest groups and when they meet. Right now, one of the things we're doing as a SIG is trying to align on browser specifications and semantic conventions, so that's an ongoing effort. If you care about that and want to have a say in it, come get involved.

Yeah, thanks again. Great talk. I just wondered, as I've been thinking about this, where you see the intersection between RUM, where we're trying to aggregate things a little, and tools like Adobe Analytics, where I'm trying to really capture the user's journey throughout the site. You could model the latter as traces and spans: one ridiculously long trace for however long that user is engaged on the site. But it does seem like there's a middle ground here, where as a developer, maybe you're running a Twitch video streaming session and I want to know you had a problem, so I want to look up your session ID, which maybe I carry in baggage to the backend, and see everything you did. But then it feels like maybe we're overloading RUM to be a user journey capture tool. I just wonder what your thoughts are and where you draw that line.
That's a great question, and I think it really depends on vendors: what products they want to build, who those products are for, and what problems they should solve. RUM is interesting because it aggregates a lot of data for you, and it gets you to the point where you know where problems might be, but those tools are still figuring out how you explore and debug further from there. As a data standard, OpenTelemetry is unopinionated and really flexible, and it could probably service all of the tools you mentioned. But I think going from real user monitoring to front-end observability is a leap we still have to make, in terms of how you go from metrics-oriented data to actually surfacing unknowns. So my answer is: I don't really know where I draw that line, and it would be great for front-end devs to be able to explore their data and work closer to how backend devs get to work with observability today.

I think we have time for one more. Or, no, okay, we've got to wrap it up. Sorry, I'll chat with you afterwards. Okay, thank you so much.