Hey, everyone. Welcome to Ask Me Anything, Web Vitals Edition. I'm your host, Philip Walton, and I'm here with Elizabeth Sweeney, John Mueller, and Annie Sullivan to help me answer some of your questions. As a reminder, if you're watching this live and you're a registered attendee, don't forget that you can ask questions live by tapping on the Q&A icon to the right of the livestream video on the I/O website.

OK, let's get started. So it looks like the first question we have here is: how does Google determine a page's Core Web Vitals scores? This is a very important question. Annie, do you maybe want to take this one?

Yeah, so we measure the three Core Web Vitals in Chrome, First Input Delay, Cumulative Layout Shift, and Largest Contentful Paint, for every page load, and then we take the 75th percentile of each of those separately. So if the 75th percentile of all your page loads meets Largest Contentful Paint's 2,500 millisecond threshold, then your page meets LCP. If it separately meets the 0.1 CLS threshold at the 75th percentile, it also meets CLS. And for First Input Delay, either it meets the FID threshold, or, since not every page load has input, it may not have enough samples, in which case it still counts as meeting FID. But this is real user data from Chrome. Does anyone want to add anything?

Just, I guess, that last point you made is, I think, really important. This is real user data coming from Chrome users who have opted into sharing usage statistics. It's not Googlebot or anything like that. I think that's worth leading with.

The fact that it's the 75th percentile across the board, too. It means three out of four of your users are going to be having a good experience; that's the bar you're trying to meet. I kind of like thinking about it that way, too. I find that helpful.
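To make that 75th-percentile assessment concrete, here is a minimal TypeScript sketch. The function names and the nearest-rank percentile method are illustrative assumptions, not how CrUX is actually implemented; the "good" thresholds of 2,500 ms for LCP, 100 ms for FID, and 0.1 for CLS are the documented values, and each metric is judged independently.

```typescript
// "Good" thresholds for the three Core Web Vitals.
const THRESHOLDS = {
  LCP: 2500, // Largest Contentful Paint, milliseconds
  FID: 100,  // First Input Delay, milliseconds
  CLS: 0.1,  // Cumulative Layout Shift, unitless
};

// 75th percentile using the nearest-rank method (an illustrative choice).
function p75(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

// A page passes a metric if the 75th percentile of all observed page
// loads for that metric is within the threshold.
function meetsThreshold(samples: number[], threshold: number): boolean {
  return samples.length > 0 && p75(samples) <= threshold;
}

// Example: three of four LCP samples are under 2,500 ms, so the page meets LCP.
console.log(meetsThreshold([1200, 1800, 2100, 4000], THRESHOLDS.LCP)); // true
```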
Yeah, I think this is a great transition to the next question, which is: why am I seeing different scores reported in different tools, such as Lighthouse and the Chrome User Experience Report?

Yeah, I can take that one, and I'd obviously love to hear other folks' thoughts, too, but I could have an entire AMA just on this, because there's a lot to tease apart here. I'll take a stab at a few of the key points. The first is that we have two fundamentally different sources of data. We have our field data, which, just as Annie and Phil mentioned, is what's used for Core Web Vitals. This is what your real users are experiencing: people are engaging with your content, and you're getting signals back from them about how good their experience is. But you also need a way to debug and diagnose before your users engage with that content, where you have more control and more granular data, and that's where simulated data comes in, also called lab data. So that's the first key point: there are two different data sources, and you're going to see different values, because one represents all of your users and the other represents a simulated load. The second point, and then I'd love to hear if I've missed anything, is that there are also different runtime conditions depending on the tool you're looking at. For instance, if you're running Lighthouse from the DevTools panel, it operates locally on your own machine and reflects conditions that are local to you, whereas if you're using PSI, you're pinging servers and getting that response back. So there are going to be deltas there as well.

Yeah, and to summarize, or restate some of the important points: Lighthouse is a lab-based tool, meaning a real user is not interacting with it; it runs in a simulated environment. The Chrome User Experience Report, which is where the Core Web Vitals data you'll see in tools like Search Console or PageSpeed Insights comes from, is what we call field data; sometimes it's called RUM data. It comes from real users who are actually going to those pages and interacting with them. There are differences between tools within the lab setting, as Elizabeth said, but the difference between field data and lab data is the really important one to understand. And if you go to web.dev/vitals, there's lots of content there, so people can go there if they have more questions.

So it looks like the next question is: what are Web Vitals? I think maybe another way to phrase this question is: what is the difference between Web Vitals and Core Web Vitals? I think I'll take a stab at answering this. Web Vitals is the name of the initiative, or the program, that we have here in Chrome; it encompasses the whole Web Vitals program. Web Vitals is also a term we use to describe the individual metrics that are part of that program. The Core Web Vitals specifically are the subset of Web Vitals that we feel are the most important to measure, and they have to meet certain criteria to be Core Web Vitals: they have to be measurable in the field for real users, they have to be representative of the user experience, and they have to apply generally to all web pages. So if you're looking for just the minimal set of things to focus on, the Core Web Vitals are a great place to start. And then we have other Web Vitals that are good performance metrics to care about and are useful in debugging the Core Web Vitals. For example, Time to First Byte is a Web Vital, and TTFB is often useful in debugging your Largest Contentful Paint: it helps you know whether maybe your server is slow or your front-end code is slow. So this is kind of how I think about the differences. Does anybody else have anything they want to add? I'm not seeing anyone jump in.
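As a quick illustration of that TTFB-versus-LCP debugging idea, here is a rough in-page sketch using standard web performance APIs: a large TTFB points at the network or server, while a large gap between TTFB and LCP points at front-end work. The console logging is just for demonstration; in practice you would report these numbers to your analytics.

```typescript
// Split LCP into "time to first byte" and "everything after it".
const nav = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
const ttfb = nav.responseStart; // Time to First Byte, ms since navigation start

new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();
  const lcp = entries[entries.length - 1]; // the latest LCP candidate
  console.log(`TTFB:       ${ttfb.toFixed(0)} ms (network + server)`);
  console.log(`LCP:        ${lcp.startTime.toFixed(0)} ms`);
  console.log(`LCP - TTFB: ${(lcp.startTime - ttfb).toFixed(0)} ms (front-end work)`);
}).observe({ type: 'largest-contentful-paint', buffered: true });
```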
Great. OK, let's go to the next question: is inclusion in CrUX based purely on having enough data collected for a URL for a statistically relevant sample size, or is there other capping to X amount of URLs or origins? If sample size is the primary concern, is there a good rule of thumb to be aware of when collecting your own RUM data?

So I'm not 100% sure exactly what the question is asking. To the last part of the question, is there a good rule of thumb: I would say, if you can, don't sample your RUM data. You want to get as much of it as you can to get the most representative sample, which is the full sample. If you have to sample for some reason, you can certainly do that, but we would recommend not sampling. For the question about the CrUX inclusion threshold, I don't know. Elizabeth, do you want to comment on that?

Yeah, I mean, at a high level, we just want to make sure that whatever we're actually sharing has reached a certain threshold so it can be properly anonymized. That's kind of how we determine where that threshold is in terms of what we actually publish in the CrUX dataset, but that's very high level; I don't have much beyond that.

We don't do any capping, though. If you have more data than the minimum sample size, there's just more data being used to calculate the CrUX scores.

OK, let's go to the next one. This question says: AMP prides itself on resolving all of the Core Web Vitals, and it does... when clicked from the SERP and passed through the AMP cache. When the source is tested with PSI, it gets a score that leaves much to be desired. Does the ranking signal take this disparity into account?

I think the short answer is yes, it does. I don't know if anybody wants to expand on that.

Yeah, we are really serious about, again, always using real user data, what real users are actually experiencing. If something goes through the AMP cache, we're measuring that. If it's going through the AMP origin, we're measuring what the user sees there, no matter which way the page may or may not be using AMP.

Right, so if more people are visiting a page from the search results page, and those pages load fast because they're coming from the cache, then the scores could be better. If more people happen to be going directly to the origin, the scores might be lower. It just depends on what the real user is seeing.

And I guess it also depends on what they do after they land on that initial page, right? Yeah, absolutely. If they're navigating to other pages on the site, those aren't necessarily coming from the AMP cache.

OK. So the next question reads: CrUX is useful for a publicly accessible site. Do you have any idea how we can get similar data for a private site? Private here means, let's say, the web app is only accessible through a web container in a native app.

It's true that CrUX is only available for publicly accessible sites, but you can use your own real user monitoring data for any site, and this is what we recommend everyone does, whether you have a public site or a private site. I don't know if we have much more to say about this. We always recommend using RUM data; RUM data is much easier to analyze and debug. CrUX data is great for understanding how Google sees your performance, and maybe comparing it to your RUM data, but using CrUX data for debugging purposes and other things is not what we recommend, just because it's usually too high level. Anyone else want to add anything to that?

I should add that Philip wrote an awesome web-vitals.js library and also gave a talk yesterday about how to send that data to Google Analytics. So there's a lot of documentation out there about how to collect your own RUM data.

Thanks for the shout-out, Annie.
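For reference, collecting your own RUM data with that library can be as small as the following sketch. The '/analytics' endpoint is a placeholder for whatever backend you use, and the exact exports depend on the library version (early versions used getCLS, getFID, and getLCP; later versions use the on-prefixed names shown here).

```typescript
import { onCLS, onFID, onLCP, Metric } from 'web-vitals';

// Report each finalized metric value for the current page load.
function sendToAnalytics(metric: Metric) {
  const body = JSON.stringify({
    name: metric.name,   // 'CLS' | 'FID' | 'LCP'
    value: metric.value, // milliseconds for FID/LCP, unitless for CLS
    id: metric.id,       // unique per page load, useful for deduplication
  });
  // sendBeacon survives the page being unloaded; fall back to fetch.
  if (!(navigator.sendBeacon && navigator.sendBeacon('/analytics', body))) {
    fetch('/analytics', { body, method: 'POST', keepalive: true });
  }
}

onCLS(sendToAnalytics);
onFID(sendToAnalytics);
onLCP(sendToAnalytics);
```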
OK, the next question is: will desktop Core Web Vitals data be part of the initial Page Experience update, or added in later? So John, do you want to answer this?

I think we touched on this yesterday in one of the sessions as well, where we also mentioned that we're going to start taking a look at the desktop data too, but that will happen at some later point. So not with this initial launch that is happening this summer.

Yeah, but you shouldn't ignore desktop just because it's not part of the initial rollout. That would be my personal opinion.

OK, so the next question is: the field data in Lighthouse/PageSpeed Insights being a 28-day aggregation makes it impossible to take action on the data and be sure the action truly resolved the issue. Are there any plans to either shorten the window or perhaps plot the data on a graph so we can see changes over time?

So this gets back to the earlier point that I made: you should really be using your own RUM analytics solution to monitor Core Web Vitals, and you should not be depending on these tools to debug your performance. So for example, let's say you see a problem in PageSpeed Insights; the scores are worse than you think they should be. You want to go into your code and make changes, and then deploy those changes to production. And at that point, you definitely do not want to be going back into PageSpeed Insights to check whether your performance improved, because, as the question states, you'll have to wait up to 28 days before the data fully catches up. That's the time to go to your RUM analytics provider and get your real-time data, or the current day's or previous day's data, to see whether things improved.

Also, hot tip: if you're using a RUM analytics provider, I definitely recommend sending your page's version as a custom dimension, or custom parameter, or whatever your analytics solution calls it. That way, when you deploy a new version, you can easily compare whether that version was a performance improvement. If it's all mixed together in a single analytics bucket, the data can be a little bit noisy.

And you can see a graph of the data over time in Search Console. There's still the 28-day delay there, but you also see what it was like before.

Yeah, and I'd also love to add that this is one of the places where synthetic data can help. You definitely have to validate with your field data, and you do have to wait for it, so I'm not detracting from that point at all; the true validation happens there. But when you're iterating and you need to make sure that the change you just made isn't going to cause a regression, this is where just running your lab test, making sure you didn't break anything and didn't affect the metrics too much, can be really impactful. And seeing these things over time is something that I know a lot of the tooling teams are looking at. Right now, we give you a lot of information while you're still iterating. It's like we give you a speedometer while you're in the driveway, and then as soon as you're in production and out on the road, we take the speedometer away and say, good luck. So we're figuring out how to make sure that you have really robust monitoring solutions. As John mentioned, Search Console is a great place to go for that tracking over time, but we are thinking about other ways to help you out there.

Right. Lab data is great when you're making your changes locally, or when you send them to a staging environment before you deploy, to make sure the performance is what you expect; you hope it's predictive of the performance you want to see. So that's kind of step one. And then step two is, after you deploy to production, you look at your field data, your RUM data, to confirm that your users are actually experiencing the page the way your lab tools predicted they would.
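Picking up on that hot tip about versioning, here is one way it might look with the web-vitals library and gtag.js. The APP_VERSION constant and the parameter names are illustrative assumptions; your analytics provider may call custom dimensions something else entirely.

```typescript
import { onCLS, onFID, onLCP, Metric } from 'web-vitals';

// Assumes gtag.js is already loaded on the page.
declare function gtag(command: 'event', name: string, params: Record<string, unknown>): void;

const APP_VERSION = '2021.05.18'; // placeholder; stamp this in at build time

function sendToGoogleAnalytics(metric: Metric) {
  gtag('event', metric.name, {
    // Event values must be integers; CLS is scaled up to keep precision.
    value: Math.round(metric.name === 'CLS' ? metric.value * 1000 : metric.value),
    metric_id: metric.id,     // deduplicates multiple reports per page load
    app_version: APP_VERSION, // the custom dimension: which deploy produced this sample
  });
}

onCLS(sendToGoogleAnalytics);
onFID(sendToGoogleAnalytics);
onLCP(sendToGoogleAnalytics);
```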
OK, so the next question is: when did the five-second cap for CLS go into production? Should site owners expect a shift in CLS data around that date?

So I believe the answer is that it's not in production yet, but it will be prior to the Page Experience launch later this summer and fall. Is that correct, John?

I don't know. My understanding is that the data currently being shown in Search Console uses the older definition of CLS, and at some point in the next couple of weeks, or maybe a month, it will switch over to the new version, and there will be an annotation in Search Console, I believe. I hope I'm not giving false information here.

Yeah, I think that was the plan, yeah. But for anyone wondering: any ranking effect that happens will certainly be based on the new definition, not the older definition. And in the next couple of weeks, we're looking to roll out the newer definition across all of the tools as well, Lighthouse, CrUX, and the other surfaces. If you're using the web-vitals JavaScript library, there is a beta version already released with the new definition added, beta one, I believe.
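For anyone unclear on what the cap actually changes, here is a simplified sketch of the updated definition: layout shifts are grouped into "session windows" that end after a one-second gap between shifts and are capped at five seconds in total, and CLS becomes the largest window sum rather than the sum over the entire page lifetime. The entry type is declared locally because layout-shift entries are not in the default TypeScript DOM typings; real implementations, such as the web-vitals library, handle more edge cases than this windowing logic.

```typescript
type LayoutShiftEntry = PerformanceEntry & {
  value: number;           // score of this individual layout shift
  hadRecentInput: boolean; // shifts right after user input are excluded
};

let cls = 0;         // final CLS: the largest session window seen so far
let windowSum = 0;   // running sum of the current session window
let windowFirst = 0; // start time of the current window
let windowLast = 0;  // time of the most recent shift in the window

new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries() as LayoutShiftEntry[]) {
    if (entry.hadRecentInput) continue;
    const withinGap = entry.startTime - windowLast < 1000;  // < 1 s since last shift
    const withinCap = entry.startTime - windowFirst < 5000; // window < 5 s long
    if (windowSum && withinGap && withinCap) {
      windowSum += entry.value; // extend the current window
    } else {
      windowSum = entry.value;  // start a new window
      windowFirst = entry.startTime;
    }
    windowLast = entry.startTime;
    cls = Math.max(cls, windowSum);
  }
}).observe({ type: 'layout-shift', buffered: true });
```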
OK, the next question: is the Page Experience ranking boost per country? For example, LCP at the 75th percentile is two seconds for visitors within the UK but four seconds from Australia, making my global P75 2.6 seconds. Will searches from the UK get the ranking boost because of the good LCP?

So I think this is an interesting example. This might be a real-world example, but I would also say that if you're using a content delivery network, you can often mitigate some of these performance concerns; it shouldn't necessarily be the case that certain countries have way worse performance from that perspective. If people's internet connections tend to be slower in a certain country, then that can certainly affect things. But when I look at the data, and if you go to the Web Almanac, which is a great resource, internet connection speeds across the globe are getting faster and faster, and I think that to some degree it's a misconception that there are fast countries and slow countries, at least based on super recent data. But Annie, correct me if I'm wrong: my understanding is that all Chrome users together create the score.

Yeah, it's all Chrome users together. This is part of the reason why we're using the 75th percentile. So in that example, more than a quarter of your customers are in Australia and they're getting slower times, and we think that should be reflected in the score; ideally you'd be using a CDN. And again, as Phil said, that is what we see when we look at navigations in various countries: we're not seeing slower navigations overall in certain countries.

Yeah, I think a general meta point is that a lot of times people imagine these hypotheticals and get concerned that some situation is going to affect their Core Web Vitals scores in some weird way. I would always encourage you to just measure it and see what the data is actually saying. A lot of times, we see that these concerns don't actually manifest in reality.

OK, the next question: in the context of the Page Experience update, does Core Web Vitals use only data for the specific page, or a combination of page- and origin-level data?

Oh, how much time do you have? Yeah, this is a great question, and I think it's a source of confusion, because a lot of tools will report origin-level data, and I think that can sometimes make it more confusing than it needs to be; people might think that you get a single score for your entire site. But that's not true. You get a score per page; in some cases you get a score per page group. If you go into Search Console, you might see page groups that all have a certain score. And depending on how much data the site has, you might not have enough, in which case all pages on the same origin will be grouped together. I don't know, I feel like I'm doing a lot of talking. Anybody else want to jump in?

So, exactly as you said, there's this tiered approach as far as availability of data. I do wonder, and I'm asking a question of the group here: I know that this is documented in various places, like pieces of it, but do we have a single resource that dives into this specifically? I think maybe Phil or John might know the answer to that.

I don't think we have anything specific to Search that goes into all of the details of how we use it for ranking. We do have an extremely comprehensive FAQ, though; that's in the help forum. I'd definitely check that out. I don't know if this particular question is covered there, and if it's not, we can expand it there as well. But that's, I think, the best place to go for a general overview of how we use Page Experience within Search as a ranking factor. For most of the ranking signals that we have, we don't have one documentation page that says this is exactly how we use the signal in ranking, because there are just so many edge cases, and also situations where we need the flexibility to adapt to changes that happen in the ecosystem.

Yeah, I think to get at the spirit of the question, I would encourage people to think about it per page. So if you have a page that's important to your site, one that a lot of people are coming to from Search, I would look at the Core Web Vitals scores for that page. It might not always work out exactly that way in certain complex situations or combinations of factors, but if you just want to simplify it for yourself, look at the scores for that page. So if a user goes to that page and experiences it fast, and then they go to another page and experience it slow, that slow page's score is not necessarily going to affect the ranking of the fast page that's important to you.
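One way to see the page-versus-origin distinction in data you can query yourself is the CrUX API, sketched below. YOUR_API_KEY and the example URLs are placeholders. Passing url requests page-level data, which is only returned if that specific URL has enough samples, while passing origin returns data aggregated across the whole site.

```typescript
const ENDPOINT =
  'https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=YOUR_API_KEY';

async function queryCrux(record: { url: string } | { origin: string }) {
  const res = await fetch(ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(record),
  });
  // The API responds with 404 when a URL or origin has too little data.
  if (!res.ok) throw new Error(`CrUX API responded with ${res.status}`);
  return res.json();
}

// Page-level: this specific URL's field data, if it has enough traffic.
const page = await queryCrux({ url: 'https://example.com/some-page' });
// Origin-level: aggregated across all pages on the origin.
const site = await queryCrux({ origin: 'https://example.com' });
```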
OK, so the next question: what is the most common detriment to sites' Core Web Vitals that also has the most significant effect if fixed? Basically, how should someone prioritize what to fix?

This is an interesting question. So I will take a different strategy: look at your own site's data. I would look at the CrUX data first. What are real users seeing? And then, from there, which of those metrics are not meeting the Core Web Vitals thresholds? So maybe your LCP is poor, but your CLS is good; in that case, you want to look at LCP. Well, then the question is, what do you do about it? At that point, lab tooling like Lighthouse can be really impactful and effective at helping you narrow down that specific score. But I would start with your RUM data. What I'm seeing, looking at lots and lots of sites every day, is that every site is different, but most sites that take that approach, asking which metrics are poor and what Lighthouse has to say about them, could improve pretty quickly.

Yeah, I really like that approach, Annie, because by looking at the field data first, you're asking: are my users struggling with interactivity, or with content loading, or is stuff moving around on them? Just remember that these metrics link directly to some sort of pain point, or delightful thing, that your users are experiencing. The reason I bring it up in that way is that if you see, for instance, that your FID is suffering, you can then think about what is going to impact interactivity, and that lens helps you research and dive deep. And as Annie said, the next step is Lighthouse, and as of this week we have a new metrics filter, too, so you can filter the opportunities by metric. So if you want to focus on CLS, because you see that's what your users are having a hard time with, you can now prioritize what's going to have the most impact there.

Yeah, and since I feel like we haven't given the most satisfying answer to this person, I'll try to give a general one. Again, Annie's point is the one you should take away: look at your own data. But if we were going to give a one-size-fits-all answer to this question, I would say that we see in the data that LCP is the metric most sites struggle with; the fewest sites meet the LCP "good" threshold, and probably the biggest reason is that their images are not optimized. So if there's one thing you maybe want to start with, look at optimizing your images. That's probably the one thing I would say.
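If your RUM data does point at LCP, a useful first debugging step is to ask the browser what the LCP element actually is; if it turns out to be an unoptimized image, that image is the place to start. A small sketch follows, with the entry type declared locally since LCP entries may not be in the default TypeScript DOM typings.

```typescript
type LcpEntry = PerformanceEntry & {
  element: Element | null; // the DOM node painted as the largest content
  url: string;             // the image resource URL, or '' for text
};

// Log each LCP candidate along with the element responsible for it.
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries() as LcpEntry[]) {
    console.log(`LCP candidate at ${entry.startTime.toFixed(0)} ms`);
    console.log('element:', entry.element);
    console.log('resource:', entry.url || '(text, no image resource)');
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });
```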
OK, so the next question: is Page Experience a binary ranking factor? Do only "good" experience pages get the ranking boost, or is there a gradation in how Page Experience signals affect ranking? For example, might one kind-of-slow page get a boost over an even slower page?

Yes, this is a great question. It is not binary. Well, I mean, I don't know, kind of. It is mostly not a binary signal. John, you're probably the best person to speak to this, being on Search yourself.

So I think, first of all, we continue to take relevance into account when it comes to Search. So it's not just that if you're a tiny bit faster, then you will rank above everyone else; relevance plays a really strong role there as well. But within the signal, we go gradually across the range from "needs improvement" to "good". That's the range where you would see a gradual improvement with regard to the ranking signal. And once you've reached that "good" threshold, that for us is a pretty high bar, and you're at a stable point. At that point, micro-optimizing, shaving extra milliseconds here and there, is not going to do anything specific for your site's ranking. It might have an effect on what users see, and with that you might get other positive effects, but at least when it comes to Search ranking, you're not going to see improvements from being five milliseconds faster than the next one. Yeah.

And then definitely a point that I want to clarify, because I've heard some confusion about this: it is not the case that you have to reach the "good" threshold for all of the Core Web Vitals metrics before you get any ranking boost. That is not the case. In fact, it's kind of the opposite. You will get a ranking boost for reaching the "good" threshold across the metrics, but beyond that point, you don't get an additional boost for doing even better. If you have your LCP at two seconds and you get it all the way down to one second, we've publicly stated that that will not increase your ranking. However, if you have a very, very slow page, maybe an LCP of 20 seconds, and you improve it to 10 seconds, that could potentially boost your ranking.

Yeah, we get a lot of questions specifically about "good", like, wow, that's really hard to meet. Yes, it is. It's supposed to identify the best content on the web, and we can't really say that you need to improve beyond that. You might see additional benefits from your users, but we don't take that into account. Yeah.

So I think we are actually out of time. So thanks, everyone, for joining, for watching on the livestream, and for submitting questions; these were great questions. Feel free to reach out to us over other channels, Twitter or whatever channel you want to reach us on. If you have more questions about Core Web Vitals specifically, again, please go over to web.dev/vitals; many of your questions can probably be answered there. Thanks to all my co-AMA hosts for joining, and thanks, everyone, for watching again. We'll see you all again soon. Bye.