Thanks so much. Yeah, so today we'll be talking about cultivating a performance mindset. My name is Nara Sainarath. The reason I wanted to give this talk is that I'm a new addition to the performance team at Sentry, and I've learned a lot of things along the way that I really want to share with other people. Look, performance is a hard problem; investigating this stuff takes a lot of work, and just getting started in this space is difficult. So I want to bring it back to basics, make it a little more approachable, and talk through some of the concepts involved.

A little about me: I'm a software engineer at Sentry, on the performance team, which owns our performance dashboard and homepage, and there's a lot we're building for you there. For those of you who don't know Sentry, our bread and butter is error monitoring: you install an SDK in your application, and if it fails to compute something and throws an error or an exception, we send that to the platform so you can aggregate it and get insight into where your application is failing and how to fix issues. What we're trying to do now is make performance as easy to solve as errors, because performance problems are much harder to trace back to a single point of failure.

Since I've been here, I've worked with some really strong, performance-oriented engineers, and I've been amazed at the skill and reasoning they bring to their tools. We're going to talk about what I've learned, and about how you yourself can start thinking about instrumenting performance in your application in a way that's proactive.
That way you're always prepared to tackle those issues. The agenda for today is to talk about why performance matters and how you can easily start tracking performance today; along the way we'll also touch on Sentry's features and how we're helping build this out.

So, first topic: why does performance matter? I like to answer this with three questions, and hopefully some of them will resonate with you. Number one: do you care about your users? We all build features to bring value to our users, and performance becomes a huge consideration as your application gains traction. Once more data starts coming in, users will be shut out of taking full advantage of your services if you can't scale and perform to their needs.

Think about a time you've had to wait on a loading spinner. How did that make you feel? Did it make you consider looking for another product? For us as developers, we can open the dev tools and figure out exactly what's going on (we usually find Sentry in there): you can see whether a request returned a 500 or errored out, or whether it's still pending. But your typical everyday non-coding friends aren't going to do that; they'll look for someplace else. Which brings me to my next question: do you like money?
Because I do; I like money. There's a lot of contemporary research here. For example, Shopify paired up with the Boston Consulting Group (BCG), and they were able to correlate performance issues with decreased lower-funnel conversion, which basically means whether customers who are ready to purchase something actually end up purchasing it. They found that if it took over 90 seconds to get from adding to a cart to checking out, you lost about 50% of your users there.

Another cool place to find this kind of contemporary research is a website called WPO Stats. It highlights and showcases case studies where people instrumented their systems, dove deep into their data to figure out their performance issues, and fixed them. It's an art, and it's a skill you work on your whole life. The links are down below; I'll be sharing these slides later on.

My final, and favorite, question, because I stress about this all the time: do you like to sleep at night, without having to worry that your big customer is going to take down your application by using things too much? How many of us here actually have that one customer we know is effectively the stress test? We might have some test data, but if they get their hands on a feature, we might be in for some problems. You might argue that we could account for this ahead of time, but sometimes we can't; we work with very complex systems, and we need insight into our applications. I've been unnecessarily stressed before, trying to figure out, months after my features were released, how to make them faster, with nothing keeping me on track. So does performance matter? I think it matters to all of us.
We're all stakeholders. It matters to your users, to your company, and to you. Users want to get the full benefit of what you're providing them; your company is relying on you; and it matters to you to instrument your application so you're not stressing out when it comes to handling everything else in your work.

At Sentry, there are two kinds of application degradation we really look into: errors and performance issues. I want to highlight the difference between them. Errors are very pinpoint events: you get a stack trace, and you can usually walk back up the stack trace to find the smoking gun. That's what makes errors comparatively easy to solve. Performance issues, on the other hand, are usually a regression over time, and you don't really know when things are going to get bad. That's why it's so important to build out tooling that helps us track these issues down and navigate the most efficient path to a solution. You could fix all of the errors in your application, but if you have a memory leak or some other performance issue, your users will eventually find out about it.

It could be like this, right? You might be saying "this is fine, this is fine," but if your users aren't reporting things to you, everything's on fire. It's a mess, right?
Ideally, the more we think about doing this, the more efficient we can be, and hopefully we can bring it down to something calmer. To simplify the process, we've broken it down into four steps: collect data, use that data to detect something, make a fix, and then monitor. Today we're going to focus on collection and detection; that's where things get complicated, and as soon as you can detect things and understand the problem space, you can think much more clearly about how to make a fix, deploy it, and reap the benefits.

The entry point into the cycle is collecting data. So, to start there: what kinds of things can we collect? Metrics, traces, and profiles. All of these are very helpful at different stages of a performance investigation. We'll start with metrics, because they're the simplest, and move down to the very fine-grained profiles.

At the bare minimum, a metric is just a number. We usually pair it with a timestamp, because that lets us do things like plot metrics on nice graphs and see how they change over time. Some very common metrics to look at are transaction throughput, durations, and CPU and memory usage. All of this gives you high-level insight into how your application is performing. The next thing we move into is traces.
A trace is a collection of what we call spans. In metrics land we only had a timestamp and a number; a span has a start and an end timestamp, is usually given a name, records some kind of operation, and can be related to other spans in parent-child relationships. Spans can be grouped together and represented in a way that makes it very easy to understand what's going on in your application. You can see an example here, where a request fetches some inventory and eventually goes down to the database level. If you've ever worked on a front end, you might have seen something like this in the network tab of your browser's dev tools.

An interesting thing about traces is that as you build out your system and gain more and more services, you can actually connect traces together, and then you get a full end-to-end understanding of what your application is doing. We call this a distributed trace. Usually a trace ID is propagated across your requests, so any time you want to understand timing in your application, you can dig deep into it here. This is the most powerful form of exploratory performance data we've found, and it's amazing if you can instrument all the services in your application to work well with each other. It's the ecosystem, right? It's why a lot of people have Apple Watches and MacBooks.

The next thing we have is profiles, the most granular type of data we can get. They're similar in a way to traces, but instead of showing named operations over time, they look at stack activity. Because we're low-level there, we get function names.
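To make the trace-and-span structure concrete, here is a minimal, hypothetical sketch of spans with start and end timestamps, names, parent-child links, and a shared trace ID. This is purely illustrative, not Sentry's implementation; all names are made up.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Span:
    """One timed, named operation inside a trace."""
    name: str                      # e.g. "GET /inventory" or "db.query"
    trace_id: str                  # shared by every span in the trace
    parent: "Span | None" = None
    children: list = field(default_factory=list)
    start: float = 0.0
    end: float = 0.0

    def child(self, name: str) -> "Span":
        # Children inherit the trace ID, which is what lets a
        # distributed trace be stitched together across services.
        span = Span(name, self.trace_id, parent=self)
        self.children.append(span)
        return span

# One request span with a nested database span.
request = Span("GET /inventory", trace_id=uuid.uuid4().hex)
request.start = time.monotonic()

db = request.child("db.query SELECT * FROM products")
db.start = time.monotonic()
time.sleep(0.01)                   # pretend the query takes ~10 ms
db.end = time.monotonic()

request.end = time.monotonic()

print(db.trace_id == request.trace_id)  # → True (same trace)
```

Propagating that trace ID in a request header is essentially what lets separate services contribute spans to the same end-to-end picture.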
We can also surface CPU and memory data, and this is the best option for code-level insight into your application. Bear in mind that because profiles dig deep into the code, there's some consideration around where and how you use these tools. The team working on profiling at Sentry tries to keep the overhead within one to five percent of CPU; whether that's acceptable is something you'll need to determine for your application. But it's seriously powerful to be able to dig that deep into your issues.

We can take a look here at an example of what a profile looks like, represented in what's called a flame chart. (You can invert it, and then it actually looks like flames, which is pretty cool.) As time moves on, you can see different function calls being made, and things returning all the way back up. What we've highlighted here is get_products, and you can see a connection being made and some iteration. This may give you some ideas about where to look in your code.

So now we know what there is to collect; how do we actually collect it? The easiest way is to look for an existing monitoring solution, and, you know: Sentry, hello. We have lots and lots of different SDKs; since we're at a Python conference, it's great to plug our Python SDKs. We have about 23 of them for different libraries: the basic Python SDK, plus Django, FastAPI, and Flask, among others.
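As a quick aside on profiles: you can get a taste of this function-level view with Python's built-in cProfile. It's a deterministic profiler rather than a sampling one, so its overhead profile differs from what was described above, but the flame-chart intuition carries over. The functions here are invented stand-ins for the get_products example on the slide.

```python
import cProfile
import io
import pstats

def connect():
    # Pretend to fetch rows over a connection.
    return list(range(50_000))

def iterate(rows):
    # Pretend to post-process the rows.
    return sum(rows)

def get_products():
    rows = connect()
    return iterate(rows)

profiler = cProfile.Profile()
profiler.enable()
total = get_products()
profiler.disable()

# Render the stats to a string, filtered to our own functions,
# sorted the way a flame chart is read: by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(
    "get_products|connect|iterate"
)
report = out.getvalue()
print("get_products" in report)  # → True
```

Each line of the report shows call counts plus total and cumulative time per function, which is the same decomposition the flame chart draws graphically.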
We try to cover a lot of the big players in the ecosystem. This makes it a lot easier to send data and look at it all in one place, and the API is very easy to use. Rather than instrumenting everything manually, a big benefit of a predefined monitoring solution is that our SDK developers do a great job of automatically instrumenting what's useful, which gets you leagues beyond starting to manually put things into your application. For example, if you're using Django, then any time you go through the Django ORM, we record those database operations for you.

Here's a little code snippet showing what it's like to add some custom instrumentation around your own code. We have a little eat-pizza operation here, and maybe you want to find out how long it takes to eat your pizza. You can start a transaction; a transaction is just a trace that we ingest and pull metrics off of. This is all you need in Python to start recording something: it creates a context, and once you exit the context, the data gets sent to our servers.

At this point you might be asking, okay, now what? That was me a few months ago as well. At this point, we know how to start getting data.
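The eat-pizza snippet just described boils down to a context manager that times a block and ships the result on exit. Here is a hypothetical pure-Python stand-in for that behavior (with the real SDK you would use `with sentry_sdk.start_transaction(op="task", name="Eat Pizza"):` instead; the `sent` list below is just a mock for the server):

```python
import time
from contextlib import contextmanager

sent = []  # stand-in for "delivered to the monitoring backend"

@contextmanager
def start_transaction(op, name):
    """Time the enclosed block; 'send' the result when the context exits."""
    started = time.monotonic()
    try:
        yield
    finally:
        sent.append({
            "op": op,
            "name": name,
            "duration_ms": (time.monotonic() - started) * 1000.0,
        })

def eat_pizza():
    time.sleep(0.02)  # om nom nom

with start_transaction(op="task", name="eat_pizza"):
    eat_pizza()

print(sent[0]["name"], sent[0]["duration_ms"] > 19)  # → eat_pizza True
```

The point of the context-manager shape is that the timing and the send happen even if the wrapped code raises, so failed operations still show up in your data.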
We know how to start collecting it; what we need to know now is what's important to us. So, zeroing in on how to use these tools: one really important thing is to align yourself with what matters most to you, and usually that's whatever matters most to the company's goals at the time. You can think in terms of service level objectives, or SLOs. These are numbers, or ranges of numbers, describing what you need to hit to be successful for your users. An example: if you have an endpoint and you say "I need it to respond in under 500 milliseconds, because otherwise I don't feel confident in the customer experience," then that's your metric.

SLOs give you your thresholds, and it's really important to consider what your thresholds are, because you can only claim a performance improvement once you measure it against a baseline. These can be anything you want. There are industry standards for certain things, and if you can find an industry standard for the metrics that matter to you, that's the best way to handle this; but sometimes that's not where you're at. An example from the front end: we have the Core Web Vitals, which Google sets out, backed by a lot of research, so there's a standard for some things.
Take Largest Contentful Paint, which measures when the largest item on your web page finishes loading; if it doesn't load within 2.5 seconds, the user experience is considered less great. A tip we have here: you can also set your threshold slightly above your current behavior, and in doing so you'll at least be able to keep an eye on whether things are getting worse. So even without an industry standard you can start there, and then shrink the threshold over time.

Moving into a few tips we have for collecting data. The first one: don't miss the forest for the trees. This means we need to instrument the application at the highest level our users experience, to get full insight into that experience. It doesn't do our users much good if we only instrument at a lower level, because we won't see the full picture of the latency between our services. For example, if you only add instrumentation in your back end, you might be missing out on a lot of wasted time in the front end.

Another one: commits are free. This definitely assumes you have a fairly regular deployment cycle, but if you're investigating performance issues and data, you should feel very encouraged to start tagging and collecting more information as you understand more and more of what's going on. We don't know everything from the beginning, and more data gives us more ways to slice and dice and figure out how users are moving through the code to cause these issues, because again, we're thinking about this stuff in systems. Some of the questions we can ask here: which users are hitting this code path? What plan or tier are they on? And what parameters were involved in certain requests?
Those usually dictate different conditions and different code paths. The more you ask these questions, the more insight you get into your code. Another thing: there are very common bottlenecks and pitfalls. If you're developing a feature that handles I/O, queues up tasks, or does a lot of serialization and deserialization, those are places where you really want to keep a lookout, so make sure those features are well instrumented.

Just to quickly go over why these things are slow: file I/O crosses a boundary that requires a lot of code-level and operating-system-level steps to write to disk. If you queue things up, then depending on how you enqueue them and how many workers you have, you can end up wasting time waiting for things to finish. And we've seen that serialization and deserialization hurt your performance by putting a higher load on taking raw data and packing it into classes, so you may want to consider simpler structures like namedtuples or dataclasses.

Okay, so again: now what? We have some data, and we know where we want to look at it. How do we get some use out of it? One way is to use it for detection. To detect issues, there's a range of techniques, spanning from reactive to proactive. Starting with the first one, user complaints: these are the most reactive approach, but also the easiest.
You don't have to do anything; they'll tell you. And this is the best signal you can get, because a user is telling you directly that something is not working for them. That's golden. The downside is that it's not very consistent, and sometimes users aren't actually very good at explaining what's going wrong, which is why we need to rely on the data here. This is why I like data; data is very, very important. For example, if your users happen to be using your app on a slow network connection, they may attribute the slowness to all sorts of causes in your code.

Moving into the data side of things: now that we're collecting a lot of stuff, we can start setting up alerts, and this is where those thresholds come into play, because you can zero in on the specific metrics you're watching and correlate regressions with causal events. Again, a tip: if you're monitoring your application for the first time, start by just setting your thresholds a little higher than current behavior; you'll at least know whether things are getting slower or not. Here's how that alert shows up in Sentry, and how we set it up: we watch our metric over time.
This is the duration of a checkout request, and you can see there's a spike. We can set different levels, from green, which is good, to yellow for warning, to red for critical, and each time we cross one of these boundaries we can send notifications, because we need to know. Maybe we don't really care about a warning, but if it gets so bad that the duration is skyrocketing, we need to drop everything and look at it. One more use case for this is throughput: if for some reason your system stops sending any events, maybe your system is down, and that's something you really want to keep track of here as well.

The most proactive way of looking at this data is through traces and profiling. The workflow here is to go looking for different ways to cut your data into segments and find out which users, and, if you're adding data to your events (we call those tags), which tags are actually contributing to the slowness. One thing to note here is that not all seconds are equal. We might have two kinds of operations, and what we mean by this is that we're looking to make the most impact. We're lazy people, right?
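The green/yellow/red bands described a moment ago amount to a simple classification of each measurement. Here is a tiny sketch with invented thresholds:

```python
# Invented thresholds for a checkout-duration alert.
WARNING_MS = 800
CRITICAL_MS = 2000

def alert_level(duration_ms):
    """Map a measured duration onto the green/yellow/red bands."""
    if duration_ms >= CRITICAL_MS:
        return "critical"   # red: notify and drop everything
    if duration_ms >= WARNING_MS:
        return "warning"    # yellow: keep an eye on it
    return "ok"             # green: within bounds

print(alert_level(450), alert_level(1200), alert_level(5000))
# → ok warning critical
```

In practice the alerting system evaluates this against an aggregate (say, p95 over a window) rather than individual requests, so one slow outlier doesn't page anyone.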
So the impact we want to make comes from looking at where time is spent, but also at how often that code is being hit. If you have, for example, a full data export that only runs maybe once a month, it's not that important to optimize, even if it takes 30 minutes or an hour. But if your authentication is taking two or three seconds, you're wasting a lot of time your users could be spending getting where they're going.

Sentry is also building out a lot of automated performance issues. This goes even beyond proactive, in the sense that your tooling is set up to highlight things for you. One of the detectors people really like is N+1 issues: we're able to detect from the incoming performance data that your application is making a linear series of requests to the database, where you get a bunch of items and then make smaller requests for each one down the line. And we're always trying to figure out more: there are detectors for really slow DB queries, consecutive DB queries, and large HTTP payloads, for when you're communicating between servers and moving a lot of data just to serialize and deserialize it again.

Once we've identified something as potentially actionable, we need to know how it's actionable. This is kind of the hardest part; it requires a lot of exploratory work. You're going to want to look into your data and compare good example cases against bad example cases. That comparison is very important in understanding how to slice and dice your data, because once you can compare, you can zero in on what changed, and that lets you make the fixes as well. In this case, we're looking at a high-level metric.
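Before the example on the slide: the N+1 shape mentioned above is easier to see in code. This is a hypothetical sketch with in-memory dictionaries standing in for the database, counting round-trips the way a tracing SDK would:

```python
# Fake tables; in real life these live in your database.
AUTHORS = {1: "Ada", 2: "Grace", 3: "Barbara"}
BOOKS = [{"title": "A", "author_id": 1},
         {"title": "B", "author_id": 2},
         {"title": "C", "author_id": 3}]

queries = []  # record round-trips, the way a tracing SDK would

def query_authors(ids):
    queries.append(f"SELECT ... WHERE id IN {sorted(ids)}")
    return {i: AUTHORS[i] for i in ids}

def list_books_n_plus_one():
    # N+1 pattern: one query for the books, then one per book.
    queries.append("SELECT * FROM books")
    return [(b["title"], query_authors({b["author_id"]})[b["author_id"]])
            for b in BOOKS]

def list_books_batched():
    # Fix: one query for the books, ONE batched query for all authors.
    queries.append("SELECT * FROM books")
    authors = query_authors({b["author_id"] for b in BOOKS})
    return [(b["title"], authors[b["author_id"]]) for b in BOOKS]

queries.clear()
list_books_n_plus_one()
n_plus_one_count = len(queries)   # 1 + N round-trips

queries.clear()
list_books_batched()
batched_count = len(queries)      # always 2 round-trips

print(n_plus_one_count, batched_count)  # → 4 2
```

The per-item round-trips grow linearly with the result set, which is exactly the span pattern the automated detector looks for in the trace.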
This is a chart of an endpoint that just gets a bunch of products. We may ask ourselves: okay, why is this thing taking six seconds? It's stable, so maybe it doesn't show up as anything interesting to you; but knowing contextually that we're just listing a bunch of products here, it might be worth asking why it takes six seconds. So you can go into some of the events, and now we're going from metrics down to traces: we're looking at what's happening during this request for a user. You may look at a few, and you may see that again we have this string of database calls that's taking time because they're all being fetched serially. We're forming a hypothesis here: okay, maybe that doesn't need to happen; maybe we can collapse those. We also see some iteration happening at the end that's taking four seconds. We can start instrumenting more, understand what's happening in there, and move toward a fix.

After that, you use what you've found to make the fix. There's not too much we can say about this step today, because it's very dependent on the context of your application, but there are common solutions: you can add data caching to your endpoints to make them less expensive; N+1 queries are a very easy fix; if you want to improve search, you can add an index; and there's always improving algorithmic runtime, which is a whole book in and of itself.

Of course, once you've deployed your fix, you're going to monitor the solution. You won't be doing much differently; you'll be looking at the data just to confirm, and if it ends up holding, it feels so good to see that line drop down to a lower level and stay stable.
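One of the common fixes above, caching an expensive lookup, can be sketched with the standard library. The product lookup and its simulated latency are invented for the example.

```python
import functools
import time

calls = 0  # how many times we actually hit the "database"

@functools.lru_cache(maxsize=256)
def get_product(product_id):
    """Pretend this is an expensive query; repeat calls are served free."""
    global calls
    calls += 1
    time.sleep(0.01)  # simulated query latency
    return {"id": product_id, "name": f"product-{product_id}"}

start = time.monotonic()
for _ in range(100):
    get_product(42)       # 99 of these come straight from the cache
elapsed = time.monotonic() - start

print(calls, elapsed < 0.5)  # → 1 True
```

Note that lru_cache never expires entries on its own, so for data that changes you would reach for a cache with a TTL (or an external cache like Redis) instead; the point is only that the duration chart drops once repeat work stops hitting the database.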
Along the way you've probably added a lot of data collection that you don't really need. Sometimes you have stuff you do need to keep, because it might be important for you later on, but clean up here after yourself, because most of the time this costs money, adds complexity to the code, and adds a little overhead as well. In an ideal world we'd keep everything, but we can't.

So, to recap, there are certain Sentry features we've seen that help make all of this a bit easier. Setting up the SDK to send transactions. Creating alerts, which is a very powerful technique for knowing when things are regressing. Using the different pages in Sentry, you can look at durations and TPM (transactions per minute, which is what we call events per minute) and correlate where your impact will be greatest. Trends highlights transactions that are getting worse, so you can go straight into the ones regressing the most. We also have suspect tags and spans: if you're adding data to your events, we show you the most expensive operations during a particular endpoint. And one of the most interesting things is that we're automating a lot of this for you: again, we've got N+1 queries and those DB-query detectors, and we're also looking at adding some statistical analysis. That's coming in the future, and it's going to be very interesting.

And that's it; that's my time. We've got a booth over by the forum hall entrance. I don't know if we have too much time for questions, but please stop by if you're interested in what we do; we'd love to talk.

Thank you very much for the great talk.
We do have time for one or two questions. There's a microphone over there; if somebody wants to ask a question, please hurry.

Okay, your question, please.

Yeah, hello. First of all, let me say that Sentry is a real lifesaver; I'm not getting paid to say it, but at any moment in my company there are probably two or three people looking at our Sentry and trying to decipher what's wrong. What's really anticlimactic, though, is going into tracing and seeing this enormous "missing instrumentation" block. Any advice on how to make it not show up so much? We're really stunned about why it happens, because we don't do anything crazy in our code that should result in such a thing.

That's a good question. There is a feature in Sentry where, if you are sending profiles, which I know is a different thing, we're also able to capture a profile and show you, within that missing instrumentation, the code that was actually running during that time, even though we don't have transactions or traces for it. That might help inform where to add more custom instrumentation. And we can talk later about why there might be missing instrumentation in your case.

Thank you.

Okay, thank you very much for the question, and as there are no more questions at the moment, I'd like to thank you again for the talk. Let's have another round of applause. Thank you, guys.