Welcome, everyone, to "Observability Meets a Flaky Test" by João Proença. Without further delay, over to you, João.

Thank you, everyone. Let's begin. Welcome to SeleniumConf. This is my talk, "Observability Meets a Flaky Test". My name is João Proença. I'm a quality engineer at Aether Health, a company that works in the health space. I come from Lisbon, Portugal, where I live and have always lived.

Before I get into the story I want to tell you today, I need to say thank you to an ex-colleague of mine, Lisa Crispin. I was fortunate enough to work with Lisa for a year or so, a while back. When we worked together, we paired a lot on observability topics at the company we were at, trying to figure out between the two of us what observability meant for that team, and how we could improve observability inside the company in its various facets. The idea for this talk came out of those conversations and pairing sessions with Lisa, so I need to thank her, because she contributed to the content you're about to see. For folks who don't know her, Lisa Crispin is a very well-known person in the testing community. She has written books about agile testing and is a leading practitioner whom a lot of people, myself included, follow and admire.

So, without further ado, here's what we're going to cover today. I'll start by clarifying what observability is and some of its key fundamentals. With those fundamentals in place, I'll tell a story that hopefully illustrates the need for observability in a testing context. At the end, I'll share some ideas about what observability means in that context.

As a starter, let's talk about the difference between monitoring and observability. A lot of times, when I speak to people about observability, someone will tell me, "I sometimes get the idea that we just gave monitoring a new, fancy name, but we're really just doing the things we used to do." I don't think that's true. The difference is that with monitoring you're usually dealing with known unknowns, whereas with observability you're using data, and visualizations of that data, to deal with unknown unknowns.

What exactly does that mean? With known unknowns, on the monitoring side, you already know something may go wrong. Because of that, there are questions you know you'll want to answer in the future, and you know you'll need data to answer them, so you build dashboards and set up alerts around that. You know a specific service may go down and you want information about that service running in production right now; you want to know, every second, whether the service is up or not, so you have a dashboard that tells you that. But these are the things you can predict might go wrong.

With observability, when you walk into the unknown-unknowns space, you're acknowledging that some things may go wrong that you don't know about yet. You may want to ask new questions of the data you already have, and you have no idea which data will turn out to be relevant when those new questions come up and you need to look into them.
You also can't build a dashboard today that will answer those future questions, because you don't know yet exactly which questions you'll want to answer. The whole idea of observability is that you have a data set rich enough to let you ask new questions in the future, questions you can't predict, about what's happening in production or with your systems, without having to ship new code. That's the key idea: you know you have an observable system when you can answer new questions from your data without shipping new code to produce new data or new ways of accessing it.

So, with these fundamentals in mind, Lisa and I were having those pairing sessions about what observability meant for us and for our system in production, and we had a lot of gaps in our context at the time. But then we started looking a little more inward: what if we thought about places other than production? A hypothesis appeared: we lacked observability in our CI/CD pipelines, and it was hard to tackle unknown unknowns there. And not just in the pipelines themselves. Whenever one of the automated tests running in those pipelines failed, we felt that understanding why it failed was really hard with only the data we had at hand. But this was just a hypothesis of ours; we couldn't quite prove it, and we didn't have a good example to showcase to everyone. Until one day a story happened, and that's the story I want to tell.

Before I get into the details, let me give you a little context, which will be really important for understanding the constraints we were operating under (and the lack of constraints, too). At the time, Lisa and I were working at a product software company that offered just one product: a big monolith. We had very complex CI/CD pipelines validating that monolith, and you also need to understand the scale at which we were operating. Let's go through each of these before I continue with the story.

What do I mean by "monolith"? Our product was shipped as one big binary that contained the whole thing. It was a 20-year-old product with millions of lines of code and hundreds of thousands of automated tests running over that code. Whenever we produced a build, it was a single binary that could be deployed in different contexts: on premise or to the cloud. We still had a lot to solve around breaking the monolith down into separate, decoupled services, but at the time it was a monolith, and that had a big, big impact on the way we operated. And it wasn't just a monolith with a large code base: we also had a lot of engineers, a little over 100, working on that code base at the same time, committing code and running automated tests against it through that huge build process.
Now, because we were working with a monolith and had so many engineers trying to validate the product every time they committed new code, we had a very complex CI/CD pipeline comprising several stages. At the beginning we had what we called the dev stage, which mostly ran unit tests and other fast checks to give quick feedback. Once that was done, every single code change, every single commit from an engineer, would fire up further stages of the pipeline, some running in parallel, with different types of tests: performance tests, contract tests, security tests, end-to-end tests, UI tests, you name it. The complexity was really high, both in the number of stages and in the different types of automated tests that made up those stages.

Given this context and this scale, one of the things that was really hard was getting to green. By "getting to green" I mean having a build in which all of the tests passed, and if I remember correctly, around 300,000 tests ran for each build. At that scale, if even one test failed you didn't have a stable green build, so you couldn't use it. Getting to green, as we called it, was really hard, and we had a lot of processes in place around it. We took the pipelines and the automated tests really seriously, because we wanted everyone coming together to make sure all the tests passed so we could have a green, usable build.

(Sorry, I was having a little trouble with the slide, but now I can see it.)

One thing you need to take into account is that, with so many people working on the code base and such large builds, it was really hard to have one pipeline run per engineer. Imagine this scenario: an engineer comes in in the morning, before anyone else is really working on the code base, and commits a change. That fires up a build and a test run with all of those stages. If one of those tests fails, we know it was her code change that made it fail, so she can immediately start looking at the failure. However, most of the time we weren't in that ideal scenario. Because so many people were committing to the same code base, you'd often have, say, three engineers committing almost simultaneously, maybe to the same part of the monolith, maybe to completely separate parts. That would trigger one pipeline run aggregating the three commits. If one, two, or three tests that weren't failing before suddenly failed, the only thing we could say for sure was that one of those three code changes made them fail. So all three engineers would need to look at the failing test, figure out whose change caused it, and then fix the failure. And that creates a lot of social dynamics that need to be handled.
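Just to make that suspect-batch idea concrete, here is a toy sketch of the kind of bookkeeping involved. It is not our actual system; the build and commit structure and all the names are hypothetical.

```python
def suspect_authors(last_green_build, failing_build):
    """When near-simultaneous commits are batched into one pipeline run,
    a brand-new failure can only be attributed to the whole batch: every
    commit that landed after the last green build is a suspect, and so is
    its author. The build/commit structure here is made up for illustration."""
    known = {c["sha"] for c in last_green_build["commits"]}
    new_commits = [c for c in failing_build["commits"] if c["sha"] not in known]
    return new_commits, sorted({c["author"] for c in new_commits})


# Toy data: three engineers committed almost simultaneously after the last green run.
green = {"id": 1041, "commits": [{"sha": "9a1b2c", "author": "dora"}]}
failing = {"id": 1042, "commits": [
    {"sha": "9a1b2c", "author": "dora"},
    {"sha": "abc123", "author": "alice"},
    {"sha": "def456", "author": "bob"},
    {"sha": "0f9e8d", "author": "carol"},
]}

commits, authors = suspect_authors(green, failing)
print(f"{len(commits)} suspect commits; engineers to notify: {', '.join(authors)}")
# -> 3 suspect commits; engineers to notify: alice, bob, carol
```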
And so we created what we called the rules of engagement, stating that whenever tests fail, dealing with them becomes top priority for the engineers involved. We didn't want any bias creeping in, like the bystander effect, where three people are looking at a problem and each is waiting for the next person to take it. The rules said that one of the three people is elected as the person who has to figure out which of the code changes made the test fail, and then hand the problem to whichever engineer in the group of three, four, or five they determine actually caused it.

We even had a Slack integration. Our CI/CD pipeline had a system we built that detected, "hey, we have a new test failing, and we know the code changes that could have made it fail are this aggregate of commits," and it would automatically open a channel in Slack, message the group of three engineers, and say: "We have a test failing. Person A is elected to triage the failure. Please figure out as soon as possible who is responsible, and please fix the problem as soon as possible." This made it much clearer to the whole team which tests were failing and who needed to go after them. Once again, that should be top priority for everyone, because we were all working together to get to green.

That was how things worked, and everything was fine. But do you know what becomes a really, really big problem with such a setup? Flaky tests. Why? Think back to the previous scenario. You have a group of three people who committed code, and suddenly a test fails, not because of anything any of them changed, but simply because the test is flaky: it passes, passes, passes, and then it fails. In that run, the people flagged for the failure are the people who triggered the run. Now you have a big problem, because you're telling a group of three people, "this test failed, you are responsible," and they're saying, "but I have nothing to do with that failure; it's in a completely different part of the code base; the test is just flaky."

For that situation we also had rules of engagement. If the group looking at a test found that the failure most likely wasn't related to any of their code changes, and they could tell the test was probably flaky (people did this by looking at our test reporting system, which showed the run history for a specific test, so you could see it passing, then failing, then passing, then failing, with no stability to it), then they had the right to quarantine the test immediately and open a ticket for the team that owned it, because different tests were owned by different teams, saying: this is a flaky test.
"We can't have flaky tests in our system, even though you'll always have some, so please deal with the flakiness before you take the test out of quarantine and include it in our pipelines again." That was the standard operating procedure for dealing with flakiness, given the big impact it had on getting to green.

Okay, those were the essentials of the context we were operating in. I hope that was clear to everyone; if you want, just write something in the chat. Okay, cool. Thank you, Palavi, it's clear. Let's proceed. So here's what happened one day.

One day I arrived at my desk, looked at our dashboard of failing tests, and noticed a test, a test-user-provider test that sets up an admin user, with a really strange message: "Failed to obtain test suites: could not find tests in the module." I hadn't made the code change that made this test fail. It really looked like flakiness, and the test was using a testing framework I had created myself, so I was really interested in it. I told everyone, "I'll take this test. It's flaky, and I need to figure out exactly why it's failing." I tried reproducing the failure on my own machine. I had never seen this sort of thing before, and these tests had tended to be really stable in the past, so I didn't know what was going on. So I went a bit against the rules and kept the test in our pipelines, to figure out why it was failing only on our pipeline machines.

A few months later, the test failed again with the same message, and all of a sudden I had a test architect, here on the right, and a tech lead getting really angry that this flaky test was still running around. "Another failing flaky test. We can't have flaky tests around." The tech lead said, "It's blocking the green; we need to tackle flaky tests, so I'm going to quarantine this for now," which was just part of the standard operating procedure. And then I appear (this speech bubble is supposed to be here, on the right) saying, "I'm sorry about the flakiness, folks. I'm trying to fix it, but I've been trying to catch the test failing because I can't reproduce it, so that's why I'm keeping it running a few more times in our CI/CD pipelines even though it's failing." The architect was really angry at this: "Do we really need to have this enabled to analyze it? We can't reach green like that." And I said, "Well, I can't know what's going on without access to the only machine where it fails." The tech lead, being empathetic, said, "I can understand how that's tricky; just make it quick so that we can quarantine the test." And the architect was already thinking, "There's got to be a way for us to improve this somehow, but I don't know how just yet." So they allowed me to keep the test running for a few more days. But a few days later, the test was still failing, and the architect was now really, really angry: "The test is still failing; this is not acceptable anymore."
"It's the end of the line. Please take the test away." And I was really frustrated, because I had been waiting to see the test fail before my eyes. For some reason, the test only wanted to fail during the night, around 11pm, when everyone was already away, as if it only had problems on nightly builds or something like that. And then, suddenly: "Hey, wait a minute, the test just failed." This was during the morning. "I can finally check the logs. This is exactly what I wanted." But the architect said, "We need to shorten feedback loops. Why didn't you access the logs before?"

And this was really important. The problem was that the machines where the tests failed were fired up in the morning and ran tests throughout the day, but at the end of the day we tore them down, partly because of costs and partly because we didn't want any garbage left on the machines in case some tests were badly behaved. With the machines deleted, we also lost the logs that would let us look at the specifics of the test failures. This is something very important that I'll come back to later. Then the architect said something along the lines of, "Maybe you should output the relevant logs in the test report itself." That's a very, very interesting observation that we'll also come back to later in the story.

Okay. Aggravation aside, I put the test in quarantine, no one was angry at me anymore, I had the logs from the failed test, and I could finally look at them and figure out what was going on. Let me tell you exactly what I had to do. The failure message wasn't telling me much, just hinting that something was wrong with the framework I had created. Okay, do we have any domain-specific logs about the test? I went into a web console where I could look at a number of logs. The logs weren't there, because the test module had been deleted; I knew that even though the console showed me a lot of logs, it wouldn't show logs for test modules that had been deleted. But the logs would still be in the database. So I made a remote desktop connection into the machine that ran the tests, hopped into another application to get some configuration, and used DBeaver to access the MySQL database running there and check the logs. And all of a sudden, whoa, I had lots of logs, way more than I could see through the web console. How was I going to filter them down to the logs relevant to the test that was running? Okay: what time did the test fail? So I went to GoCD, the CI/CD pipeline technology we were using, and got a timestamp telling me when the test failed. Now I could go back to DBeaver and filter the logs down to that timeframe, so I was looking at much less data. Cool. I found an error message telling me that the test framework had failed, or crashed, while the test was running. But in what situations would that happen? The framework had been stable until now; it's open source, and a lot of people use it. Let's look at the code. So I fired up my IDE and started reading code.
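As an aside, this is roughly the kind of time-window filtering I was doing by hand in DBeaver, sketched in Python with sqlite3 standing in for the real MySQL database. The table, columns, and timestamps are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

# A hypothetical schema standing in for the real log tables; in reality this
# was a MySQL database that I queried by hand through DBeaver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, module TEXT, level TEXT, message TEXT)")
conn.execute(
    "INSERT INTO logs VALUES ('2021-03-02T23:01:12', 'test-framework', 'ERROR', "
    "'Failed to obtain test suites: could not find tests in the module')"
)

# The one piece of data GoCD gave me: when the test failed.
failure_time = datetime.fromisoformat("2021-03-02T23:01:30")
window_start = (failure_time - timedelta(minutes=5)).isoformat()
window_end = (failure_time + timedelta(minutes=1)).isoformat()

# Filter the mountain of logs down to the few minutes around the failure.
rows = conn.execute(
    "SELECT ts, module, level, message FROM logs "
    "WHERE ts BETWEEN ? AND ? ORDER BY ts",
    (window_start, window_end),
).fetchall()
for row in rows:
    print(row)
```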
And I found that the only situation where this could happen is when a setup timer has not yet run. That timer is an asynchronous process that normally kicks off as soon as you publish the test framework and start using it. It shouldn't happen, but if things move really fast you can hit a concurrency problem, and then tests fail because of it. The only way to check was to ask: did this timer, this asynchronous process, fail, or did it run after the test? So I went back to DBeaver, looked at a few more things, and there it was: the timer ran after the test. Now I knew why this was happening. We had an asynchronous procedure that needed to run before any test, but because things were happening in a very particular sequence, tests would sometimes try to run before that procedure had finished, and they would fail with this message. And that only happened in a few of our runs.

If what I've just explained, in this rushed and stressed way, was hard to keep up with, it doesn't really matter. Given this whole debugging saga I had to go through to figure out why the test was failing, the only thing I want you to tell me is this: looking at everything I had to do, what is one thing I had to do over and over in order to figure out what was going on? Does anyone have an idea? Please write it in the chat; I'll wait for a few answers. "Had to look at the database a lot of times." More ideas? I'll give you a few more moments. Okay: "gather logs," yes. "Looking in different places." "Collect the service logs in one place." You're getting there, almost. "Trying different things," sure. All of those are important, but the thing I'm trying to get at, folks, is that one of the things I had to do a lot was hop, hop, hop, hop. I had to hop between different technologies and different tools to look at different pieces of data in different places. I had to hop one, two, three, four, five, six, seven, eight times to reach the conclusion I needed.

And to be honest, I was on a mission. This was my test framework, and I needed to figure out what the hell was going on, because I wasn't going to let my framework be a source of flakiness in our pipelines. But most of the engineers in our organization wouldn't go through all of this trouble. After the first hop, from the test failure into the web console for the logs, maybe they'd try the database to check a few more things, but they'd stop there. They wouldn't go through all of these hops to figure out what was going on. It was just easier to hit the rerun button and see if the test passed on the next run. You know why? Because hopping means friction. Every time you need to hop from one tool to another, that's friction you're adding to your own process of reaching your goal, and that causes a lot of problems for the developer experience, the troubleshooting experience, and stops you from quickly figuring out what's going on.

Now, imagine the story could have gone completely differently.
Imagine we had just had a few things. All logs in one place: I see people in the chat already hinting at that, and I don't mean just the domain-specific logs, but also the CI/CD logs for the runs and the failure messages themselves. Imagine we had data longevity, which was a big problem in this example: we weren't keeping the data around for longer than a day, because the machines were being deleted. Imagine we had tracing information that made the data richer and let us correlate what was going on at the moment the test was running, because that was exactly what mattered here: we had to look at something happening on the side to figure out why the test was failing. And imagine we had a way to explore and navigate that data, better than jumping between different dashboards and tools just to run some SQL queries.

What would probably have happened is this: the tech lead says, "Hey, this test looks really flaky this past week." I say, "Okay, let me look at the logs," navigate them, and a few moments later: "Oh, I know what's going on. I'll fix it. Basically, I just need to make sure that no test runs before this setup procedure has run." The architect wouldn't even be involved, no one would be aggravated, and dealing with the flakiness itself would be much easier.

And if we go back to the conversation, there are a few things I want you to think about. Remember when the architect said, "Maybe you should output the relevant logs in the test report itself"? That is monitoring; it is not observability. When you say you're going to output the relevant logs in the test report, you're saying up front, while creating or changing the test, that you already know all the data that's going to be relevant for it. But we were dealing with unknown unknowns here. No one had thought about the framework having an asynchronous process that might not run at the right time; we hadn't anticipated that it could happen. I had to dig into logs I had never predicted I would need in order to figure out why the test was failing. So the whole idea of "it doesn't matter whether logs are kept around or not; do your homework and include the relevant logs in the test itself" is wrong from an observability perspective, because you can't predict all the data that will be useful to you in the future.

There was another idea behind that way of thinking: "you shouldn't need to debug tests in CI environments." To be honest, that's only true if you can look at your test failures in another environment, which a lot of the time isn't possible, because when you're dealing with flakiness, or even a legitimate failure, sometimes it only happens in specific contexts that only exist in CI. Either you're able to reproduce the problem on another machine, or you have all the information you need because you're practicing true observability and have a truly observable system; then, yes, you can spare people from debugging things directly in CI. Proper automated test observability eliminates the need to reproduce failures, but we can't ignore everything we need to put in place to reach that level of observability.
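To make the tracing idea a bit more concrete, here is a minimal sketch using OpenTelemetry, which is not what we had at the time; the attribute names and values are just illustrative. The idea is to wrap each test run in a span, attach whatever CI context you have, and send it to the same backend that holds the rest of your telemetry, so the test's activity can be correlated with whatever the product code was doing at that moment.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export to the console here; in a real setup this would point at the
# centralized backend that also receives the product's telemetry.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("ci-test-suite")


def run_test():
    """Placeholder for the actual test body (hypothetical)."""
    pass


# Wrap the test in a span carrying CI context (attribute names are made up).
with tracer.start_as_current_span("test_user_providers_setup_admin_user") as span:
    span.set_attribute("ci.pipeline", "monolith-main")
    span.set_attribute("ci.agent", "agent-07")
    try:
        run_test()
        span.set_attribute("test.outcome", "passed")
    except Exception as exc:
        span.record_exception(exc)
        span.set_attribute("test.outcome", "failed")
        raise
```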
So let's do a small retro on what we learned from that story. With a flaky test, you don't expect it to fail. It's not like we build our automated tests and say, "hey, this test is flaky, but I'm going to add it to my pipelines anyway." In most situations you build a test, you're confident it's pretty sturdy, and then you learn from CI that it's flaky; you didn't expect it initially, and at the beginning you usually have no idea where the flakiness is coming from. So the ability to ask questions you had not thought of before is very, very important. In our specific case, we couldn't ask those questions because we didn't even have the data from the day before. A lot of times we would find ourselves in situations where I needed data and didn't have even the most basic data about my test failures. Data longevity was a big problem, and it was one of the first things we needed to fix. Also, the fact that the data is scattered makes it really hard to go after these problems, because a lot of people don't know what data is available or how to access it, and even if they do know, the process of going after it is so cumbersome that they start wondering whether it's worth it. You saw me going through all of those steps because I had a lot of knowledge of the whole system. Not everyone would have been able to do that, because not everyone knew all of those details about where the data was scattered. And hopping means friction, as I've said before.

So what did we end up doing, and what should you do? First, send all of the data to a centralized location. We started doing that: sending all of the data to a data lake so we could look at it in one place. Not just the test results, but also the GoCD logs, because those give us timestamps and let us understand which tests ran before and which ran after (does that play a part in this?), and the domain-specific logs from the test machines themselves. Get all of those different levels of logs into one place so you can correlate them. Second, introduce tools to properly navigate that data in the centralized location. Exploration is really important here, not just a predefined dashboard that you believe will tell you everything beforehand. And third, and this was the harder part, which we hadn't gotten to by the time I left that company: instrument the tests and the product code to get richer information to explore. That is what would really serve the use case of understanding what's happening on the machine while the test is running.

Circling back to the fundamentals of observability: if you think about the story, I think it relates nicely to some of the things I've learned about observability from the industry. I'm not the biggest expert on observability in the world; most of what I've learned came from two people. One of them is Charity Majors, the CEO of Honeycomb, which is one of the leading technologies for practicing observability in production. (Sorry, my voice is failing a bit.) The other is Abby Bangser, and she's a friend of mine.
Abby came from a testing and test automation context and gradually moved into observability, and she has been talking to the world, and to the testing community, about observability in different ways. There are two key pieces of content I want you to take home with you. One is "Observability: a three-year retrospective", a blog post by Charity Majors. The other is a talk by Abby, "A journey to truly understanding observability". In both, they talk about the fundamental characteristics of an observable system. There are a lot of these characteristics, but you can group them into three big buckets, which are the ones I want to focus on.

The first: the only way to ask new questions is to keep the original, raw data available and queryable. That ties in nicely with the story I've just told you: you can only ask new questions if you keep the original logs from the tests and can still query them. In their view, this means having raw events rather than pre-aggregated data, having the raw data available right from the beginning, and keeping it around for a while so you can go back to it and ask new questions.

The second: empowering creative and shared exploration based on business context. In our case the context is testing, and creative, shared exploration ties into having tools that let you explore the data in different ways, rather than static dashboards that are supposed to tell you everything right off the bat, which is basically impossible when you're trying to practice observability. There are also smaller things around this, like being oriented around the life cycle of the request, which makes sense in our case as well, and batching up the context, so you have access to the context of when things were happening.

Finally, it's also really important to make data easy to add details to and easy to query. I didn't go much into this, but it was actually a pain point for us too. Essentially it means that if, from a developer experience perspective, you want to add more data to the system because you believe it will be useful for future questions, adding that data should be a frictionless experience as well. You should have structured data, you should make that data as wide as possible, and you should be able to add as many details as you want to it. You shouldn't have rigid schemas that dictate that only these types of data can go in these places; the data should let you add new details and new structure when you need to, so it stays really, really rich. This was something we identified we needed to work on, not just from a test automation perspective but also from a production perspective, when it came to our observability capabilities.

And if you look at these three big aspects, the yellow one, the blue one, and the purple one on the slide, these are fundamentals that you can apply to your own context. When you think about your test automation, and about what you have to go through whenever an automated test fails, how many of these three things are possible in your test automation system? (For the third one, the sketch below shows what a wide, easy-to-extend test event can look like in practice.)
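This is the sketch just mentioned: one wide, structured event per test run, with hypothetical field names. The point is that extra context can be added at any time without breaking the events that came before, and everything stays queryable.

```python
import json
import time
import uuid


def emit_test_event(sink, test_name, outcome, **context):
    """Write one wide, structured event per test run to a sink (a local file
    here; in practice a centralized store). Any keyword argument becomes a
    new field, so adding detail later is frictionless."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "test_name": test_name,
        "outcome": outcome,
        **context,
    }
    sink.write(json.dumps(event) + "\n")


# Usage sketch: the richer the event, the more future questions it can answer.
with open("test-events.ndjson", "a") as sink:
    emit_test_event(
        sink,
        test_name="test_user_providers_setup_admin_user",
        outcome="failed",
        build_id="1042",
        suspect_commits=["abc123", "def456", "0f9e8d"],
        ci_agent="agent-07",
        error_message="Failed to obtain test suites: could not find tests in the module",
    )
```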
And if any of those three things are being constrained or limited in your context, what are the next steps you can take to have better data, more data, data that is kept around, and ways to explore that data, so that you and your organization can be much faster and much more effective at figuring out why a test fails, or why flakiness happens, for instance?

So, as a key takeaway: observability is not just for production. And for observability, we need explorability: one data store where all of the data ends up, multiple views over that data store, and ways to achieve that. That's a direct quote from Abby Bangser, whom I just introduced. To analyze test failures faster and learn more about the changes we're making in our code, observability is the future. I truly believe it's a big part of the future of test automation, and that's why it's so important in this context. Sorry about that; this slide shouldn't be here. Thank you so much. I'm João Proença, quality engineer, and I'll be taking questions now if you want.

Thank you so much, João. It was a really nice talk, a nice presentation. I see a few questions in the Q&A section. The first: "We normally have failure exceptions, stack traces, and screenshots and logs when we run automation. What else can we improve on that?"

So I see that Harshal is talking about failure exceptions, stack traces, and screenshots and logs. That data usually tells you what happened from the testing perspective. But I think we should also have visibility into the software we're testing itself. You don't want logs just about the tests; you also want them about the code that is being tested. Having observability over the code under test is valuable as well. And beyond failure exceptions, stack traces, and screenshots, there's a lot of information we can also collect about the tests themselves. What context did they run in? Which tests ran before, and which ran after? Could tests that ran earlier have left the environment in a state that causes new problems? Or, as in the case I showed, what else was happening in the system at the same time a specific test was running? So the failure exception and stack trace are really useful, and the screenshots as well, but you can definitely go way beyond that. Also, historical data about the tests themselves, which maybe Harshal already has, but it's important to mention: looking at a test's history usually tells us a lot about its nature. I hope that answers what Harshal was looking for.

Next question. I think this will be the last one we take, and then we'll move to the hangout tables. This one is from an anonymous attendee: "I faced a scenario where my tests pass on my local machine every time but fail in the pipeline. How can we deal with such scenarios?" Tests passing locally every time but failing in pipelines: that was basically the situation I was in at the beginning of the story, if you think about it.
When tests fail in pipelines and you don't have observability yet, I think you need to look at the failure and see what information you have available about it, and if you're in that situation, that information usually won't be enough. But then you can think: okay, I don't have this information, but what are the next pieces of information I would like to have about this? Maybe you want more data from the tests themselves, or more data from the code being tested. Then you just follow those breadcrumbs and say, "when this test fails again in the CI pipelines, I would like to have this data available so that I know what's going on." This is, of course, for when you can't go directly into those machines or environments and run your tests in the same context to check firsthand what's happening. But that's usually the way I go at it: if I don't have enough data coming from CI to tell me what's going on, I do whatever is necessary, or talk to whoever is necessary, to make sure that the next breadcrumbs, the data I need for the questions I'd like answered, become available in the code, so that the next time the test fails in CI I can look at that data and start solving the mystery.

Okay, great. I see a couple more questions, but we are out of time, so we'll have to end here. Thanks, João, for sharing your experience with us today.