Everyone, thank you for coming to our talk. My name is Yaron Schneider. I'm the CTO and co-founder of a company called Diagrid, and we are commercializing the Dapr open-source project, which I helped create while I was at Microsoft before Diagrid. Together with me here we have Artur, who is an engineering manager at Diagrid, and we're happy to also maintain Dapr. Yes — we're both maintainers of Dapr. I'm a steering committee member of the project; Artur was a steering committee member of the project in the past. We're happy to be here and talk about how we do quality control and testing in what we call a multi-cloud runtime.

So what is a multi-cloud runtime? Dapr is an example of one, but it might be any sort of application that is deployed to a multi-cloud environment and needs to talk to multiple services, cloud providers, and libraries, and interact with the underlying cloud infrastructure. It's essentially anything that runs in a cloud environment and needs to interact with multiple services within that cloud environment or across cloud environments. And this comes with lots of challenges, because you need to start writing a different layer in your application to interact with the infrastructure, to secure it, to make it reliable. There are lots of examples of that. Take Kubernetes: Kubernetes is obviously a multi-cloud runtime because it interacts with the cloud infrastructure — it has a cloud provider layer that talks to things like network, compute, and storage from the cloud provider it runs on. Dapr is a really good example because it has over a hundred and fifty different connections to all of these cloud services. We're going to talk about how we do testing in Dapr today, but I assure you, you can take all of the lessons here to your own applications, whether they're using Dapr or not.

Dapr is a set of APIs for developers that encapsulates distributed-systems challenges. Instead of you writing the same old boilerplate code to interact with a database, a pub/sub broker, a configuration store, or a secrets management store, Dapr gives you these best practices encapsulated in a set of APIs that you can use on top of any cloud or edge infrastructure. It can run as a single binary on your machine or on a VM, or on Kubernetes with a control plane, and you can use it from any type of programming language.

This is essentially what Dapr does: you have your application here on the left, you have the Dapr instance — which can really run anywhere — on the right, and you have a bunch of APIs that Dapr exposes, for example to discover services, to get and save state, to publish and subscribe to topic messages, and to do secrets management. That's four APIs out of the nine that Dapr has, which also include things like leader election and workflows-as-code — kind of like Temporal. And behind all of these there is a wide array of different infrastructure components that you can hook Dapr up to. For example, if you're running in AWS, as a developer you get a single consistent API, and you can basically tell Dapr: "Hey Dapr, I want you to work against DynamoDB." Then once you deploy to a different environment — let's say we're deploying to OpenShift on-premises — you can tell Dapr: "Hey, I want you to work against Cassandra." And if we're deploying to Azure, it can be Azure Cosmos DB, or Google Firestore on GCP.
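To make that swap concrete, here's roughly what it looks like as Dapr component manifests. These are minimal illustrative sketches — the metadata values are placeholders, not complete production configs — but the point is that only this YAML changes between environments, not your application code:

```yaml
# State store component pointing Dapr at DynamoDB (e.g., when running on AWS).
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.aws.dynamodb
  version: v1
  metadata:
  - name: table
    value: my-state-table          # placeholder table name
---
# The same logical component backed by Cassandra (e.g., OpenShift on-premises).
# The app keeps calling the same Dapr state API; only this manifest changes.
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.cassandra
  version: v1
  metadata:
  - name: hosts
    value: cassandra.default.svc.cluster.local   # placeholder host
```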
These are just some examples here on the right, but as I said earlier, Dapr supports over a hundred and fifty of these components — this text on the slide is outdated, we should change that — and new components are getting added by the community as the community grows. Dapr really is a very large project in the CNCF; I think we're the 12th largest project today. Lots of companies are investing in it and contributing to the different cloud connectivity components that Dapr has, and we need to assert control over that: we need to make sure that everything is properly tested and meets our quality bar. Why? Because very large organizations, including NASA, NVIDIA, PwC, and many others, are using Dapr in production today. It's powering the FaaS infrastructure for Alibaba Cloud, and Microsoft Azure is also running it as a managed service. So it's powering some pretty critical infrastructure, and we as open-source maintainers need to make sure that every line of code that gets committed to the project is very, very well tested and that we check for regressions.

And so how do we do that? Well, we start off with unit tests, right? That's the most basic form of testing we can have. One of the things I like the most — and of course I'm being very cynical here, just want to make that clear — is discussions like: "oh, we should have 80% of the code tested," or 75%, and you come up with a magic number — "we should have 95%." Some very brave people say we should have all of our code unit tested, literally every function. That's very difficult to achieve, and there are additional forms of testing that can make up for these unrealistic expectations of unit testing some percentage of your code — we're going to talk about that. Because really, if you look at the entire pipeline — you build your code, you test your code, then you provision the infrastructure, the code gets deployed — then you run into issues, because obviously the one thing you didn't test for broke production, and then you write tests for it retroactively. Which is an unfortunate reality of life — like the Earth not being flat, for example, which leads to time-zone issues, especially with remote work. But I'm digressing here.

Writing tests at different granularities — this one is a doozy, because once you get beyond unit tests, you're suddenly like: "oh, I need to spin up a web server — and how do I do that?" And so we introduce end-to-end tests, which are at the top of the pyramid. If unit tests are the most granular type of testing, then end-to-end tests are literally what they say they are: they test your system end-to-end. Artur, I think you can talk about that.

Yes — thank you, Yaron, for the great introduction. And here's the clicker, you're probably going to need it. We're not mind-synced yet; we're almost there.
So, yeah, we started with end-to-end tests, and in this presentation we're going to build this pyramid together. You're going to see all the layers that we have in our test infrastructure for Dapr, and we expect that you can take some of them for your project too. And you're going to notice that, like Yaron mentioned, your test coverage for unit tests does not really mean a whole lot, because what we're really looking for is coverage of scenarios. What are the scenarios users are actually exercising that will actually be tested? For example: am I testing with Kubernetes? With kind? With the Kubernetes version users are actually using? There are so many combinations that can go into the end-to-end tests.

So let's start with how we began that journey. First, we deploy a Kubernetes cluster. We have the Dapr control plane installed, we have Dapr with a few applications running, we have a database — because we want to exercise the Dapr building blocks, let's say the state store, so you need a database for that — and a broker, because you might be testing pub/sub. And then above all of this we have the test code. The test code runs on GitHub Actions, but it could use whichever CI you want, and it exercises multiple scenarios against this complete deployment of Dapr on top of Kubernetes.

But just one Kubernetes setup is not enough, so we run this on kind and on Azure, and kind comes in really handy here. That's one thing I highly recommend: it allows us to run the end-to-end tests at the pull request. We don't need to actually provision anything on Azure, or any other cloud we could add here, because kind gives you a shorter feedback loop from the pull request. So when we have a pull request in Dapr, we make sure the end-to-end tests pass even before we merge. That was the very basic foundation we started testing Dapr with.

I will just say: GitHub Actions runners do not like kind. They're like, "go away" — every time they see kind it's like, "I'm going to do everything I can to destroy you. I'm going to mess up the network, I'm going to screw up the file system" — everything that can go wrong will go wrong. But in the end it's a really, really good thing. So yes, I do recommend putting kind inside GitHub Actions, because it's much cheaper than running the entire thing in a cloud, which might bankrupt you.
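For reference, here's a minimal sketch of what gating a pull request on kind can look like. This is a hypothetical workflow, not Dapr's actual pipeline; it assumes the community-maintained kind action and a make target that wraps deploying the control plane and running the suite:

```yaml
# .github/workflows/e2e.yaml — run end-to-end tests on a kind cluster per PR.
name: e2e
on: [pull_request]
jobs:
  e2e-kind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: helm/kind-action@v1        # spins up a kind cluster inside the runner
        with:
          cluster_name: e2e
      - run: make e2e-deploy e2e-test    # hypothetical targets: install + run tests
```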
Exactly — so thanks, Yaron. As you can see, going back to this slide: the higher you are in the pyramid, the more integration you get, but also the more flakiness. So we cannot just put everything in the end-to-end test layer.

So, next step: performance tests. Yeah, I'll talk about performance tests because I implemented them in Dapr and I'm really proud of them. So obviously end-to-end tests verify that certain scenarios, or certain aspects of the code, are unbroken. But Dapr also fronts your interactions with the underlying infrastructure — you use it to connect to your database and your pub/sub and to make them more secure — so we really want to make sure that Dapr is as performant as possible, and that new code that gets checked in doesn't cause regressions in terms of performance.

So we have performance tests, and these look quite similar to the end-to-end tests. What we're doing is we have applications talking to the Dapr APIs, saving state, and we run something called a baseline, as opposed to the Dapr test. The baseline test is the application going to something like Redis directly; then we have the same app go to Redis through Dapr, to basically be able to measure the latency and throughput that Dapr is adding — or degrading — for that specific scenario.

We were using Fortio, which is the project that does performance testing for Istio, and we're moving over to k6, a load-testing tool that runs nicely on Kubernetes — I highly recommend it. The reason we moved from Fortio, which Istio uses, to k6 is that Fortio doesn't have really good support for gRPC all the way through: HTTP is really good with Fortio, gRPC is less of an experience. So we're moving everything to k6, and that allows us to test the different Dapr building blocks end-to-end and see how much latency and/or throughput Dapr adds in each one of these scenarios. In terms of placing them in the stack, performance tests go right underneath end-to-end tests.

Yeah — I'll also mention that we contributed a feature to Fortio as part of our work on Dapr: if you go to Fortio, you can dynamically generate a URL with a UUID. That was a feature contributed by us as part of our work with them. Yes, we like to contribute features to the frameworks we use in Dapr and make them better for our use cases.
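To make the baseline idea concrete, here's a minimal Go sketch of the comparison. This is not our Fortio or k6 setup, just the shape of it: the `/direct/...` endpoint on the app is a hypothetical example of "go to Redis directly", while the second URL is Dapr's standard state API on the sidecar's default HTTP port:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// avgLatency issues n sequential GETs against url and returns the mean latency.
func avgLatency(url string, n int) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		resp, err := http.Get(url)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
	}
	return time.Since(start) / time.Duration(n), nil
}

func main() {
	const n = 1000
	// Baseline: the test app exposes an endpoint that reads from Redis directly
	// (this path is a made-up example, not a real Dapr endpoint).
	base, err := avgLatency("http://localhost:8080/direct/key1", n)
	if err != nil {
		panic(err)
	}
	// Same read going through the Dapr sidecar's state API.
	dapr, err := avgLatency("http://localhost:3500/v1.0/state/statestore/key1", n)
	if err != nil {
		panic(err)
	}
	fmt.Printf("baseline=%v via-dapr=%v added-latency=%v\n", base, dapr, dapr-base)
}
```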
All right, and then we have long-haul tests. Okay — so, long-haul tests. We noticed that end-to-end tests and performance tests are not enough. Again, go back to scenarios. Why? Because your tests might run for an hour, maybe two — ideally less than that — and then they stop. You don't really catch the problems that show up when a customer runs for a long time. So long-haul means just running for a long period of time.

What we did is create a simulated application using as many features of Dapr as we could — publishing messages, using bindings, using actors, using the state store — and that application is deployed on a real Kubernetes cluster, with the same kind of deployment as the end-to-end tests, but it keeps running for a day, two days. We collect metrics throughout the execution and we look for regressions. For example: do we have a memory leak? Do we have a goroutine leak? These things are very hard to catch in end-to-end tests, because it might take a while until you notice the leak — it might not show up right away, it might take hours. And of course these tests don't block the pull request; that would be unsustainable. We have this as part of our release process in Dapr, where we put a release candidate into the long-haul environment. But to reduce the feedback time, we also have a daily long-haul environment, where every day we deploy a build from the master branch into a separate environment, and we assess, at least on a weekly basis, whether there was any regression in those numbers. You're basically simulating a customer using your application, your platform, or your framework in production.

Anything to add? Yes: we track metrics through the Prometheus endpoints of the Dapr sidecars as well as the Dapr control plane, and we look at things that are very low level. It's not test-case scenarios — we're not looking at response error codes for HTTP or gRPC, or availability, or anything like that. We're looking at CPU, memory, number of goroutines, file descriptors on the file system — things that really show up when you run a process at scale for a very long time.

That is true. And one thing I'll add here: you can see why reusability brings far more benefits than just moving lines of code from the application to the sidecar. If you were building all of this yourself — all the connectivity to your broker or state store, the abstractions — you would have to set up a lot of this testing yourself. When you reuse the Dapr sidecar, you're not only reusing the features, you're also reusing all the test infrastructure we set up to guarantee that quality.

So how do you guarantee consistent behavior across components? Yep, I'll take that one. Dapr has a concept of components: that's what connects the Dapr runtime to the different clouds. For example, there is a pub/sub component for Kafka, or a pub/sub component for RabbitMQ, or a state store component for MySQL — and as I mentioned before, we have over 150 of these, and they each implement interfaces through which they interact with the Dapr runtime. So how do we make sure that each one of these cloud implementations adheres to the behavior we want it to exhibit in Dapr? Here, for example, is an application trying to get state from Dapr, and there are two different databases that report "key not found" differently: one might return an empty result, while the other might return it as an error. We don't want the user who's calling into the API that abstracts these two different state stores to get two different responses. Dapr provides a single consistent behavior, but we need to test the underlying implementations and components to make sure their implementations in Dapr behave according to what we need them to do. And so we have conformance tests — these are really a type of contract test; that's how you can think about them.
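In spirit, each conformance test is a reusable contract suite written once against the interface and run per implementation. A much-simplified Go sketch of the idea — this is not Dapr's actual components-contrib interface, just the shape of it:

```go
package conformance

import "testing"

// Store is a simplified stand-in for a Dapr state-store interface; the real
// contract in Dapr is richer than this sketch.
type Store interface {
	Set(key string, value []byte) error
	// Get reports found=false (with a nil error) when the key does not exist.
	Get(key string) (value []byte, found bool, err error)
}

// RunConformance applies the same contract to any implementation, the way the
// conformance suite iterates over MongoDB, Kafka, Pulsar, and the rest.
func RunConformance(t *testing.T, name string, s Store) {
	t.Run(name+"/missing key is not an error", func(t *testing.T) {
		_, found, err := s.Get("no-such-key")
		if err != nil || found {
			t.Fatalf("want (not found, nil error); got found=%v err=%v", found, err)
		}
	})
	t.Run(name+"/set then get round-trips", func(t *testing.T) {
		if err := s.Set("k", []byte("v")); err != nil {
			t.Fatal(err)
		}
		got, found, err := s.Get("k")
		if err != nil || !found || string(got) != "v" {
			t.Fatalf("round-trip failed: got=%q found=%v err=%v", got, found, err)
		}
	})
}
```

You'd then call something like `RunConformance(t, "mongodb", newMongoStore(...))` once per implementation, so every backend is held to the same "key not found" behavior.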
So we have the state stores, we have the pub/subs, and we're not running any form of Kubernetes cluster here — that's the nice thing. We have a completely separate repository, outside of the Dapr runtime, that just holds the implementations of these components — Pulsar, for example. So any time we add a new addition or feature to Pulsar, for example, we add a new test that verifies Pulsar still behaves according to the interface that Dapr expects. And we have this nice animation here, done by Artur — you're going to have to teach me how to do that. So we run MongoDB, then we run Kafka, and we basically iterate over each of these and make sure each one adheres to the same conformance tests. These also run in GitHub Actions; we're not running them on Kubernetes, because this is Go code. We have a framework we've written that kind of mimics Dapr — it's not actually Dapr calling into these interfaces, because these are individual implementations; you could even build them into their own binaries and run them if you wanted to. But we make sure we test them, and these conformance tests are really quick. I think the whole suite across 150 components finishes in under 15 minutes today — am I lying to these fine people? I don't know; I haven't looked at them in a while, because they run so well that I don't have to watch how long they take to pass. But it's something like 15 minutes for the whole 150.

And so component conformance tests we put just above unit tests, because they are still pure code — they're not running in an actual compute environment like Kubernetes — but they test for more than individual functions: they test a component as a whole. So they still sit in the lower part of the pyramid.

Yes — but is that enough? Artur, answer that question. Well, apparently it's never enough. What we noticed is that some people were trying to use Dapr components — let's say Kafka, or MQTT — and the question was: can I rely on that? Because you test it with the conformance tests, but you haven't necessarily tested it in the end-to-end tests. We also noticed that there's behavior specific to each component. For example, each component might have its own unique ways to provide authentication — do all of those authentication options work for that component? And components have different maturity levels: some might have just started implementing the Dapr interface, some might be battle-tested, even used in production in some cases. So, to fix this chicken-and-egg problem — some people only want to use a component if it's stable, while we were waiting for people to use it before calling it stable — we decided to solve it with more tests. Surprise, surprise.

So this is a different type of test, which we call certification tests, and it's the last thing that guarantees quality for a given component. In this case it's a different approach: first, we start with a test plan. Has anyone here written a test plan before? A few people have — okay, good. You're basically planning your tests; that's what it is. A contributor submits a test plan, and a maintainer reviews it to see if the scope of the tests is sufficient to guarantee quality. The certification tests are written and customized per component, using the Dapr APIs — we actually load an entire sidecar as a library into the test framework we have, and test each component one by one. This process also runs as part of the pull request. In the case of Kafka, for example, it hits a particular container; if we're talking about an AWS service, for example, we use LocalStack before you merge, and after you merge we run against a real AWS endpoint. So this is a way for us to guarantee quality for a given component. If any regression or new bug is found, it usually becomes either a new test scenario in the certification tests, or a missed scenario back in the conformance tests that we saw before. The advantage there is that when you add to the conformance tests — which we hadn't talked about — you end up updating multiple components, because you're adding a new test to the contract that every component adheres to; you might be fixing one component, but you validate multiple implementations. And going back to the matrix, where we have Dapr with multiple components and multiple SDKs — it's a complex mix of scenarios that should be tested.
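On the LocalStack point: the pre-merge/post-merge split is essentially just endpoint switching. A tiny hedged sketch — the environment variable name here is invented for illustration, not Dapr's actual configuration:

```go
package certification

import "os"

// awsEndpoint returns where a certification test should point its AWS client:
// LocalStack on localhost during pull-request runs, and the real AWS endpoint
// (injected by the post-merge pipeline) afterwards. TEST_AWS_ENDPOINT is a
// hypothetical variable name used only for this sketch.
func awsEndpoint() string {
	if ep := os.Getenv("TEST_AWS_ENDPOINT"); ep != "" {
		return ep // post-merge: real AWS
	}
	return "http://localhost:4566" // pull request: LocalStack's default edge port
}
```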
So it's basically never enough. We put certification tests right above the conformance tests because, again, they use the Dapr sidecar as a library — that's more integration. And this gave us what was a top requested feature in the Dapr project, where people ask: can I use this particular component? You can contribute a component and it starts as alpha, which means we don't even guarantee that it behaves the way it's supposed to behave — but it's a way for you to get started. You can quickly add it to the conformance tests, and that gives it beta status. And then, once you add a certification test and it's approved by a maintainer, it becomes stable, and you automatically also get a guarantee of no breaking changes: there are no breaking changes between version upgrades in Dapr. There's a deprecation path, of course, when one wants to remove things, but it doesn't happen within a single version upgrade. Do you want to add anything? No? Okay.

Luckily — well, these still found bugs. I want to rush a little because of the time, but we have more to mention. We also added integration tests in the SDKs. I'm the author and maintainer of the Java SDK, and we added something called integration tests there, where you write the test for the Java SDK — which means the test is also written in Java — and you have Java apps running against the sidecar, but you don't run Kubernetes anymore: you run in standalone mode. So this is less integration than end-to-end tests, and it's faster to run. And what we noticed, multiple times, is that by testing from the client's perspective — imagine the SDK tests: you're testing from the client's perspective — we ended up catching bugs that were not caught in the runtime. The Java SDK ended up adding scenario coverage beyond the end-to-end tests. So that's one more layer of testing to protect our users against bugs, and all the SDKs will also end up having integration tests in the future. We put these right above the certification tests, because you exercise components and you exercise the sidecar running in standalone mode, but it's not quite as integrated as end-to-end tests, which exercise the complete vertical stack.

What about the CLI? Well, the CLI has its own test pyramid, and we don't have time to talk about the CLI today — that will probably be another talk. What about quickstarts and tutorials? Maybe that's one of the things you can also take from this talk: we have automation for that as well. Imagine releasing Dapr and having to re-test all the quickstarts and getting-started guides manually. Who here has some kind of getting-started or manual instructions written in markdown? People use markdown — okay, great. We have automated tests for that, with a tool we built as part of our work on Dapr that is actually unrelated to Dapr — you can use it without using Dapr at all. It's called mechanical markdown, and you basically automate tests for markdown documents with shell commands. So if you want to start automating your instructions or your commands, you can even get a quick end-to-end test just from a markdown file.

Here's how it works. It's written in Python, so you start with pip install (or pip3, depending on your environment setup), and you run mm.py with the markdown file. In your markdown file you have bash commands — in this case just a hello world, but it can be more complex than that — and you annotate them with a comment called STEP, where you put the name and the expected output. (There are more configuration options we can't go too deep into today.) Then you end the annotation, and mechanical markdown understands the annotation, executes the command, and asserts that the output matches what you put.
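A minimal annotated snippet looks roughly like this — a sketch based on the syntax we just described, so treat the exact field names as an approximation of the tool's format:

````markdown
<!-- STEP
name: Say hello
expected_stdout_lines:
  - "Hello world!"
-->

```bash
echo "Hello world!"
```

<!-- END_STEP -->
````

Running `mm.py README.md` would then execute the bash block and fail the check if the output doesn't match the expected lines.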
So that's a quick way to automate documentation testing in your project. Even if you don't want to use Dapr — mechanical markdown is a core part of our project, and we think all open-source projects, and even private projects, can benefit from it. And this is an example of how it looks: you can validate the C# quickstarts, the Go quickstarts, the Java, the JavaScript, the Python. Imagine all of that multiplied by all the building blocks that Dapr has — it would be impossible to test manually on every release. So this gives us really good automation, and it's one of our most stable steps right now, though there's still some flakiness to be tackled.

And what does it look like? The markdown looks the same — there's no difference. When you look at the markdown, you see absolutely no difference. These are all the commands that mechanical markdown will execute, just as your user would run them in their terminal. And then we run Dapr — it doesn't need to be Dapr, it's just an example — and it can even assert complex log outputs. Even if things arrive out of order, mechanical markdown can make assertions on out-of-order output; it's just a different flag.

And how does it work? In this case the Dapr CLI enters the equation: mechanical markdown reads the file, executes all the commands through the CLI, and the CLI runs those against the Dapr control plane and the apps that are running — they can be Java, they can be JavaScript — in this case in standalone mode. We don't necessarily test this on Kubernetes, but we could; it's just a matter of choice. So we put these example tests right above the SDK integration tests: a deeper level of integration, but not as much as end-to-end tests.

So — want to talk about the ice cream cone? Yes, I love ice cream, so I kind of hijacked that slide from you. End-to-end tests are slow: they take an hour to an hour and a half to run in Dapr today, and they consume a lot of power and energy, of course. And Dapr is a fast-paced project — we have more than 3,000 individual contributors, we have multiple PRs coming in every day; it's a very thriving project — and we need to make sure that tests do not become a bottleneck. End-to-end tests, once they become too big — the more features you add, the longer they take, because you're testing more scenarios. So we've decided to add more tests to solve that: we're fighting tests with tests, ladies and gentlemen.

Yes — and so we have integration tests, and integration tests allow us to automate the sidecar process outside of Kubernetes. Kubernetes is the slow thing — is anyone here surprised? Probably not. So we can run something very similar to the end-to-end test infrastructure just by automating the sidecar itself from within a GitHub Actions runner. And it doesn't have to be a GitHub Actions runner; it can be literally anything — if you want to run your own CI/CD stack, for example. So we are now adding more and more integration tests, because they are faster and less flaky than end-to-end tests. In many respects they're actually more reliable, and the log output they give us is sometimes much more accurate than what we get from a failing end-to-end test.
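The core trick is just process automation. Here's a hedged Go sketch of the idea — it assumes a `daprd`-style sidecar binary on the PATH, and while the flags and health endpoint mirror Dapr's defaults, treat the details as illustrative rather than our actual harness:

```go
package integration

import (
	"net/http"
	"os/exec"
	"testing"
	"time"
)

// TestSidecarStarts drives the sidecar binary directly — no Kubernetes involved.
func TestSidecarStarts(t *testing.T) {
	cmd := exec.Command("daprd", "--app-id", "itest", "--dapr-http-port", "3500")
	if err := cmd.Start(); err != nil {
		t.Fatalf("starting sidecar: %v", err)
	}
	defer cmd.Process.Kill() // always tear the process down

	// Poll the sidecar's health endpoint until it's ready or we time out.
	deadline := time.Now().Add(10 * time.Second)
	for time.Now().Before(deadline) {
		resp, err := http.Get("http://localhost:3500/v1.0/healthz")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 300 {
				return // healthy — from here the test could drive real scenarios
			}
		}
		time.Sleep(250 * time.Millisecond)
	}
	t.Fatal("sidecar never became healthy")
}
```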
And so this is the end of the pyramid, where we show you the different trade-offs between integration and isolation. We think we've covered pretty much everything — but we would love it if someone told us now what kind of tests we didn't add, because we love adding tests. So if there is any form of test you think we should be adding, please come and talk to us; we will probably do it.

Let's talk about refactoring. It's not really related to testing, but refactoring code is a major pain, and we don't like to do it. In Dapr specifically, sometimes someone comes with a PR and goes, "oh, I refactored 2,000 lines of code," and then we maintainers sometimes become the bad cops, because we're like: well, you know what, that might actually reduce the stability of the project. And people might get offended sometimes. So we need to have very clear, established rules about when we do refactoring, and how we communicate it, so there is a clear setting of expectations about what the refactoring is doing. Sometimes developers like to refactor things just because they can, and we can't allow that to happen in Dapr. So there is this list of do's and don'ts — you can read it here; I don't think we have enough time to cover everything. Artur, is there anything on this list that's important for you to call out?

Yes: don't mix intentional change with refactoring. A refactor should have no behavior change in your code. If you mix them, it becomes a mess, because now you don't know whether a difference is a regression or an intended change.

Yeah, it's the greatest temptation, right? You're working on a feature, and then you notice this small piece of code and you think: oh, I can make that better. It has nothing to do with my change, but I can make that algorithm so much better — I can improve the time and space complexity. I'm just going to do it and push it with my code, and it's all going to be fine. Then you wake up at 2 a.m. and find out that when you decided to make that change, you signed a deal with the devil — and the devil was yourself. We want to avoid that very much. And yes, you need to be intentional. You want to do a refactor? That's great. Do the feature change you wanted to do, then open another PR, talk to a maintainer, and ask if that's okay. It doesn't have to be a maintainer — if you're taking this approach in your own environment, talk to the team leader, or the developer who last worked on the code, or whoever is on the hook to maintain that code long term, and discuss the refactor. Refactorings are dangerous; they should have really good reasons.

Yeah, all right. No flaky tests, Artur?
So, flaky tests are an ongoing effort — it doesn't stop. As you saw, we have so many test scenarios and so many test levels that dealing with flakiness became part of our day-to-day triage. It's still not perfect, so there's a long road ahead of us to get to a fully stable test suite. But I have some things that people might already know that are still good call-outs.

Don't do I/O in unit tests. If your "unit" test has to do a service invocation, write to disk, read from disk — it's not a unit test, it's an integration test at least. So make sure your unit tests don't do any I/O.

And also: don't re-run failed tests until they pass without looking at the failures. Why? Some bugs, especially in a distributed runtime, are not deterministic, which means there might be some race condition you're not handling correctly, and by just re-running, you hide the race condition, thinking it's a test-flakiness problem.

Another thing: if possible, run your unit tests and integration tests multiple times before merging PRs. Some IDEs actually have this feature — you can configure the IDE to run the same test 10 times. So try running the same unit test 10 times in a row and see how many times it passes. If it's 100%, you have a higher chance of it not being flaky.
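If you're in Go, you don't even need IDE support for that: the test runner can do the repetition itself. `TestSuspectScenario` below is a placeholder name for whatever test you're checking:

```bash
# Run the suspect test 10 times in a row; -count=10 repeats the run while
# bypassing Go's test result cache, and -race helps surface race conditions.
go test -race -count=10 -run TestSuspectScenario ./...
```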
It's also interesting to build reports of flaky test scenarios, because in most systems a single test failure is enough to fail the whole run. So you could be seeing a 50, 70, 80 percent failure rate in your test suite, but it might be caused by only one of your tests. So don't be scared that your suite looks super flaky — it might be just one or two culprits. And look at good-first issues to help the community, if you're dealing with an open-source project.

And — takeaways. Yep, the takeaways from today's talk. Have contract tests to guarantee consistent behavior across clouds: in a multi-cloud runtime, the behavior might not be consistent across clouds, so you need to have that type of test. Run your end-to-end tests in multiple clouds — that is one thing that's ongoing for us, where we want to take the end-to-end tests and run them on AWS as well. Today we run on kind on localhost and we also run on Azure, but that's mostly a cost thing; as soon as we get more credits, we'll run on AWS and GCP too. Wait — we need to plug ourselves here: if there are any Amazon or AWS people in the crowd, please give us credits. We accept. Yeah — help open source. Then: know your test scenarios before refactoring or rewriting. Avoid the ice-cream-cone problem, which is when you have too many tests in the top layer — you want more tests at the bottom than at the top. Automate your getting-started documentation with mechanical markdown. And claim your Dapr supporter icon — you can scan the QR code, claim it, and give us feedback. Thank you very much. Thank you very much. Any questions?

Thanks for the talk, guys — amazing stuff. Too much testing, to be honest. Thank you. I'm just curious how you analyze the results of long-haul tests. Do you also automate the analysis of the results? Because that's a tricky one.

Do you want to take that? No? Okay. So, long-haul tests — the analysis can be automated to look at the results, especially the obvious problems like CPU, memory, and number of goroutines. We did look at bringing some of the long-haul scenarios lower into the pyramid, where we'd run a simple sidecar for ten minutes to detect an obvious goroutine leak. And you can automate some of this with metrics and alerts: if you have alerts for your system, you can plug them into the long-haul test and look at which alerts get fired. That gets really close to automation — you should see zero alerts. It depends on the alerts you have, so having alerts and monitoring can help you automate the long-haul test results as well.

But the rest of the stuff you just review manually, periodically? Right now we do it manually, because it's just a few tests, a few spots — we look at the graph and say, okay, it's good or bad. But it's possible to automate as well. Every release we have a long-haul test lead, and that person basically looks at the long-haul tests — they look at the metrics and they analyze them. Yeah — thank you.

And last but not least: you asked what type of testing is missing — it's chaos testing. How come you don't do chaos testing? Oh, chaos tests — yes, actually, that's right. We have that on a to-do list, and we have an issue for it, and nobody has picked it up yet. So yes, we don't have chaos testing yet. Thank you for bringing it up — that's a really nice one. Yeah, great questions.

Okay, all right — anything else? No? Looks like we're good. So thank you again. Thank you.