Well, hello and welcome to another DevNation Tech Talk. We have another excellent presentation to give you today. I was a few minutes late because I had a cable modem challenge. You guys know what I'm talking about here: when the internet won't connect, what can you do? So today we're going to be talking to Alex Soto. Alex is part of our developer experience team. He is the fellow coming out of Spain, traveling all over Europe and all across the globe, talking to developers about how to build better software. Today he's going to talk to us about how to do testing, specifically leveraging something very cool from a Kubernetes and OpenShift standpoint. You're going to see it presented here today, and he'll talk about end-to-end testing and testing in production. So Alex, at this point let me not take up any more time. Take it away.

Thank you, I'm going to start sharing my screen. Thanks for this introduction. Today we're going to talk about testing in production, so thank you very much for joining this DevNation session. We're just going to see an introduction to testing in production because, you know, it's a really big topic. But you will learn the basics of testing in production, and I will also give you some hints so you can start implementing it after this talk. My name is Alex Soto.
I work at Red Hat, I'm the co-author of the Testing Java Microservices book, and I'm also the author of several magazine articles and cheat sheets.

One of the things I want to tell you before I start is that probably some of you will say, "This thing you're explaining is stupid, or it makes no sense," or "My manager will never allow me to do this," or even "I think it's not the right way of testing applications." And that's all right, it's normal to think this. But if you start writing tests for a microservices architecture, and when I say microservices architecture I'm not talking about 10 or 15 services, I'm thinking about 100 or 120 services, then it's the right time to re-watch this presentation.

So this presentation is about testing and DevOps, and one of the great resources out there is on the Puppet site. Every year they publish a white paper about the state of DevOps. One really interesting quote from it is that high-performing organizations are decisively outperforming their lower-performing peers.
So if you adopt DevOps, you're going to be able to outperform your competitors. But let's see some numbers. For example, you're going to be able to do 46 times more deployments than your competitors, and if you think about it, this means that you're going to be able to hit the market 46 times more often than them. Also, you're going to have shorter lead times, in fact 440 times shorter than your competitors. Lead time is measured from the moment you have an idea until that idea is available to your clients or customers. So it's the whole process of thinking about it, designing, developing, testing, and finally deploying and releasing to your public. If this time is shorter, you are going to be able to release these ideas faster than your competitors. And of course errors happen, because it's natural, and if you embrace DevOps then your time to recover is going to be 96 times faster.

So what is happening today? In most companies, developers are not responsible for what happens after the check-in. The life cycle for a developer is that they go to the issue tracker, they pick an issue, they create a branch for it, they fix the problem, they write a test, they try it locally, and then they push the code. After that, they wait until the CI system, for example Jenkins, says that everything is green, so all the tests pass and everything is fine. Then someone reviews the change and squashes and merges it into the master branch. And that's all. They do not care about how this piece of software is going to be deployed to production, because in the end this is something that some operations guy will do at some point, probably at 5 p.m. So we need to break this wall of confusion.
We need to work together hand in hand, devs and ops. This means that if the operations team stays all weekend trying to deploy a new service to production, developers should be there as well. And of course this means there is a change, basically a change in the definition we have of "done". Usually, from the developer's point of view, something is done when you close the issue on the issue tracker. So for example, when I have tested it, I know that it works, and the change is merged to master, then from a developer's point of view this is done. The problem is that this is not true if you want to do DevOps. Something is done when it's deployed and released to production, because we need to think about something really important: your code has no business value until it's deployed. So we can discuss which frameworks to use, whether you do this with Quarkus or Spring Boot or Vert.x or any other technology. We can discuss patterns, whether you need to use this pattern or that other pattern. We can even discuss clean code, whether we should make our code cleaner or not. We can discuss more and more and more, but you always need to keep in mind that until the service is deployed, we are not giving value to our business.

But of course it's not only about dev and ops. Now it's the whole organization that needs to embrace DevOps and have a common ground regarding DevOps. It's about ops, it's about the PMO, the security team, the DBAs, the QA. Everyone should be involved in the task from the beginning of the project. This means that testing is not the final part of the cycle. Usually, if you think about how you're working, you develop, and then you move the sticker on the board from developing to testing, and then someone tests it, and then the sticker is moved to operations, and operations deploys the new service, right?
This is not how it should be. Things should be done from the beginning. The testing team should be working hand in hand with the developers, checking how the service is going to be tested. So it's really important that DevOps does not mean only dev and ops, it's about the whole organization.

But of course, after that you're going to start using continuous integration, continuous delivery, continuous deployment, continuous operation, all these automated things, to be able to reach production faster, because if you automate everything then everything goes pretty fast. The problem is that you can apply all these techniques and still your time to reach production is too high. So why is this happening? The problem usually sits in QA. The organization sees testing as an expensive task that slows everything down. When you check why a project is never on time, you always end up finding that the QA department is spending a lot of time testing the application. And why is this happening? Basically, it's happening because the organization requires the new service to be verified manually, and the one who needs to verify it manually is the QA department. So of course, verifying something manually takes a lot of time.

But it's not only this. If you have a process where QA is doing some kind of manual testing, developers will involuntarily think, "Someone is going to test my work after me, so I don't need to write extensive tests, because there will be someone who checks it manually." So the QA department is going to be in charge of verifying the software, and it's going to be in charge of catching technical bugs. But not only this. They will probably catch some bugs, they will go to the dev and say, "Hey look, this is failing," the dev is going to fix the problem, and then the QA guy will be responsible for checking that it has been fixed correctly.
So now the QA team is doing two things: verifying the software, which means catching issues, and verifying that the issues have been fixed correctly.

But recently, because of infrastructure as code, everything is really easy to replicate. Every environment can be replicated really fast. You can have production and QA just by running the same script. The problem is that, since it's so easy, companies start populating all these environments across the organization. So you have the production environment and then you have the QA environment. The QA environment is where you're going to run your manual tests, but you need to maintain this environment. You need to update it when the production environment is updated. You need to check, when it fails, what's going on and how to fix it. You even need to maintain the data sets that are inside the database. And usually all these tasks, since it's a QA environment, fall to the QA team. So as you can see, the QA team is now doing yet another thing, which is maintaining the QA environment. Even if you think about it not from the QA team's point of view but from the organization's, having a QA environment is really expensive to run and maintain, because in the end you still need machines, or let's call them cloud machines, running this QA environment. So it's expensive to run, and expensive to maintain because the QA department is in charge of doing it.

In any case, I've been in several companies where we relied heavily on QA environments and manual testing. We went through this whole process. It took some time to reach production, but in the end we got to production. So developers write unit tests, component tests, contract tests. The QA department does a lot of manual testing. We deploy to staging and everything works. So we go to production and everybody is really happy, until, well, we break production. And we can break production because production always has some uniqueness.
For example, the network setup, I mean the DNS and the firewalls. The database schemas are probably 90 or 99% the same, but there can be some differences. Or, for example, the actual volume of data is much bigger than in the QA environment.

So this is where we can ask: how about removing the QA environment? I don't want a QA environment. I don't want a pre-production environment. I don't want to run end-to-end tests manually. The only thing I want is the production environment. Of course, having only a production environment requires a change in mindset and an appetite for risk, and also a change in the way we develop our services. Until now, when we developed a service, we always assumed that the service was going to work. What we need to do now is start developing on the assumption that every service is going to fail. So instead of developing for success, we need to develop for failure.

Of course, when we talk about testing in production we are relying on another concept, which is really important. In the past, we could say that deploy and release were more or less the same. When we said we were going to deploy to production, we meant that we were going to take the service, put it on the production cluster, and our clients were going to start reaching it. And when we said we were releasing our service, it was exactly the same. But in microservices, deploy and release are really, really different steps. One thing is the deploy, which means taking our service and putting it inside the production cluster. Another thing is the release phase.
The release phase is changing something in the cluster to start sending public traffic to this service. So now we have deploy and we have release, and they are totally different.

So the testing landscape has probably changed a bit. Now we don't just have unit tests, component tests, and end-to-end tests, we have something like this. We have the blue column, which is the pre-production column, with all the tests that are usually written by developers and run in Jenkins for every pull request. So it's unit tests, component tests, contract tests, acceptance tests, smoke tests, and so on. But then there are three more columns, the testing-in-production columns, one for each of three stages.

The first one is deploy. When we deploy our service to production, which tests do I need to run? You need to run integration tests. Maybe this shocks you: why should you run integration tests against the production environment? The answer is that otherwise it makes no sense to run them at all, because why would you want to run integration tests against an artificial environment? It's totally pointless. Then we have tap compare, load tests, shadowing tests, and config tests.

After deploy, we said that we release our service. In the release phase we need to use some of these testing techniques: canaries, dark canaries, monitoring, feature flags, exception tracking, or feature graduation.

But with microservices and testing in production there is another stage, which is post-release. Usually when we release our services we say, okay, it has been released, that's okay.
We are not going to test it anymore. With testing in production, you still need to test in this stage, using teeing, profiling, logs, chaos testing, monitoring, A/B testing, tracing, and so on. So as you can see, testing now happens across the whole life cycle of the software. Of course, you don't need to start applying all these testing techniques at once. You're going to start with the simple ones and, step by step, increase your confidence with this. Of course, none of the things explained here are easy, and they often require a fundamental change in the way systems are designed, developed, tested, deployed, and released. So it's a big, big change to embrace testing in production.

Let's see some of the techniques that I showed you previously. The first one is tap compare, which happens in the deploy phase, so when we have deployed our service to production but no public traffic is being sent to it. Tap compare is used for detecting regressions. For example, we have service A version one and service A version two, and we want to detect whether any regression has been introduced. The regression can be a performance regression, it can be a bug regression, it can be an output regression, where maybe the output of the two services differs. So tap compare is a kind of test that allows you to detect these regressions before going public.

There are two tools for doing that: one is Diferencia, the other one is Diffy. The thing about Diferencia is that it's implemented in Go, so it's very fast, but it's integrated with Java. [unclear] And it integrates fairly well with Prometheus, which means that all these regressions can be monitored by your monitoring system, in this case Prometheus, but of course it also supports other systems. And just for the schema, this is how Diferencia works.
It's a proxy: you send a request to it, and this request is sent on to service A v1 and service A v2. Both services return a response, then the proxy compares them, sends the status to the monitoring system, and returns the result.

Okay, the next tests belong to the release phase. In the release phase there are a lot of tests, and one of them is the blue-green deployment. Blue-green deployments are based on having two identical environments and switching from one environment to the other. For example, in the blue environment you put service A version one, and in the green environment you put service A version two. It looks something like this: you have all your traffic going to service A v1, then you deploy service A v2 into production, and finally you reroute all the traffic to v2. After that you need to monitor for a while for any failure. If you detect a problem, you can switch really quickly from v2 back to v1, so it's really fast to recover from a failure. Of course, this technique is really great, but it has one problem, which is that it's all or nothing. Either it works for all the users, or all the users receive the failure.

To avoid this problem there is another technique, which is called canary release. A canary release is similar to blue-green, but it behaves differently in the sense that you deploy the new service and then you start sending public traffic to it, but instead of sending all the public traffic, you start by sending just one percent of the traffic, or five percent. If you monitor and see that everything works as expected, you can move forward to 10 percent, 25 percent, 50 percent, 100 percent.
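
As a rough illustration of the canary steps just described, here is a minimal Python simulation. It is only a sketch: the stage percentages follow the ones mentioned above, but the function names, the error threshold, and the request counts are assumptions made for the example, and real traffic shifting would be done by the mesh or load balancer rather than by application code like this.

```python
import random

# Percent of traffic sent to the new version at each step (from the talk).
STAGES = [1, 5, 10, 25, 50, 100]


def pick_version(canary_percent, rng):
    """Route one request: 'v2' with probability canary_percent/100, else 'v1'."""
    return "v2" if rng.random() * 100 < canary_percent else "v1"


def run_canary(v2_error_rate, requests_per_stage=2000, max_error_rate=0.05, seed=42):
    """Walk through the stages and roll back as soon as v2 looks unhealthy.

    v2_error_rate is the (hypothetical) fraction of v2 requests that fail.
    Returns (final_percent, outcome), where outcome is 'promoted' or 'rolled back'.
    """
    rng = random.Random(seed)
    for percent in STAGES:
        v2_calls = 0
        v2_errors = 0
        for _ in range(requests_per_stage):
            if pick_version(percent, rng) == "v2":
                v2_calls += 1
                if rng.random() < v2_error_rate:
                    v2_errors += 1
        # "Monitoring": compare the observed v2 error rate with the threshold.
        if v2_calls and v2_errors / v2_calls > max_error_rate:
            return 0, "rolled back"
    return 100, "promoted"
```

Calling `run_canary(0.0)` walks all the way to 100 percent, while a version that always fails is rolled back at the first stage in which it receives traffic, so only a small share of requests ever sees the failure.
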
So instead of an all-or-nothing approach where things could fail for all the users, now you are just selecting a small portion of your traffic to check whether the new service works or not. Of course this works in most cases, but if you have a heavy load of traffic, and I'm thinking for example of Facebook, which has thousands of requests per second, then one percent of the users is a lot of users. For this reason, companies like Google, Amazon, or Facebook created a set of techniques to test in production without the end user noticing, which are called dark launches. You can think of dark launches as an umbrella of ways to test your services in production.

I'm going to show you two different approaches. The first one is dark canaries, which basically work like this: you have version one, where all the public traffic goes, and then you have a subset of well-known users. Maybe they are internal users, maybe they are clients that you have a special agreement with, so they receive the update before the rest of the public users. So these internal users, which could be the QA department or some clients that you trust, are going to start receiving the updates sooner than the rest, and they will help you test them.

Another technique is shadowing traffic. Shadowing traffic is based on mirroring the traffic to both services at the same time. But notice that only version v1 sends a response back. With v1, which is the blue one, you send a request and you receive the response. In the case of green, which is v2, you just send the request and that's all. This is what's called a fire-and-forget request.
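
To make the fire-and-forget idea concrete, here is a small Python sketch. The `Recommendation` class and the `mirrored_call` function are hypothetical names invented for this example. In a real setup the mirroring is done by the proxy or service mesh, not by the caller.

```python
import threading


class Recommendation:
    """Toy stand-in for one version of a service; it counts the requests it sees."""

    def __init__(self, version):
        self.version = version
        self.count = 0
        self.log = []

    def handle(self, request):
        self.count += 1
        line = f"recommendation {self.version}: request {request!r}, count {self.count}"
        self.log.append(line)
        return line


def mirrored_call(primary, shadow, request):
    """Answer from `primary`; copy the request to `shadow` and ignore its reply."""
    # Fire and forget: the shadow call runs in the background and its
    # return value is never read.
    background = threading.Thread(target=shadow.handle, args=(request,))
    background.start()
    response = primary.handle(request)
    return response, background
```

The caller only ever sees the v1 response, while the counter and log of v2 still advance, which is what the monitoring would watch. The thread is returned only so that a caller or a test can wait for the shadow request to finish.
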
So you're sending the request and you don't care about the response. But the good thing about this approach is that you are monitoring the new version to detect whether there are any failures or not. So you are anticipating any problem that might occur when you, for example, start sending the public traffic.

Another technique is feature graduation, which basically lets the user choose what level of confidence they have in the system. This is from openshift.io, which is currently called CodeReady, from Red Hat, where you have a tab called Features Opt-in where you can choose your level of confidence in the system. Depending on the level you choose here, beta features, experimental, or internal experimental, you're going to be redirected to a more or less mature version of the service.

Another technique, this one for the post-release stage, is chaos engineering. Basically, what you are doing is injecting failures into your system on purpose to see how it behaves under these circumstances. Maybe you are wondering why you should do this. Basically, it's because you don't choose the moment when the failure happens, the moment chooses you. You only choose how prepared you are when it does. So it's really important to see what happens when there are problems before they really happen, so you know how the system will react when they do. But I'd like to add one thing.
Please be careful with this, because if you are too confident in chaos engineering or in your system, and you start doing chaos engineering by destroying your system, you're probably going to destroy it for real. Then none of your clients are going to be able to reach your system, your cluster, because you've just killed it all. So start with small experiments, check with small failures, and step by step increase the scope of this kind of experiment.

So let's see some of these concepts in a demo, basically blue-green deployments, canary releases, and mirroring traffic, or shadowing traffic. In this case I'm going to use Kubernetes and Istio. Istio is a service mesh, and a service mesh is a dedicated infrastructure layer that allows you to modify the network, or the communication between services. So for example, you can route traffic to one service or another on purpose, or you can encrypt traffic between services, or you can add some resiliency, for example automatic retries, to the communication.

So let's see it. In this case I've recorded it, because sometimes I have performance problems when I'm sharing my screen. So I recorded the demos, and I will show you the link so you can try it for yourself. You can see that I have my customer service, a preference service, and a recommendation service, and then for recommendation I have version v1 and version v2. So I have four microservices there. So now, what happens when I call the customer service? In this case you can see that I'm getting customer, preference, recommendation v1, and then in the second call I'm getting customer, preference, recommendation v2. This is because by default Istio follows a round-robin approach, where you send a request to v1 and then a request to v2. But that's fine.
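
The alternation seen in those first calls can be pictured with a few lines of Python. This is only a toy model of round-robin selection, with made-up names, and it says nothing about how Istio implements its load balancing internally.

```python
import itertools


def round_robin(backends):
    """Return a function that hands out the backends in turn, forever."""
    cycle = itertools.cycle(backends)
    return lambda: next(cycle)


next_backend = round_robin(["recommendation v1", "recommendation v2"])
calls = [next_backend() for _ in range(4)]
```

After these four calls, `calls` alternates v1, v2, v1, v2, which matches the pattern in the demo output before any routing rule is applied.
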
But what I'm going to do now is send all traffic to v1. So I create an Istio resource, which is a virtual service, and I say, okay, now what I want is to send all traffic to v1. And now every call that I make always goes to version v1. Now let's see what happens when I do a kubectl replace and I set the version to v2, so I want to send all traffic to v2. Notice that this is a blue-green deployment: I'm changing from blue to green. I just apply this Istio resource and then everything goes to v2. If you notice there is an error and you want to go back to v1, you can just replace it again with v1, and then everything is switched, and as you will see, all the calls go to v1 again. So it's really fast and really easy. And if you want to see what this file looks like, you can see here that I'm just saying that the recommendation service has a version v1 and I want to send all the traffic to it.

Let's move quickly to canary. If you want to do a canary, it's really easy, because the only thing you need to do is again apply a new Istio resource. In this case I'm sending 75 percent of the traffic to v1 and 25 percent to v2. Notice that now when I make the calls you get v1s, v1s, and then some v2s, and so on. It's about 75/25. This is a canary release as we explained. And if you want to see how this works, you define the recommendation service and say that for version v1 I want to send 75 percent of the traffic, and for v2 I want to send 25 percent of the traffic.

And the last one is the dark launch, or shadowing traffic.
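
Before moving on, the routing rules from the blue-green and canary steps can be sketched in code. The resource files themselves are only shown on screen, so this is an approximation: the function below builds a Python dict in the general shape of an Istio `VirtualService` with weighted routes. The `recommendation` host and the `v1`/`v2` subset names are assumptions (the subsets would have to be defined in a matching `DestinationRule`), and the field layout follows Istio's `networking.istio.io/v1alpha3` API rather than the speaker's actual file.

```python
import json


def virtual_service(host, weights):
    """Build a dict shaped like an Istio VirtualService with weighted routes.

    `weights` maps a subset name to the percent of traffic it should receive.
    """
    if sum(weights.values()) != 100:
        raise ValueError("route weights must add up to 100")
    routes = [
        {"destination": {"host": host, "subset": subset}, "weight": weight}
        for subset, weight in weights.items()
    ]
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {"hosts": [host], "http": [{"route": routes}]},
    }


# Blue-green: everything to v1, then everything to v2.
blue = virtual_service("recommendation", {"v1": 100})
green = virtual_service("recommendation", {"v2": 100})

# Canary: the 75/25 split from the demo.
canary = virtual_service("recommendation", {"v1": 75, "v2": 25})
print(json.dumps(canary, indent=2))
```

Switching from blue to green, or moving the canary forward, is then just a matter of generating the resource with different weights and applying it again.
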
In this case it's a bit more complicated to show. Notice that when I make a call here, I'm not applying shadowing traffic yet. You see there is a number. This is a counter that is incremented for every request that I send. Notice that v2 is at number 13 and v1 is at number 23. Then I can apply another resource, which is the recommendation v1 mirror v2. And if you check here, this is the log of the service. Sorry, I'm just checking recommendation v2, I am going inside recommendation, and notice that here the last log entry is the 13. Okay, now I can go up here again and make the call. Notice that I call and call, and every time I'm receiving v1, v1 with the 24 and the 25. The counter is incremented, and remember that the last counter of v2 was 13. But now if I check the logs of v2, you see that there is a 14 and a 15. So you can see that the mirroring traffic is working, and it's working because the request is received by v2 but the response never gets back to the client. The client will always receive the v1 response. So here you can see the fire-and-forget approach.

So we are almost done. You may want to learn more about this.
There are some links here.

So maybe you are wondering how you can start doing testing in production. The first thing you need to do is have QAs and devs working together from the beginning. I know that there is a somewhat complicated relationship between devs and testers, but this is something you need to fix, and always work together. And start with a really, really simple service, preferably a stateless service. Of course, devs can still think, "Okay, I don't need to write tests because there is this mirroring traffic thing or this tap compare thing that is going to catch my bugs, so I'm not going to spend time writing unit tests or component tests, because there is another automatic algorithm that is going to detect them." No, this is wrong. You still need to write unit tests, component tests, acceptance tests, and so on, because you need to have confidence in your software. And when you have the confidence that everything is going to work as expected, then you start doing testing in production.

Of course, the final idea of testing in production is that the manual testing is going to be done by your clients or your customers, not by your QA department. Your QA department has other duties, but not testing manually. For this task you need to rely on your customers and your clients.

And that's all. If you have any questions or anything, you can ping me on Twitter, or by mail, or on my website, or on my YouTube channel.
Thank you.

Well, thank you so much for that, Alex, and we are out of time. If you have those links handy, though, drop them into the chat real fast, because folks would be interested in those links, and that way they'll have access to them here. And of course you can follow Alex on Twitter. You'll see that he'll publish these links and other slides and other demos and other content as available. And you should actually take those videos and put them up on YouTube and publish them that way as well.

Yeah, I will. I will put up the videos, I will put the links, everything, on Twitter, and even the slides I will publish right now, I mean in about five minutes.

Yeah, so just follow him on Twitter and you can check that out. I thank you all for your time today, and I apologize for our slow start, because we did have a technical problem that I had to work my way through. But make sure you follow Alex on Twitter, and I'll just add that link here. All right, so there's the Twitter link, which is the most important link, and that's all we have time for. I apologize, but we have to get going. Thank you so much.

Thank you.