Thank you for coming; it's the last talk here, but not the least. I'm Neha, and this is Shubham. We are the maintainers of Keploy. Previously, we led platform engineering teams at Indian startups like Lenskart and Farai, and we have been active open source contributors.

In the last two days, we have heard about a lot of tools that can help us automate our CI/CD pipelines, make them more secure, and so on. But there is still one major blocker to achieving true continuous delivery, and we believe that is testing. Why testing? Because it is still very use-case specific rather than technology specific. By that, I mean you cannot really write a test like a script that will work for every one of your applications. You need to spend a lot of effort and time to write high-quality automated test suites, sometimes even more than you would spend on developing the application itself. And even after that, there is no single metric that will guarantee bug-free code or no defects leaking to production. So that is the problem in achieving an autonomous testing cycle in the CD process.

What do we need to automate this testing process? Three simple things. One, the ability to auto-generate test cases. Two, the ability to auto-update those test cases; you might think of ChatGPT or some of the new LLMs, but we all know they are not mature enough yet to be a fire-and-forget solution, so we still need to spend effort there, maybe generating stubs, et cetera. And three, the most important: creating test data that is almost equivalent to production data. We want our test data to be just like production data. That's all we need to automate the whole testing process.

I will give you a brief overview of the different solutions we explored to reach this state, and the limitations of each. We thought of starting with testing in production, because we wanted our test data to be just like production. If you want to test in production, you can do shadow testing: your application is running and serving user traffic, you put up the new version of the application, say V2, replicate the same traffic to it via a service mesh or something similar, and assert that the responses of the current deployment and the new deployment match. If they match, voila, it is working fine.

But it's not that simple. It works fine for stateless applications. For stateful applications, there are a lot of other dependencies your application talks to, especially databases, and you cannot really have the same state for the new application version you want to test. So how do you deal with that? The first thing that comes to mind is: why not just connect this application V2 directly to the production database? But that could cause a lot of blunders, so we definitely cannot do that. Next we thought of testing the reads, but not the writes. If you have idempotency guaranteed you could replay everything, but we did not, so we introduced a proxy that could easily filter out the writes. With the reads, it was able to match the actual and expected responses of both releases. But it was not enough: we definitely wanted to test mutations.
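To make the read-only shadowing concrete, here is a minimal sketch in Go. This is illustrative only, not part of Keploy, and the v1.internal and v2.internal endpoints are hypothetical: the proxy serves every request from the live deployment, mirrors only GET and HEAD requests to the V2 candidate (the writes are filtered out, as described above), and logs a diff when the two responses disagree.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

// Hypothetical endpoints for the live (V1) and candidate (V2) deployments.
const (
	prodURL   = "http://v1.internal:8080"
	shadowURL = "http://v2.internal:8080"
)

type result struct {
	status int
	body   []byte
}

// forward replays one request against a base URL and captures the response.
func forward(base, method, uri string, hdr http.Header, body []byte) (*result, error) {
	req, err := http.NewRequest(method, base+uri, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header = hdr
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return &result{status: resp.StatusCode, body: b}, nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	method, uri, hdr := r.Method, r.URL.RequestURI(), r.Header.Clone()

	// The user is always served from the current production deployment.
	prod, err := forward(prodURL, method, uri, hdr, body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	w.WriteHeader(prod.status)
	w.Write(prod.body)

	// Mirror only the reads: replaying writes would mutate state the
	// candidate does not own, so they are filtered out here.
	if method != http.MethodGet && method != http.MethodHead {
		return
	}
	go func() {
		shadow, err := forward(shadowURL, method, uri, hdr, body)
		if err != nil {
			log.Printf("shadow call failed: %v", err)
			return
		}
		if prod.status != shadow.status || !bytes.Equal(prod.body, shadow.body) {
			log.Printf("DIFF on %s %s: v1 and v2 disagree", method, uri)
		}
	}()
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```

In practice you would compare normalized responses rather than raw bytes, since headers and timestamps differ between deployments.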
So what we planned was to replicate the database, that is, create an in-sync replica of our production database, and connect the version-two application to that. In theory that makes sense, but in practice there was replication lag: whenever a request is played against your production application and the database changes, keeping that change in sync with the duplicated traffic call becomes hard. So one, it was a lot of operational effort to set up, and two, the replication lag meant it could not succeed.

So we thought of doing it not in real time but later on. We put the same setup in a non-production environment: we created a snapshot of the database there and connected the new version of the application to it. Again, you need to set up the pipeline, which is operational effort, and that's okay. The problem was that you can replay your traffic only once, while the state of the database is still the same; after that, your tests start breaking and there is a lot of flakiness, so the pipeline becomes brittle.

What did we do after that? We thought of recording just the stubs, the query data, instead of the whole database. So instead of creating a snapshot and maintaining it in a different environment, we just captured the query data of each dependency, a database, say (I'll take an example of that in the later slides), created a virtual dependency from it, and packaged it into a test case. It's like a LocalStack for any kind of infrastructure, not just AWS. That's how Keploy came into the picture, and we were able to successfully replicate production traffic, as well as the quality of production data, into local environments for testing.

Taking an example of what I mean by a virtual database or virtual dependency: say you want to replay a request from the production environment, for instance getting the games for a user, Thompson; it queries the MongoDB database and returns a response. If you try to replay it today in a test environment, the state of the database is different and you won't find the user Thompson, right? So you won't be able to just replay the traffic and get the same response to assert against, because of the state of the database. What we did differently was that for the same request, the query data for the user Thompson and all the responses from that table were stored as a stub, packaged, and returned when the new application version asked for them. Now, with the database state held fixed, how does your application behave? Asserting on the whole response of the same request is what makes it a good test. So that's how we created the packaged database stub from the real production environment.
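Keploy's real proxy works at the wire level and understands protocols such as MongoDB's, but the record-and-replay idea behind a virtual dependency can be sketched much more simply. The following Go sketch is illustrative only, not Keploy code: in record mode each dependency query is forwarded to the real backend and the raw response is stored under a fingerprint of the query; in test mode the stored stub is replayed, so the real database can be switched off.

```go
package stubs

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

type Mode int

const (
	Record Mode = iota // forward to the real dependency and capture
	Test               // answer from recorded stubs; no database needed
)

// Proxy sits between the application and one dependency (a database, say).
type Proxy struct {
	mode    Mode
	forward func(query []byte) ([]byte, error) // call to the real dependency
	mu      sync.RWMutex
	stubs   map[string][]byte // query fingerprint -> recorded response
}

func NewProxy(mode Mode, forward func([]byte) ([]byte, error)) *Proxy {
	return &Proxy{mode: mode, forward: forward, stubs: make(map[string][]byte)}
}

func fingerprint(query []byte) string {
	sum := sha256.Sum256(query)
	return hex.EncodeToString(sum[:])
}

// Handle processes one dependency call. In Record mode it hits the real
// dependency and stores the response; in Test mode it replays the stub.
func (p *Proxy) Handle(query []byte) ([]byte, error) {
	key := fingerprint(query)
	if p.mode == Record {
		resp, err := p.forward(query)
		if err != nil {
			return nil, err
		}
		p.mu.Lock()
		p.stubs[key] = resp
		p.mu.Unlock()
		return resp, nil
	}
	p.mu.RLock()
	resp, ok := p.stubs[key]
	p.mu.RUnlock()
	if !ok {
		return nil, errors.New("no recorded stub for this query")
	}
	return resp, nil
}
```

Replaying stubs keyed on the query is what keeps the database state effectively frozen at record time, which is why the same Thompson request keeps producing the same response.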
Now, Shubham is going to give a demo and show you more about it.

Hello. For the demo, I'm actually going to show the new version of Keploy. We're still working on it; we initially had SDKs for different languages, and now we have moved to eBPF for instrumentation. First, I'll run a sample application: a simple URL shortener, a simple Gin application with a Mongo database. It has two endpoints, one for creating a shortened URL and one for getting the original URL back. I'll run the Mongo database while recording the tests; I won't need it while running the tests, because Keploy acts as a database proxy and will just return the right responses. In fact, let me also delete the Keploy test folder first.

Now the database is up and running, and I can run the application. I can see that the PID is 9543. I'll copy this, because I will need the PID when I run the eBPF-based instrumentation. So I provide this, and then I also need to run the Keploy server along with the eBPF agent. When I do this, I can run it in record mode; it puts the right kernel hooks in place, and it's initialized.

Now I can just make a simple call. Here, localhost:8080 is the actual server, the application under test, and I'm going to give it a URL, in this case google.com. I get a response back. I can copy this link and also inspect what else has happened. Okay, there's a segmentation fault. Let me double-check: the PID looks fine, record mode looks fine. Let me try once again, or just rebuild the entire application. Perfect. The application PID is this, then Keploy, yep.

As you can see, this is captured; I can see it in the Keploy server. A test file was written, as well as a mock file, which is basically the stub. I can actually go to the samples repositories. Here, if I go to gin-mongo, I can see the stubs that were generated. It has the body, which was google.com, along with the response. What's interesting is that it also has the communication with MongoDB; I can see the opcode, and I can see it's an update query. Then maybe I can capture more, like the GET call. So yeah, that's the redirect to google.com. Perfect.

Now I can actually shut down MongoDB while testing. Keploy is telling us it cannot talk to MongoDB, because it's off now, and I set the mode to test. There's a configurable delay, in this case 10 seconds, after which Keploy automatically runs the recorded tests against my application. As you can see, two tests are running, and they've passed.

Now I can make any change and we get a test report. For example, say the url field is renamed to redirect URL. And I can change anything in the timestamp, because Keploy automatically identifies time-sensitive fields and ignores them during assertions; body.ts is already labeled, so I can change anything here and it won't matter. When I run the tests again, we can see one test case has failed. I can go to the Keploy server for more details: there's a mismatch in the redirect URL, while the timestamp field is fine because it's a time-sensitive field that has already been flagged.

So that's a brief demo of Keploy. What's interesting is that I didn't have to make any change to my application; all the instrumentation is done by eBPF. And we'll very soon be releasing Keploy v2, which will have this eBPF-based instrumentation.
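As a rough illustration of the assertion step from the demo, here is a Go sketch (not Keploy's actual matcher, which works on full JSON paths rather than top-level field names) that compares two response bodies while skipping fields flagged as noisy, such as the time-sensitive ts field:

```go
package compare

import (
	"encoding/json"
	"reflect"
)

// EqualIgnoringNoise compares two JSON response bodies while skipping
// fields flagged as noisy, the way the demo ignored body.ts during
// assertions. For simplicity, noisy holds top-level field names only.
func EqualIgnoringNoise(expected, actual []byte, noisy []string) (bool, error) {
	var exp, act map[string]interface{}
	if err := json.Unmarshal(expected, &exp); err != nil {
		return false, err
	}
	if err := json.Unmarshal(actual, &act); err != nil {
		return false, err
	}
	// Remove flagged fields from both sides before the deep comparison.
	for _, f := range noisy {
		delete(exp, f)
		delete(act, f)
	}
	return reflect.DeepEqual(exp, act), nil
}
```

Calling EqualIgnoringNoise(recorded, fresh, []string{"ts"}) would then pass even when the timestamps differ, while still failing on a renamed redirect URL field.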
Coming back to the presentation. Like I mentioned, we currently have an SDK-based approach. It's quite easy to map requests to dependencies when you have an SDK, because we can pass context throughout the application, and especially while sampling or deduplicating at scale, it's easy to understand which database query belongs to exactly which incoming request; mapping that is a lot easier with an SDK. Code-level integrations are also easier: since you're providing an SDK, it can give developers more control over how it interfaces with the actual test suites. So we get more control overall. And because we have access to the application runtime, we know a lot of context about the application, which enables us to do far more than we could from outside with, say, an eBPF-based agent.

On the agent side, there are few to no code changes on the host application, so it's easier to integrate, which had been rather hard on the SDK side. It's faster to deploy and adopt, because you don't have to make code changes throughout the application. And there's low development overhead for maintainers like us, because we no longer have to support every driver and library in every language, across every version; supporting things at the network protocol level is a lot easier.

As for limitations: eBPF is Linux native, so on non-Linux systems it will run in a virtual machine; right now we're working on a Docker-based experience. We also need to run as root, because loading eBPF programs requires root permissions. And especially when running at scale, deduplication and sampling become really important. We have done a lot for some of our own use cases at places we've worked, but to scale this we need a very effective deduplication system, which is something we're working on as well. Also, since we are recording test cases, Keploy can capture and generate stubs and test cases, but a domain expert has to verify the results: Keploy doesn't really know what is correct or not, it just asserts against history. If something changes, you will know; whether it's right or not is up to you. Finally, in-memory state: if your application's responses depend on state held inside the application, that cannot be captured, because we are capturing network calls, so it is automatically out of scope.
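To make the deduplication problem just mentioned concrete, here is a deliberately crude Go sketch, not Keploy code, that drops duplicate traffic at capture time by fingerprinting the request shape; a real system would also normalize bodies, strip volatile values, and sample within each bucket:

```go
package dedup

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// Recorder keeps at most one recorded test case per request "shape".
// Fingerprinting on method + path is deliberately crude and purely
// illustrative of the idea.
type Recorder struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewRecorder() *Recorder {
	return &Recorder{seen: make(map[string]bool)}
}

// ShouldRecord reports whether this request adds coverage we don't
// already have, so duplicate production traffic is dropped at capture time.
func (r *Recorder) ShouldRecord(method, path string) bool {
	sum := sha256.Sum256([]byte(method + " " + path))
	key := hex.EncodeToString(sum[:])
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.seen[key] {
		return false
	}
	r.seen[key] = true
	return true
}
```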
For future work, we'll have the initial release out very soon. It's going to support HTTP to start with, plus popular databases. We'll have support for async components, which we did not have until now; the SDK version did not support async, because async operations typically do not affect the response of your tests, but now you will be able to assert on the different async operations your application performs. We're also adding support for streaming services, which can be complex to test even manually. And we're adding support for auto-generating edge cases from an API schema: given your API schema, we could derive edge cases and then use Keploy's instrumentation to generate test cases relevant to that particular application. As for gRPC, we do have support for it in the SDKs, but it has to be reworked for the new version.

Thank you. You can find Keploy on GitHub, where you can explore a lot of our samples. We currently have SDKs in Go, Java, and JavaScript. We're open to any questions.

Yeah, so in the current version of Keploy you can obfuscate certain fields. But if the application depends on those fields, then it won't work even right now. For a lot of our users, though, what they obfuscate is generally things like credentials or personal information, which doesn't usually affect their test output or their application output, so that part works fine. For cases where it does affect the output, we are planning to add something a bit more dynamic: the values will be obfuscated for developers, but at actual runtime the real values get passed and asserted. As of now, though, we support static obfuscation: you just filter out fields and those get ignored.
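For illustration, static obfuscation of the kind described can be as simple as the following Go sketch; the function, field names, and placeholder are hypothetical, not Keploy's actual configuration keys:

```go
package redact

import "encoding/json"

// Obfuscate replaces the listed fields in a recorded JSON payload with a
// placeholder before the test case is written to disk, so credentials or
// personal information never land in the repo. Redacted fields are also
// the ones excluded from assertions. Illustrative sketch only.
func Obfuscate(payload []byte, fields []string) ([]byte, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(payload, &doc); err != nil {
		return nil, err
	}
	for _, f := range fields {
		if _, ok := doc[f]; ok {
			doc[f] = "********"
		}
	}
	return json.Marshal(doc)
}
```

A recorded payload passed through Obfuscate(payload, []string{"password", "email"}) would be stored with those values masked, and the masked fields skipped during assertions.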
All right, thanks for the presentation. Do you have any case studies or examples of a team or company that used this completely end to end? What was the ultimate value they got out of it? Anything measurable?

Yeah, so we have users like Nutanix, for example. They are using Keploy to generate stubs for their infrastructure: their platforms team has generated stubs for a lot of Kubernetes APIs and their internal APIs, and it's also integrated into a bunch of their open-source projects. Stubbing out infrastructure is saving them cost, and it's making their tests more realistic, because earlier people were writing mocks by hand and those mocks were not real. That has helped them discover a lot of things; for example, I remember one particular unit test we tried that was actually making 50 API calls to the Kubernetes API server. I don't think we have done a case study with them yet; that's something we plan to do, a more detailed case study with more absolute numbers. We've also worked with Suzuki India, for example, and with a lot of Indian companies whose primary use case was test automation. That has been a learning from Keploy v1: test automation and infrastructure stubbing were bundled together, and some people wanted one part while others wanted the other. So with Keploy v2 we're decoupling them into independent components that work well either together or on their own. But yeah, we're a young project; it was open-sourced in March last year, and we're still working on case studies. Thank you. Any other questions?

Actually, what interested me about this was its ability to be separate from the app. So I was curious about NGINX integration, for instance. We already use NGINX fairly regularly for all the routing in each of our test environments. Could we drop this in automatically, just to grab all the stubs and create our own mock tests dynamically that way?

Yeah, so there are two aspects. One is capturing the input requests coming in, say from users or from outside. For that, it can work with NGINX as well, as long as you install the Keploy agent where NGINX is running. So that's something we can do. But typically the harder problem to solve is, like we talked about, the infrastructure underneath those APIs. There, with the current version of Keploy, we have to add an SDK, and with the new version, you'll again have to install the agent on every machine, on every VM that's running. Does that answer the question?

Well, actually, I was wondering whether you can do it based just on the request/response that only NGINX sees, so you don't have to worry about everything behind it; we have hundreds of microservices behind the scenes and we don't want to instrument all of those. Just on the NGINX side itself, could it read the request/responses and form test cases based on what goes through it over a certain time period or something?

Yeah, I think that should work. But then the responsibility is on the user to ensure consistency. Let's say you capture one hour of traffic at the NGINX layer, so you have the incoming requests and responses. Now, when you run it again, it's up to the user to ensure that the application, the database, whatever it is, is in the same state, so that the responses are consistent. We could definitely just record the HTTP request/responses, run assertions on those, and leave the state to the user; or you can stub out the entire infrastructure as well. It's a choice. You could do it at the NGINX layer only, record the request/responses and run them at a later time with assertions. Or you could do that plus record the database queries and create stubs of those, and what that adds is that it's like going back in time: if I'm fetching a particular user or any other information, recording the database queries and responses ensures the application's response is also going to be consistent. In some cases that's not relevant; in some use cases it just won't work without it. So it's totally up to you. Cool? I think we're good then. Thank you, and thanks for coming.
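As a closing illustration of the NGINX-layer option discussed in the Q&A, here is a rough Go sketch of its simplest variant: replay request/response pairs captured at the edge during a recording window, then assert on the responses, with state consistency left entirely to the user. The Pair type and Replay helper are illustrative, not part of Keploy.

```go
package replay

import (
	"bytes"
	"io"
	"net/http"
)

// Pair is one request/response captured at the edge (e.g. in front of
// NGINX) during a recording window.
type Pair struct {
	Method, Path string
	ReqBody      []byte
	WantStatus   int
	WantBody     []byte
}

// Replay fires each captured request at baseURL and reports the calls
// whose responses no longer match. This only works if the application
// and its database are back in the recorded state; otherwise the
// dependency calls need to be stubbed out too, as discussed above.
func Replay(baseURL string, pairs []Pair) (failed []string, err error) {
	for _, p := range pairs {
		req, err := http.NewRequest(p.Method, baseURL+p.Path, bytes.NewReader(p.ReqBody))
		if err != nil {
			return nil, err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		got, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != p.WantStatus || !bytes.Equal(got, p.WantBody) {
			failed = append(failed, p.Method+" "+p.Path)
		}
	}
	return failed, nil
}
```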