Hello, everybody, and welcome to this talk, titled "When They Go High, We Go Low". My name is Tal Tzvik, and I'm a developer at MetalBear, where we develop mirrord. This talk is about solving a seemingly high-level problem using quite low-level techniques, specifically the problem of cloud-native developer experience. More specifically, we will discuss mirrord, which is a dev tool written in Rust for developing cloud-native applications. mirrord enables cluster-connected but local execution of cloud-native applications, and what that means and what it looks like will become clear as we move on.

So as I said, the high-level challenge mirrord addresses is cloud-native developer experience, and more specifically, just being able to run your code while developing it. And the solution mirrord deploys is at the operating-system process level. First, we're going to talk about what mirrord is for. Once that is out of the way, we're going to take a look at some of the low-level technical details of mirrord's implementation. And then, to wrap it all up, there's going to be a short live demo of mirrord. During this talk, I hope you'll understand why we went in this very low-level direction with mirrord, and also how it works in general terms.

So first, what mirrord is for. Running the code you're working on while developing it is useful and helpful, and with many types of software it is a simple enough task. But when it comes to cloud-native applications, it can become not so trivial at all. I believe that we shouldn't abandon useful development patterns and practices just because they're harder to implement with cloud-native applications; we should just develop tooling that makes them easy. But how do you run your cloud-native application while developing it if, for example, it accesses files that only exist on the cluster? Or if it uses other services in the cluster over IP? Or if you need another service in order to generate requests to your application? And besides, running your code often is good, and you might not want to spend time containerizing and deploying it after every little code change you want to test out.

There are multiple existing approaches to solving that problem. They don't all offer the same capabilities, and they don't all result in the same kind of workflow. Some tools, for example, just try to enable remote debugging in the cloud, while others opt for running everything locally on the development machine. The approach mirrord takes is to enable local development with remote access. Now, there are other tools, like Telepresence or Gefyra, that also belong to the "remocal" category, a term I learned in Daniel Bryant's talk yesterday. But unlike those other tools, mirrord works on the process level, not on the machine or container level, which results in a different workflow for development. It also results in some nice content for a technical talk.

Now, the idea is to have an optimal development experience for running the code. For us, that means it should be just as simple as running non-cloud code: as simple as clicking the Run or Debug button in your IDE, or some very simple shell invocation. At the same time, we want developers to be able to run with meaningful data, under realistic conditions that are similar to the conditions the code will meet in production, and to be able to utilize existing environments. So with mirrord, developers can execute their code locally, either in a shell or in their IDE.
And when the application accesses resources from the cluster, it gets actual traffic and actual data from the cluster, even though it's actually running directly on the developer's computer. Developers can run their code very easily and do things like stepping through breakpoints, but still take advantage of realistic cloud environments, like a shared staging environment maintained by the team, all the while being non-intrusive and not even changing the application that is currently deployed on that cluster.

Now, mirrord does that by running the software in a simple process on the developer's machine and adding a thin layer of transparent virtualization, connecting the application's I/O points to the cluster. And the way mirrord achieves that magical effect is by injecting a dynamic library into the application's local process and hijacking some key operations made by the application. Which leads us to the next part: the cool technical details.

Here we'll take a look at the key steps mirrord takes in order to hijack an application's calls and make them magically succeed, even when they try to access resources that don't exist on the system the application is running on, but do exist on the Kubernetes cluster the developer is working with. The first thing we need to do is get our code into the process's memory. The way we do it is this: when a user tells mirrord to run their application, mirrord executes that program, but first it asks the dynamic linker to also load our dynamic library into the new process's memory. You might know this as the LD_PRELOAD trick on Linux. On macOS, the equivalent environment variable is DYLD_INSERT_LIBRARIES. mirrord just adds the path of its dynamic library to that environment variable, and the dynamic linker loads our code into the new process in addition to that process's own code.

OK, so now our code is in there, but we need it to run. Just by virtue of being in the process, our code doesn't actually run, as the user's application doesn't call it; it isn't, and shouldn't be, aware of mirrord. So in order to ensure our code runs, we use another feature dynamic linkers offer, which is what is sometimes called a constructor. That is basically the feature that code placed in a specific section of a binary will be executed on startup. So mirrord just puts some code in that section of the binary, and that code gets executed directly on process start. Now, as mentioned before, mirrord is written in Rust, so that's the syntax we're seeing here right now. And Rust has a nice package that basically lets us mark this section using a little attribute above our function. So all we need to do in mirrord's code is define such a function, write whatever we want to run on startup in the body of that function, and it will run when the process starts.
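To make the injection step concrete, here's a minimal sketch of a launcher that sets the environment variable before spawning the user's program. This is not mirrord's actual launcher code, and both paths are placeholders:

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Ask the dynamic linker to load our library into the child process
    // before the child's own code runs. On macOS, the variable would be
    // DYLD_INSERT_LIBRARIES instead. Both paths here are placeholders.
    let status = Command::new("/path/to/users_application")
        .env("LD_PRELOAD", "/path/to/libmirrord_layer.so")
        .status()?;
    println!("application exited with {status}");
    Ok(())
}
```

Note that the talk says mirrord adds its path to the variable, so any value already present would be appended to rather than simply overwritten as in this sketch.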
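And the constructor itself looks roughly like this in Rust. This is a minimal sketch assuming the `ctor` crate for the attribute described above; the exact package is an assumption, not something named in the talk:

```rust
use ctor::ctor;

// The attribute places this function in the binary section that the
// dynamic linker executes at load time, so it runs on process start,
// before the application's own main().
#[ctor]
fn on_library_load() {
    // In mirrord's case, this is where the hooking described next is
    // set up. The print is just for illustration.
    eprintln!("dynamic library loaded");
}
```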
OK, so now we have code execution, and there's a question: what do we use that ability for? But before we answer that question, just to make sure we're on the same page, a quick introduction to libc. It's the C standard library. It implements some useful low-level operations that most applications need, and it is used indirectly by almost every programming language. So when writing Python code, you might not be aware that that code, when running, will result in calls to libc. But when running, for example, file.read in Python, at some point there's going to be a call to libc's read function, or some other similar libc function. And that is the case for most programming languages, with the notable exception of Go on Linux, which is why we had to do some extra work in order to also support Go on Linux. But that is a bit outside of our scope today.

This indirect use of libc means that programs written in most languages don't actually include syscalls directly in their code. More typically, they include calls to libc, and libc, in turn, makes syscalls whenever necessary. And that makes libc a very good choke point for hijacking operations done by the application, because virtually every application will use libc for those operations.

OK, so now that we have code execution in the user's process, what we run on startup is code that utilizes Frida to hook libc functions. First of all, Frida: that is a dynamic instrumentation toolkit, which means it gives us an SDK for manipulating the process's code while it's running. And what I mean by hooking is that we use Frida to replace the first couple of binary instructions in a function's code with a jump to our code. So for every libc function we want to hook, we define our own replacement function; we call them detours in our Rust code. And we use Frida to make sure that whenever the user application calls those libc functions, our detour functions are executed instead. What we have to do there is basically create a Rust function with an identical signature, and, notably, the C calling convention. Then, whenever that libc function is called by the user application, our code is executed instead. Now, the code we put in those detours isn't run on startup, right? On startup, it's just our code that uses Frida to install those hooks. Whatever we put inside those detour functions will only run when the user's application actually makes those calls to libc.

So what do we do inside those detours? First of all, we check the configuration, or the flags, that the user passed to mirrord, and then we decide whether this operation should be carried out locally, on the developer's machine, or in the cluster. Everything is configurable. For example, you can tell mirrord you want to read some files locally and other files in the cluster; or the user might want to read everything in the cluster but only write locally, et cetera. After we've made this decision, mirrord will send the operation to an agent running in the cluster. Now, that agent is spawned just in time, when we start the mirrord run, and it is deleted automatically once the run is over. Basically, that agent gets the operation from mirrord's locally running dynamic library, performs that operation in the cloud, and returns the data or the results back to mirrord, which in turn makes them available to the unsuspecting user's application.

So for example, if a user's application tries to read from a file, mirrord checks the configuration. If it decides that the file should be read in the cluster, it sends a message to the agent in the cluster. The agent reads those bytes from the file system that is available to the deployed application, and returns those bytes back to mirrord. mirrord populates the application's buffer with those bytes, and the application basically doesn't even know it's not running in the cloud, because for the application it's the exact same effect as running in the cloud.
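As an illustration, here's a simplified sketch of what a detour for libc's read can look like: identical signature, C calling convention, and a decision between local and remote. The helpers `should_read_remotely` and `remote_read` are hypothetical stand-ins for mirrord's real logic, not its actual code:

```rust
use libc::{c_int, c_void, size_t, ssize_t};

// Trampoline back into the real libc `read`, saved when the hook is
// installed (Frida provides a way back into the replaced function).
static mut ORIGINAL_READ: Option<
    unsafe extern "C" fn(c_int, *mut c_void, size_t) -> ssize_t,
> = None;

// Hypothetical stand-in: consult the user's configuration and the
// file descriptor's origin to decide where the read should happen.
fn should_read_remotely(_fd: c_int) -> bool {
    false
}

// Hypothetical stand-in: send the operation to the agent and copy the
// returned bytes into the caller's buffer.
unsafe fn remote_read(_fd: c_int, _buf: *mut c_void, _count: size_t) -> ssize_t {
    0
}

// Identical signature and C calling convention, so the caller can't
// tell the difference from the real libc function.
unsafe extern "C" fn read_detour(fd: c_int, buf: *mut c_void, count: size_t) -> ssize_t {
    if should_read_remotely(fd) {
        remote_read(fd, buf, count)
    } else {
        // Local operation: fall through to the original libc `read`.
        (ORIGINAL_READ.expect("hook installed"))(fd, buf, count)
    }
}
```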
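And the conversation with the agent can be pictured roughly like this. These are hypothetical, much-simplified message types, not mirrord's actual protocol:

```rust
// Sent from the local dynamic library to the agent in the cluster.
enum ClientMessage {
    // "Read up to `count` bytes from this remote file descriptor."
    FileRead { remote_fd: u64, count: u64 },
}

// Sent back from the agent to the local dynamic library.
enum AgentMessage {
    // The bytes read in the cluster, or the errno the operation
    // produced there.
    FileReadResult(Result<Vec<u8>, i32>),
}
```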
Now, at this point you might be wondering how we even access the deployed application's data, its file system, its traffic, et cetera. So first, we need to know where to get the data from. For that, users can specify a target for mirrord: either a pod, a deployment, or an Argo Rollout, optionally with a specific container. Now, the most common use case for mirrord is working on a new version of an existing microservice. In that case, you're going to have the stable, current version of that application already deployed in the cluster. That application is what you would typically use as the target for mirrord, as that's the application that has the data your local application needs.

The next thing that is important to know is that our agent runs in a pod on the same node as the target, or as one of the target's pods. It then joins the target's Linux namespaces. Here it's important to note that those are Linux namespaces, which are unrelated to Kubernetes namespaces. And in order to read files from the target's file system, the mirrord agent accesses paths relative to the target container's root path.
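Those two tricks, joining the target's Linux namespaces and reading files through its root path, can be sketched roughly like this, assuming we already know the PID of the target's container process as seen on the node. The PID and the file path are placeholders:

```rust
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    // Placeholder: the target container's process ID as seen on the node.
    let target_pid = 12345;

    // Join the target's network namespace: sockets opened after this call
    // behave as if they were opened inside the target container.
    let ns_file = File::open(format!("/proc/{target_pid}/ns/net"))?;
    if unsafe { libc::setns(ns_file.as_raw_fd(), libc::CLONE_NEWNET) } != 0 {
        return Err(std::io::Error::last_os_error());
    }

    // For files, no namespace switch is even needed: the target's file
    // system is reachable under its root path in procfs.
    let hostname = std::fs::read_to_string(format!("/proc/{target_pid}/root/etc/hostname"))?;
    println!("target's hostname: {hostname}");
    Ok(())
}
```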
So that was the cool, do-it-yourself way to get to this data. But ever since ephemeral containers were introduced, we also support just running the mirrord agent in an ephemeral container, and we'll probably be moving to make that the default. Since ephemeral containers are meant to enable debugging of a running container, they already get all of the container's data and access the same resources. So there we don't even have to do any cool tricks in order to access that data.

So I mentioned resources of the target that mirrord accesses. The main ones we support are networking, which means both incoming connections and initiating outgoing connections, as well as DNS resolution in the cluster; then there is file system access, and also environment variables.

Zooming out a bit, we can see the main components of mirrord: first, the mirrord binary installed on the developer's machine, or alternatively a plugin for their IDE; then the dynamic library that is injected into the user's process; and finally the agent running in the user's cloud. The dynamic library hijacks the calls made by the user's application and sends them to the agent; the agent sends data back to the dynamic library; and the dynamic library makes that data available to the user's application.

OK, now, after this short journey through some of the technical details of mirrord, let's see what it looks like when users use it. For that, we have a little example setup with some custom microservices written in Go, together with a Redis and a Kafka service, all deployed in our cluster. And we'll go through a scenario of a developer who is working on the IP visit counter microservice. So now let's go over to GoLand, work over there, and see if the Wi-Fi will let us work with the cluster. Can you also hear me with this one? No? Can I have this one? I need both hands for typing. Does this work now? That one? OK, that one works.

So right now we're in GoLand, the JetBrains IDE for Go, and I already have the mirrord plugin installed in my IDE. I have this little mirrord icon that indicates that mirrord is enabled. Let's also take a look at the resources in my cluster and see that we recognize the same ones from the diagram we saw earlier. And just a quick look at this application we're going to debug, at a couple of functions it has here in its main Go file: there is a setupRedis and a setupKafka, which set up connections to those services; there is a loadConfig function that loads the configuration for this application; getCount is the main function that handles incoming HTTP requests; and there is the main function that starts the server.

OK, so we've got mirrord enabled, I have my breakpoint here in the main function, and I have a very basic mirrord configuration file, which is almost the default configuration. Now I'm just going to hit Debug and start running. Of course, what's happening right now is that mirrord is spawning an agent pod in my cluster, so with bad connectivity that might take a bit. But we're already there.

So first of all, let's step into that loadConfig function and see what it looks like. Here we can see that the application uses Viper, a Go package, to populate the fields of a config object with values it gets from environment variables. Now, I actually don't have those environment variables defined on my machine, and also not in this launch configuration. But if we step through to the end of this function, we can see that the config was successfully populated with values that came from those environment variables. And the reason is that mirrord fetched those environment variables from the cluster and made them available to my application here.

If we step out, we can see that the application now uses the values from the configuration for the next operations. First of all, it's going to read a file from a path that was passed in this configuration. And right here we can do a little verification and see that this path, of course, doesn't actually exist on my system. But when the application reads that path, it actually succeeds and gets data, and we now know that this data came from the cluster. The application doesn't know that, right? It isn't aware of mirrord; for it, it's just as if it were running in the cluster with access to that file system.

OK, the next thing: the application is going to set up a connection to Redis. We step in, and we can see that it uses another value from the configuration as the Redis host. Now, if I just try to connect to that Redis host directly from my machine, without using mirrord, curl is going to tell me that it can't even resolve that host. And that makes sense, because that name is internal to the cluster. But the application is able to create the connection: it's able to resolve the host and also to send a Redis ping. And that is, of course, because both of those operations were carried out in the cluster: both the DNS resolution and the initiation of an outgoing connection to that Redis service.

OK, moving on, I'm going to skip over the setupKafka function, because it's probably very similar to setupRedis. And the next thing the application does is define the two routes it's going to handle: one is for the health checks defined on the cluster, and the other is the main route, the one that will actually handle the requests.
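Everything we've stepped through so far, the environment variables, the file read, and the Redis connection, bottoms out in a handful of libc calls, which is exactly why mirrord can redirect it all. Here's a minimal Rust analogue of those operations, with hypothetical names and paths, just to recap what's being intercepted:

```rust
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Environment variables: with mirrord, the target's variables are
    // made available, so this succeeds even if nothing is set locally.
    let redis_host = std::env::var("REDIS_HOST").unwrap_or_default();

    // open/read: this path may exist only in the target container's file
    // system; the hooked libc calls fetch the data through the agent.
    let config = std::fs::read_to_string("/app/config.yaml")?;
    println!("loaded {} bytes of config", config.len());

    // getaddrinfo + connect: the hostname resolves only inside the
    // cluster; both the resolution and the connection happen remotely.
    let _redis = TcpStream::connect((redis_host.as_str(), 6379))?;
    Ok(())
}
```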
So let's just let the application start and begin receiving requests. If I go over to the console, I can see that we already have incoming health checks from the cluster. And if we go over to the terminal, we can also generate requests to the cluster, right? So not to localhost, but to the actual cluster, exposed on the internet. I'm going to generate that request, and we can see that my local breakpoint was hit, because mirrord already forwarded that request to my local application. And now I can basically step through this function and see how my application handles a real request that was actually sent to the cluster.

Now, if I go back to the terminal again: the response that curl received here was actually generated not by my local application, but by the remote application. And that is because, in its default mode of operation, mirrord just mirrors the traffic from the cluster to the local application, which means it duplicates that traffic and sends it over to the local instance of mirrord, while it is actually the remote application that handles those requests. There is also a steal mode, where mirrord steals that traffic away from the remote application. And for that, there are also HTTP filters, in order to steal only some of the requests: for example, if you only want to steal requests that you generated, and not your teammates', or if you don't want to steal all of those health-check requests, because they're not interesting for your debugging.
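The difference between the two modes, and where the HTTP filter fits in, can be pictured with this little sketch. Mirror mode, steal mode, and header filters are real mirrord features, but these types and this function are invented for illustration:

```rust
use regex::Regex;

// Invented types for illustration; this is not mirrord's code.
enum IncomingMode {
    Mirror,
    Steal { header_filter: Option<Regex> },
}

// Decide, per incoming request, whether the local application answers it.
fn local_app_answers(mode: &IncomingMode, headers: &[(String, String)]) -> bool {
    match mode {
        // Mirroring: the local app only ever gets a copy of the traffic;
        // the deployed application still answers every request.
        IncomingMode::Mirror => false,
        // Stealing without a filter: the local app takes everything.
        IncomingMode::Steal { header_filter: None } => true,
        // Stealing with a filter: take only matching requests, e.g. ones
        // carrying a header you add to your own test requests, leaving
        // teammates' requests and health checks to the deployed app.
        IncomingMode::Steal { header_filter: Some(filter) } => headers
            .iter()
            .any(|(name, value)| filter.is_match(&format!("{name}: {value}"))),
    }
}
```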
OK, so that's that. I'm going to go back to the presentation. We'll take a second. Nice. OK, so yeah, I think we still have a little bit of time, so let's look at another very low-level challenge we had to overcome in order to create mirrord. At one point, I mentioned injecting a dynamic library on macOS. This is actually not at all as simple as it is on Linux. In order to demonstrate that challenge, here is the minimal example for injecting code. We have, on the left side, a very simplistic binary: Rust code that I compiled to a binary called HelloBinary. And on the right side, code that I compiled into a dynamic library that prints a line on startup. Now, if we just run that binary on its own, this is what it looks like; this is to be expected, right? And once we inject that dynamic library, we see both of those prints. And this very simple principle is basically the same method of injection that mirrord uses.

But now, what happens if I try to run some other program, maybe in a non-compiled language like Python? And a reminder: this is all only about macOS. So what happens when I try to inject that same library into that code? Now we can see that it doesn't work, right? We only see the print of the original application, and we don't see the print from our dynamic library. So it means that injection failed. The reason this happens is basically a security mechanism on macOS, put in place by Apple, where when a binary meets some set of not-very-well-documented criteria, this environment variable is not respected and is also stripped from any descendant processes of that run.

Now, what I ran here was a Python script, right? And you wouldn't expect it to be protected by those criteria. And you'd be right. Because this Python script, the executable right here, is not the actual binary that is executed. When we have a script like this, we have to look at the shebang, the hash-exclamation-mark in the first line. That is what determines which binary is going to be executed. In this case, it's just env, with python3 as an argument. So env will decide which binary to execute next, depending on that argument. And for many tools, for many interpreters, it doesn't even end there. For many Python or Node distributions, just running a script results in a very long chain of executions: first you actually execute env; env executes some script that searches for the right installation on your system; that then executes env again; and env will then, at the end of this chain, find the right binary. So in this case, some Python binary will be found and executed. What happens in this situation is that env is the binary that is protected by that mechanism on macOS, and this is why injection doesn't work in this case.

But of course, we do want to support all languages on macOS too, so we did have to find a solution for that. The complete solution is quite involved and complicated; if you're interested, you can see it in our code on GitHub, and we also wrote a blog post about it. But in very general terms, what we do is this: we first extract just the binary code out of the executable, we write it into a new file, we sign it on the fly, and we basically run that new file instead of the old one. And now, you might remember that many runs are actually chains of executions of multiple binaries. So in order to also support that, we hook another libc function, called execve, so that we can do that whole process on the fly every time a binary executes a new binary.
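Shaped like the read detour from earlier, the execve hook can be pictured like this. The `patch_macos_binary` helper is a hypothetical stand-in for the extract, sign, and rewrite step just described:

```rust
use libc::{c_char, c_int};
use std::ffi::{CStr, CString};

// Trampoline back into the real libc execve, saved when the hook is installed.
static mut ORIGINAL_EXECVE: Option<
    unsafe extern "C" fn(*const c_char, *const *const c_char, *const *const c_char) -> c_int,
> = None;

// Hypothetical stand-in for extracting the binary code, writing it into
// a new file, and signing it on the fly.
fn patch_macos_binary(original: &str) -> CString {
    // In this sketch, pretend the patched, freshly signed copy is
    // written next to the original.
    CString::new(format!("{original}.patched")).unwrap()
}

// Same signature and C calling convention as libc's execve, so every
// exec in the chain passes through here and gets the same treatment.
unsafe extern "C" fn execve_detour(
    path: *const c_char,
    argv: *const *const c_char,
    envp: *const *const c_char,
) -> c_int {
    let requested = CStr::from_ptr(path).to_string_lossy().into_owned();
    let patched = patch_macos_binary(&requested);
    // A real implementation also has to make sure DYLD_INSERT_LIBRARIES
    // survives in `envp`, so the next binary in the chain is injected too.
    (ORIGINAL_EXECVE.expect("hook installed"))(patched.as_ptr(), argv, envp)
}
```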
OK, so that was a quick tour through the internals of mirrord. mirrord has a lot more interesting implementation details and also cool features that I didn't have time to go into today. But if you're interested, please just check out our code on GitHub, or our website. And now that you've seen how easy it is to just test out, I hope you'll try it out and let me know what you think. I also recommend joining our Discord for live mirrord support and questions, or just to chat. And if you have any questions or want to talk to me personally, please feel free to come to me after the talk and say hi, or anytime. Also, I have a limited amount of mirrord and MetalBear swag with me; if anyone's interested, just come to me after the talk. So thank you very much for your attention. I think we might have time for a couple of questions.

Yeah, so we do have time for a couple of questions. So yeah, if there are any questions, we can do that now.

Hi, thanks for the talk, really nice. How does the traffic from the cluster flow into the application running on your laptop? I don't think you went through that part.

Right. So when the run starts, mirrord first spawns the agent in the cluster and then creates a connection with that agent. That connection is basically based on port forwarding into the cluster. So mirrord creates kind of a TCP connection directly with the agent that runs in the cloud, and then communicates with the agent in its own protocol.

Thank you for the talk, very interesting. The question is: that sounds fantastic; is there any reason why this would not work, for specific use cases?

Any use cases that wouldn't work? Yes. Yeah, so there are some edge cases. First of all, the most obvious, basic one: we do it by hooking libc functions, so if you were to write your own assembly code that directly makes syscalls and doesn't include any libc calls, then you couldn't use mirrord with that. I guess we don't even try to support that. Then there are static binaries. Because the whole mechanism is built around injecting dynamic libraries, it's not going to work for static binaries. But I think there are simple enough workarounds for that, like just forcing a dynamically linked build. And if there are other cases, I think we probably already have them documented on our website; and if we don't, we welcome contributions to the documentation as well.

Hey, good talk. Is the mirrord solution for the local machine bound to Linux or Mac only, or is it also Windows-enabled?

So for Windows, you can run it over WSL, the Windows Subsystem for Linux. I think we might one day also have more native support on Windows. We accept contributions on that issue; I think it's a very popular issue on GitHub, but also the one the team is most afraid of ever being prioritized. So, yeah.

Another one? OK, hi. Yeah, I actually wanted to ask the same question. On top of that, do you have any idea to support bare metal, or is it only about Kubernetes?

Are you asking because of the name, or? Oh, no, actually not. OK. I think right now we only support Kubernetes, and maybe similar solutions, similar distributions, right? But I am not aware of any plans for bare metal. Feel free to suggest it on GitHub, or ask about it on Discord.

So glibc has versioned symbols. What happens if I call a symbol that is not available in that version in the pod that you spawn on Kubernetes? What happens in that case?

So the libc hooks all happen locally; they don't happen in the cloud at all. Sorry, did I misunderstand the question?

That's not my question. In my glibc locally, the symbols are versioned, and they can be newer than on the remote host.

OK, so basically we don't implement the operations one-to-one. The operation that the agent carries out on the cluster is not "run that exact same libc call"; there is some level of abstraction. So basically, it would still work; it doesn't depend on the libc versions being the same.

I have two questions. Is this particular to the GNU libc, or does it also support musl or other libc implementations?

Which version of libc? Yeah, just the GNU version, or does it also support musl? Oh, actually I'm not sure, do you know? Sorry. Yeah, it should also work with other ones. If it doesn't, please let us know. But it should be supported; we're not aware of any problems with other libc implementations.

And the traffic that's forwarded between the Kubernetes cluster and your local machine, is that encrypted in any way?

Right, yeah. So it works over port forwarding, so it's covered by the same encryption as kubectl port forwarding. OK, thank you. Thanks.

Cool. OK, we're out of time for today. If you have any more questions, or just want to say hi or secure some swag, just let me know.