Using OpenTelemetry for application security purposes. In this presentation we will discuss the evolution of application security and code vulnerabilities, and how the shift to cloud native affected code vulnerabilities. We will see some examples and the way we used OpenTelemetry to solve the problems that arise from the shift to cloud native. We have only 35 minutes, so let's jump into the details.

I'll start with a short introduction about myself. My name is Ron Vider. I'm the cofounder and the CEO of Oxeye Security. At Oxeye, we're building an application security platform for cloud environments that finds vulnerabilities in the custom code of Kubernetes-based applications. I've been in the cybersecurity landscape for over a decade, since I learned how to hack websites when I was 16 years old and built my first application penetration testing tools. In my free time, I like to look for new vulnerabilities and do security research, and in the past few years I've been highly focused on the cloud-native landscape.

So, a short overview of the agenda for this presentation. Like I said earlier, we'll start by reviewing the evolution of application vulnerabilities, or code vulnerabilities: vulnerabilities in the code that is written within the organization. We will see an example of a vulnerability that we found in a CNCF project. After that, we'll do a short overview of what observability and OpenTelemetry are, and then we'll discuss how we can use them to solve some of the problems caused by the shift to cloud native. Finally, we'll see a live demo and how all of this helps us find code vulnerabilities better.

So let's start. We'll look at how code vulnerabilities have changed over the past fifteen years. Fifteen years ago, when we looked at applications, we saw a monolithic architecture.
It was all one code base running on a single server. But now, most applications are built in a microservices architecture. So we went from local development, building monolithic applications, to distributed applications. This change in architecture also affected the way vulnerabilities look. Now, why is that? Because at the end of the day, if we try to understand what a code vulnerability is, it's an input that comes from the user, it might be a query parameter in the HTTP request, and it reaches a dangerous function without any sanitization or manipulation in between. When we worked with monolithic applications, the user input and the dangerous function were both in the same code base, so we could analyze everything in one place. But now, with microservices, this is no longer the case. The user input can arrive at one service that is exposed to the Internet, then travel through RabbitMQ or another channel such as gRPC or HTTP, and only then reach the dangerous function, which sits in another, internal microservice. So vulnerabilities look different in a cloud-native architecture.

Now, not only that. If we look 15 years back, the only thing we had to prioritize the vulnerabilities was the code itself, because this is all we had. It was a vulnerability in the code. There wasn't any special infrastructure involved, so we had to analyze only the code. But today, applications are much more complicated. Applications also use infrastructure as code. They use containers, Kubernetes, cloud providers, and these configurations have a huge effect on what a hacker is really able to do. The naive example is, let's assume that we found a vulnerability in one of the pods, one of the microservices. Is it exposed to the Internet?
Is there any Kubernetes load balancer or Ingress that makes this vulnerable component, this vulnerable pod, externally exposed? But a more sophisticated example is, how are the permissions configured for this pod? Maybe I found a Log4Shell vulnerability, one of the most popular vulnerabilities, in the Log4j package, but I found it on a container which is privileged or which has a service account attached. So, in fact, a hacker is not limited to taking over this specific pod or this specific container; the hacker will be able to take over the entire Kubernetes cluster. So we also have to take into account the way the infrastructure is configured to understand the real risk of code vulnerabilities.

So we've talked about two main things that changed. The first is that we need to understand the way the microservices communicate with one another in order to understand the real risk, and in addition to that, we also have to understand the way the infrastructure is configured.

Now, as part of our work, we're always trying to find vulnerabilities in cloud-native projects, in CNCF projects, and I would like to show you one vulnerability that we found in one of the CNCF projects, called Harbor. Harbor is a container registry. It was originally written by VMware. One of the ways to deploy Harbor is in a microservice architecture. As you can see here in the architecture, we have two different services. The first one is a user-facing microservice, externally exposed, and the second one is a backend service, which is not exposed directly to the Internet. Now, as you can see in the architecture, these two services are built with different Golang versions. Between these versions, the Golang team changed the way the language parses query parameters in HTTP requests, and we were able to use that in order to access internal images without any authentication data. So, I just want to show you a quick view of the CVE that we found.
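To make the idea of a parsing mismatch concrete, here is a minimal sketch in Python. It is not Harbor's code and not the actual exploit; the parameter name `repo` and the two parser functions are hypothetical. It only shows how two components can read the same query string differently when one treats the semicolon as a separator and the other does not (Go 1.17, for example, stopped accepting semicolons as query separators).

```python
from urllib.parse import parse_qs


def parse_strict(query: str) -> dict:
    """Hypothetical 'newer' parser: only '&' separates parameters."""
    return parse_qs(query, separator="&")


def parse_legacy(query: str) -> dict:
    """Hypothetical 'older' parser: both '&' and ';' separate parameters."""
    return parse_qs(query.replace(";", "&"))


# One request, read by two services that disagree about the semicolon.
query = "repo=public;repo=internal"

front = parse_strict(query)  # sees a single odd-looking value
back = parse_legacy(query)   # sees two separate values

print(front)
print(back)
```

If one service makes the access decision on its reading and the other acts on a different reading, the check and the action no longer refer to the same thing, which is why this class of bug only shows up when you look at both services together.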
So, as you can see here, we don't have any credentials to the Harbor instance. We are unauthorized, and here we are putting a semicolon in the request, and we are using that in order to get access to the internal images without any credentials. Now, let's give it a second. Great, so we can see here, this is the blob that contains the image layers. We didn't have any credentials that allowed us to access it, and yet we were able to get this information. Now, I don't want to get too much into the details of this vulnerability. If some of you would like to read more, you can go to our blog post and read all of the details. We fixed this vulnerability together with the VMware team, but I wanted to show it to express the potential risk in a microservices architecture. Getting access to an internal image without any credentials is something which is very, very dangerous, and the only way to find this type of vulnerability is to understand service-to-service communication.

Great, so now that we understand the potential risk in microservices-based or cloud-native applications and the way vulnerabilities look in them, we'd like to discuss how we decided to resolve this issue, how we decided to prioritize these findings in a smarter way and actually find these cloud-native vulnerabilities. One of the ways we decided to tackle this issue is using OpenTelemetry. OpenTelemetry is one of the most popular projects in the CNCF. It's a very cool project. So what is OpenTelemetry? I will read it first, and then I will explain it. OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data. Now, when I'm saying telemetry data, we are referring to metrics, logs, and traces. Traces are the most important thing for us, because traces show the way the services communicate with one another.
And OpenTelemetry does that to help you and us analyze your software's performance and behavior. Now, as you can see, security is not one of these things. We are talking about software performance and software behavior, not vulnerability assessment. So we couldn't use it out of the box, and we will touch shortly on what we had to change in OpenTelemetry in order to use it for AppSec purposes.

But first, I'll try to explain, one level up, what OpenTelemetry is. When we are talking about distributed systems, OpenTelemetry helps us understand the service-to-service communication. It's an SDK that you add to your own code base, and it instruments the different functions that are responsible for the service-to-service communication. Then we get this data and we can see a visualization of the way the microservices communicated with one another. We decided to use it because, if you are talking about distributed vulnerabilities and you have a very powerful tool maintained by the CNCF that does exactly this, maybe we can leverage it to understand vulnerable flows, or vulnerabilities across multiple microservices.

So, there are three main benefits that we got by using OpenTelemetry. The first one is vulnerable flow analysis. Vulnerable flow is what we call vulnerabilities that stretch across multiple services. We can see in the diagram a flow that starts on a Python API, then the Python API communicates with RabbitMQ, and we have here an internal Java service that is not directly externally exposed. We can use OpenTelemetry to understand that, and later on we can use this information to prioritize the vulnerabilities much better. So, this is the first and the main benefit we got from OpenTelemetry. The second thing, and it's also something which is very, very important, is vulnerability validation.
One of the main pain points with vulnerabilities is that we get a huge list of which, in reality, only 10% are really exploitable, because this line of code is never executed, or this vulnerable package is never loaded into memory. Because OpenTelemetry is a runtime solution, we can use it to provide much more accurate analysis. We can look at which functions are really executed at runtime and use that to prioritize the vulnerabilities and reduce the number of false positives. If I have a vulnerability that is never executed, I have to know about it, but it can be fixed later on in the process, because it doesn't create a real risk for my organization at the moment. So, this is the second benefit.

The third benefit is public exposure verification. If I have a vulnerability, and it can be a critical vulnerability, CVSS score 10 out of 10, but a hacker is not able to reach this vulnerability from the Internet, once again, I have to know about it, but it can be fixed later on in the process. My organization is not at risk at the moment, because no hacker out there is able to exploit it, and I can fix it tomorrow, after I fix the vulnerabilities that are exploitable from the Internet. So, these are the main benefits we got from using OpenTelemetry in application security.

But like everything in life, nothing comes easy. As we said earlier, OpenTelemetry wasn't designed for security, it wasn't designed for AppSec purposes, and we had to change some things in it in order for it to work as an AppSec solution. We had two main challenges with using OpenTelemetry out of the box for AppSec. The first thing: OpenTelemetry doesn't collect lines of code. If I have a vulnerability in a file called app.js, line 25, it can be a very critical vulnerability, a remote code execution. I need something that I can use to stitch it to the traces that I collected using OpenTelemetry.
And this something that I don't currently have is the line of code, because this is all the information I have. I know there is a remote code execution vulnerability, the file name is app.js, and the line number is 25. That is all. So we had to add an additional capability to OpenTelemetry: every time a function is executed, it also collects the line of code. It also collects the stack trace and the call stack. So, this is the first thing that we had to change in OpenTelemetry to make it work for AppSec.

And the second thing, which is very, very important: the original purpose of OpenTelemetry was to understand the way the services communicate with one another, to provide flow tracing. It wasn't built for security. Therefore, we had to add an additional layer of instrumentation, which is more security-related. So the trace that we collect will start on the API service and go through the message queue up to the internal service, but it won't stop there. It will go all the way to the dangerous function that can trigger the vulnerability, such as a command execution or deserialization function. Now, by doing these two things, I have enough information that I can stitch to my code vulnerabilities to understand the real risk of these vulnerabilities.

So, how does everything look together? In this slide, we can see that we start by finding the vulnerabilities using a static approach. Same as we do today, we run static analysis tools, we run SCA tools, and then we have a list of potential vulnerabilities. I know that I might have a vulnerability in the internal microservice. The file name is main.py, line 25. But at this point, I don't know yet if it's really exploitable from the Internet or not. I only know this is a potential vulnerability.
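The extra instrumentation described above, recording the file and line of the caller every time a dangerous function runs, can be sketched in plain Python. This is only an illustration under assumptions, not the actual change made to OpenTelemetry: `RECORDED_SPANS`, `security_instrument`, `run_command`, and the record keys are all hypothetical names, and the sink only returns a string instead of executing anything.

```python
import inspect
import os
from functools import wraps

# Stand-in for a real trace exporter (assumption for this sketch).
RECORDED_SPANS = []


def security_instrument(name):
    """Wrap a dangerous function so each call records where it was called from."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            caller = inspect.currentframe().f_back  # frame that called the sink
            RECORDED_SPANS.append({
                "name": name,
                "file": os.path.basename(caller.f_code.co_filename),
                "line": caller.f_lineno,
                "function": caller.f_code.co_name,
                "args": [repr(a) for a in args],
            })
            return func(*args, **kwargs)
        return wrapper
    return decorator


@security_instrument("command_execution")
def run_command(command: str) -> str:
    """Hypothetical sink; a real one would hand the string to a shell."""
    return f"would run: {command}"


def callback(body: str) -> str:
    # The message body reaches the sink without any sanitization.
    return run_command("echo " + body)


result = callback("hello")
span = RECORDED_SPANS[-1]
print(span)
```

The recorded file and line are exactly the pieces of information a static finding also carries, which is what makes the two joinable later.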
Now, the second step is to provide flow tracing: to understand the service-to-service communication using OpenTelemetry, with the two changes that we talked about in the previous slide, the security instrumentation and the call stacks. Then, as the final step, we look at the cloud infrastructure, the Kubernetes configuration, and the container definitions to provide even more accurate analysis, to understand which microservices in the chain, or in the trace, are Internet-accessible and what the permissions of each one of them are. After we have done all of that, we can recalculate the severity and focus on the most critical vulnerabilities, and not on potential vulnerabilities that, even if they have a CVSS score of 10 out of 10, can have a lower priority because they don't create any real risk for my organization at the moment. So this is a high-level overview of the security funnel that we've built.

So now we understand the main challenges of finding vulnerabilities in cloud-native environments, in Kubernetes-based applications. We also understand what observability and OpenTelemetry are, and finally, we talked about how we can correlate observability and AppSec. Now we can see a live demo of a cloud-native environment with vulnerabilities, and we can see how OpenTelemetry helps us understand the real risk of the vulnerabilities in it.

The lab looks like this. It runs on a Kubernetes cluster in AWS, on an EKS cluster. It contains two microservices, two pods. They are both written in Python. We can see the first one, which is externally exposed through an Ingress, also on the Kubernetes cluster. The external API gets input from the users, from the Internet, and then the external API sends the user input to RabbitMQ, to the message queue. The internal Python service gets the messages from the queue, and the message goes through this callback function.
And if you can see the vulnerability in the callback function, it gets the message in the body parameter in line 43, and then it goes to the send mail function directly, without any escaping or sanitization in between. So if a hacker is able to control the body parameter, the hacker will be able to execute arbitrary code on this pod. But only by looking at this piece of code, at main.py, lines 43 to 46, we don't know if a hacker is able to control it or not, because this is just an internal service that is not externally exposed.

So now let's move to our Kubernetes cluster. As we can see here, we have multiple pods. We have Jaeger all-in-one; Jaeger is the back end for OpenTelemetry. We also have two pods here: KubeCon external, the public Python API, and KubeCon internal, the internal Python service that listens for the messages in the message queue. And we have multiple instances of RabbitMQ that are also running on this Kubernetes cluster.

So now let's connect to our Jaeger UI. This is the Jaeger UI. Let's look at KubeCon external. Currently we don't have any traces, we don't have any information about it, so let's create some traffic. I will get the address of the external API, we can see it here, and using a curl command, I will send an HTTP request. So I sent it, and I got back "Hello from KubeCon and CloudNativeCon Europe 2023". Let's go back to the Jaeger UI. I'll click on Find Traces, and we can see here one trace with one span that contains only one microservice, KubeCon external. I can expand it and see some more information regarding this span. But this one contains only one microservice. Now I will send another request to the API that will trigger cross-service communication. This time I will send a request to /internal with a query param called data, and I send hello, and I got back 200 OK.
Now, this HTTP request triggered the external API to write a message to the message queue, then the internal service got the message from the queue, and eventually it triggered the vulnerability we saw earlier. So let's go back to the Jaeger UI and refresh this page. Now we can see that we have an additional trace here. This trace contains four spans, but two services. We have KubeCon external and KubeCon internal. Let's open it, and we can see the four spans. The first one is /internal. This is the HTTP request I sent using the curl command. After that we can see the message queue send and the message queue receive. But the last one, and this is the most important one, is posix.system. It means that the os.system function we saw earlier was executed. I will expand it, and we can see some of the information we collected. We have the code here. We can see the execution, so it was called with hello as the parameter. The file name was main.py, and the line number was 45. So it means that we have a flow that starts on the external API, goes through the message queue to the internal service, and ends on this specific line of code.

Now, in addition to that, I will connect to one of the pods, for example the internal microservice, and I will try to run a static analysis tool, one of the most popular Python static analyzers, called Bandit. So I will install it. Now I will execute Bandit, the static analyzer, on main.py, and let's see if it finds the vulnerability. We can see here that Bandit found a possible command injection vulnerability in a file called main.py, line 48. So it means they are completely identical. The information we got from OpenTelemetry that we saw in Jaeger is the exact same information we got from the SAST tool, the static analysis tool, so I can connect these two.
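Connecting the two sources comes down to a join on file name and line number. Here is a minimal sketch of that join, assuming both the static findings and the sink spans have already been reduced to simple records. The class names, the `kubecon-internal` service label, the `utils.py` finding, and all values are made up for illustration; they are not Bandit's or Jaeger's actual output formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticFinding:
    file: str
    line: int
    issue: str


@dataclass(frozen=True)
class SinkSpan:
    trace_id: str
    service: str
    file: str
    line: int


def stitch(findings, spans):
    """Pair each static finding with the runtime spans at the same file and line."""
    by_location = {}
    for span in spans:
        by_location.setdefault((span.file, span.line), []).append(span)
    return {f: by_location.get((f.file, f.line), []) for f in findings}


# Hypothetical inputs: one finding that was observed at runtime, one that was not.
findings = [
    StaticFinding("main.py", 45, "possible command injection"),
    StaticFinding("utils.py", 10, "weak hash"),
]
spans = [SinkSpan("trace-1", "kubecon-internal", "main.py", 45)]

matches = stitch(findings, spans)
confirmed = [f for f, hits in matches.items() if hits]
print(confirmed)
```

Findings with at least one matching span were actually executed as part of a traced flow, so they can move to the top of the list; findings with no match stay on the list but can wait.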
I can stitch them together, and then I can use that to prioritize the vulnerabilities much better. Now, not only that, I now know that this flow runs on two services, KubeCon external and KubeCon internal. So it means that I can go to the Kubernetes configuration or the AWS configuration and look at each one of these services to see whether they are externally exposed or not. And as we saw earlier in the kubectl get services command, the external API has an external IP. It means a hacker is able to send an HTTP request with a payload to this address, to the external address of the external API. The payload will go the entire way from the external API, through RabbitMQ, to the internal service, and it will trigger the vulnerability we found using the static tool. So the risk is very, very high, and this organization needs to fix it as quickly as possible. So this was an overview of a real-life example of the way we used OpenTelemetry for AppSec.

And I thought to myself, what are the main key points that I would like you to take from this presentation? I thought about these four things. First of all, I believe that modern problems require modern solutions. Application security testing tools, or tools that try to find vulnerabilities, have been around for quite some time, I would say more than 20 years. Having said that, the way vulnerabilities look has changed, and these tools need to change as well. If we're trying to find vulnerabilities in modern applications, we need to use modern tools. The second thing I would like to add to the summary is that when we are talking about vulnerabilities in cloud-native applications, we must take into account all of the cloud-native information we can collect, such as traces and infrastructure configuration. The third thing I would like to highlight is OpenTelemetry. OpenTelemetry wasn't built for security. It's an observability tool.
Yet, if we can use it for application security purposes and it helps us make our organization more secure, we should do that. And generally speaking, if there are any open-source tools, whether they are CNCF projects or not, that I can use and adjust to my needs, this is something we should do. The fourth and final thing is observability. Observability is crucial for us to understand the real risk for microservices-based applications, for Kubernetes-based applications. We can't analyze each microservice separately, without any knowledge about what's going on around it: what's going on in the microservice before it, or in the infrastructure layer under it. We must know all of that when we are trying to make our applications more secure.

So, thank you very much for your time. I'm very happy that you joined my talk. I will be at KubeCon for the upcoming days, so if any of you would like to have a chat and talk about AppSec or observability or just CNCF security in general, I'd be happy to do that. I'm also available in the CNCF Slack. And I would add that in our daily job, we are always looking for vulnerabilities in CNCF projects. So if any of you is interested in security research around CNCF projects, feel free to browse our blog. We've recently released a vulnerability that we found in HashiCorp Vault. We also released a vulnerability we found in another CNCF project called Backstage, which got a CVSS score of 9.8, and some additional vulnerabilities in Harbor. So, feel free to browse, and thank you very much.