Thank you to the whole Cloud Village team for accepting our CFP and allowing us to speak today. So let us introduce ourselves. I'm Alexandre Sieira, co-founder and CTO of Tenchi Security. I'm accompanied here by Leo, Leonardo Viveros. As Jeff said, I have presented before at a few conferences, including last year at Cloud Village, with a presentation called SideSpocalypse, talking about supply chain attacks in the cloud. Not that supply chains have become a hot topic in InfoSec this year, you know, but we haven't seen enough about them in the cloud yet. Leo, do you want to say hi? Hey, hello everyone. This is my very first time at DEF CON, and actually, six months ago I didn't even know about it. I joined Alexandre's company, he showed me the amazing world of cybersecurity, and here we are presenting. How about that? This time last year, he didn't even know what DEF CON was. Now he's presenting at Cloud Village. But let's talk about the subject at hand. Why should you care? This talk covers a specific issue around serverless application security in AWS. Serverless is a nice thing. Serverless is all the rage right now. It's getting a lot of good press and, as far as we can see, it's widely adopted. That's good. I believe it's another step in the evolution of how we build applications: we're always looking for the next level of abstraction, one that bundles many of the concerns and abstracts away some of the complexity of what's underneath. If we hadn't progressed toward higher-level abstractions, we would all be writing distributed systems in assembly, which I don't think would be fun. The end goal is always to have developers spend as much of their time as possible writing business logic and delivering value to their business, whatever that business is.
And as little time as possible on the nitty-gritty of the operating systems, the components, and the networks where their applications are running, because the business part is what actually pays everyone's salary. But the thing is, in serverless, as with any of the previous layers of abstraction we've used in development, you always need to look at the foundations of the infrastructure your applications run on. Before serverless, we were looking at servers: how they ran, how they were configured, how they scaled, what their availability looked like. Now you have to be concerned about the serverless components and frameworks your application is built on top of, and again worry about reliability, scalability, performance, and security. So if you're doing serverless in AWS, you will be hard pressed to avoid using Amazon API Gateway, right? Or, as AWS employees would say, "AP gateway," I think, right? Because they say "AMI" anyway. You're going to see API Gateway used by pretty much any serverless framework that works on AWS: the Serverless Framework, SAM, Chalice, you name it. It's a serverless, scalable reverse proxy service. It receives HTTP and HTTPS requests and routes them to a specified backend. Most typically, when we're talking about serverless applications, those backends will be Lambda functions, right? You have a REST API, say, and you map endpoints to the Lambdas that implement the logic behind each endpoint (you can actually map many endpoints to one Lambda). The client doesn't have to know what your backend is, and if you're using Lambda, again, you're serverless all the way through. That's what we see used most often.
And since you have this one point for all the routing, besides the routing itself, API Gateway is a nice integration point for a lot of additional features, like scaling and caching. API Gateway itself, as well as Lambda, scales on its own based on demand. API Gateway also does its own caching, and you can integrate it with CloudFront to cache static content. You can version your APIs really easily. You can do monitoring and observability: API Gateway generates its own logs, but you can also integrate it with services like X-Ray or CloudWatch. As far as security goes, API Gateway can implement input validation. If you specify the GET parameters and the path parameters, you can have some basic input validation performed even before your Lambdas are called. You can also use API Gateway as an integration point for AWS Web Application Firewall, or WAF. And, most importantly for our research, API Gateway also allows you to do authentication and authorization and abstract that away from your code. A few key concepts around how API Gateway works. Each API is an object identified by an ID that's generated automatically, a bit like an account ID, but with letters as well, not just numbers. As an API Gateway user, you can define stages for your API. The way this is typically used is for versioning, with a stage called v1, then v2 and v3, or you can have stages like development and production. As far as authentication goes, there are a few ways to implement it. There's a CIAM solution native to AWS called Amazon Cognito, a customer identity and access management solution; we're going to talk about it on the next slide. API Gateway also has native capabilities for OpenID Connect and OAuth 2.0.
You can also, God forbid, use IAM policies and privileges to allow entities, or principals, in AWS accounts to call endpoints on an API Gateway API. And when you want to do something else, something bespoke or custom, you use what are called Lambda authorizers. This is what you would use when integrating with a third-party tool like Auth0, JumpCloud, or things like that, so this is pretty frequent. Before we go into Lambda authorizers, which are the stars of this research, let's briefly credit my good friend Andrés Riancho from Argentina, now working as a CISO at Wildlife Studios. He did research in 2019 specifically on Cognito security. We won't delve deeply into it here, but just to give you an idea: Cognito is a CIAM solution from AWS, but the architecture the AWS documentation described for developers was not a good one. Basically, you create users and assign them to identity pools. They can authenticate using either login and password or social logins. Once they're validated against an identity pool, they get a set of temporary AWS credentials that have hopefully been scoped down to least privilege, so they can only access the things on AWS they need, like calling Lambdas or accessing files in S3 buckets. The problem is that the AWS documentation assumed the client-side components of your application, your mobile app or your single-page application, would do that. The credentials would go from your API back to the client, and the client would then call AWS APIs directly, which is a horrible idea. It's like exposing your database to the internet: every time someone logs into your web application, you hand the single-page application database credentials, and just give it access to the right tables and columns. It's an impedance mismatch right there.
You want to abstract that away through your API, and have your API on API Gateway hold the necessary privileges to access any other AWS resources. When Andrés found out that was what people were doing, he ran a large-scale search of the internet, including Common Crawl and other large datasets. He found a lot of web applications, and some mobile applications, with identity pool IDs hardcoded into the apps to do precisely that. Moreover, he found out that you don't even need to be authenticated to get AWS credentials: when you're using Cognito, you can set a level of access even for anonymous, unauthenticated users. Just using those, he was able to gain access to over 13,000 S3 buckets that were not public, 1,200 DynamoDB tables, and 1,500 Lambda functions. Pretty amazing research. Check out this link and his paper; it's really scary. Now back to API Gateway. Let's talk a little bit about how Lambda authorizers work. As we described before, if you have an API on API Gateway, a client makes a request, and API Gateway has to decide whether that request is authorized before routing it to, say, the Lambda that implements that API endpoint. When you use Lambda authorizers, before API Gateway calls the Lambda that actually implements the API operation, it calls a special Lambda called the Lambda authorizer. That's also part of your application. It's a Lambda function that receives as a parameter the authentication token presented on the HTTP request, typically in the Authorization HTTP header. You can configure it to be anything else you want, like a GET parameter or another header, but the header is the default, and it's what makes the most sense for most people.
What that Lambda does is take that information, along with a lot of other information about the HTTP request, and evaluate it. Let's say the Authorization header has a JSON Web Token or some other credential, like an API key. You evaluate that using whatever logic and code you want to write, and you return information to API Gateway that tells it who this user is, any metadata you want passed to the backend of the API (say, the Lambda that implements the API operation), and what's called a policy document, which tells API Gateway which operations that person can and cannot invoke. So let's look at what that looks like. This is straight from the AWS documentation. The Lambda authorizer returns a JSON object that identifies the user, right, which principal it is. It gives a context with metadata, say which groups they belong to or which privileges they have in your application. Those values are passed along as additional parameters to the Lambdas that implement operations on your API. But, importantly, there's one field called policyDocument, and this is the part that API Gateway itself processes. That's what it uses to decide which API endpoints this user can call. And shockingly, this is not in the format anyone who has ever used a reverse proxy or web application firewall would expect. You would expect something like a list of regular expressions for the URLs the user can access. Instead, the API Gateway team decided to implement this in the AWS IAM policy format. So they created an action called execute-api:Invoke, right? And then you have the traditional format of Allow and Deny, and you have to specify a resource.
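To make the response format concrete, here is a minimal Python sketch of a TOKEN-type Lambda authorizer. The field names (`principalId`, `policyDocument`, `context`) and the `authorizationToken` and `methodArn` event fields follow the AWS documentation as we understand it; the token check, the `user-123` principal, and the `groups` context key are hypothetical placeholders.

```python
def build_auth_response(principal_id, effect, resource, context=None):
    """Build the object a Lambda authorizer returns to API Gateway.

    Field names follow the response format in the AWS documentation;
    the values used below are illustrative.
    """
    return {
        # Who the caller is.
        "principalId": principal_id,
        # The part API Gateway evaluates itself, in IAM policy format.
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
        # Metadata passed along to the backend Lambda.
        "context": context or {},
    }


def lambda_handler(event, context):
    # A TOKEN authorizer receives the token and the ARN of the method
    # being invoked ("methodArn") in its event.
    token = event.get("authorizationToken", "")
    # Hypothetical check: a real authorizer would validate a JWT or API key.
    if token == "allow-me":
        return build_auth_response(
            "user-123", "Allow", event["methodArn"], {"groups": "users"}
        )
    return build_auth_response("anonymous", "Deny", event["methodArn"])
```

Allowing only `event["methodArn"]` is the most literal choice. With authorizer caching enabled, though, the cached policy is reused for the same token on other endpoints, which is one reason developers reach for wildcards in `Resource`.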
If you look at how they encode that resource, it's an ARN, an Amazon Resource Name, as usual, with the execute-api service, right? Then you have the region as usual, because API Gateway is a regional service, and the account ID. Then, in the last ARN segment, where the resource would go, they have encoded, separated by slashes, the API ID; the stage name we talked about before, like v1, v2, or prod and dev; the HTTP verb (GET, PUT, POST, DELETE, et cetera); and the resource path, the actual URL path of the API endpoint that was called. What API Gateway does is store that policy and cache it. Caching is optional, but it's the way this is most often used, and it wouldn't make sense not to use it. API Gateway caches that policy for an amount of time, and every time someone comes in with that same Authorization header, it doesn't call the Lambda authorizer again; it just uses the cached response to decide whether the endpoint they're calling matches an Allow statement in that policy document. It builds an ARN based on the request being made and compares it with the policy returned by the Lambda authorizer, right? This is a controversial decision. I can understand a little bit why they did this, right? If you are developing a new service at AWS and you want to reach MVP status as quickly as possible, the IAM policy engine is just sitting there, right? It's tried, it's trusted, it's scalable, and it's free for users, which probably also has implications for the price they had to charge for API Gateway, right? But here's the thing. First of all, this is a format that's not at all familiar or intuitive to developers. It's much more familiar to ops and security folks than to developers, right?
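The ARN layout described above can be sketched as a pair of helpers: one assembles the ARN that API Gateway compares against the policy, and one splits it back apart. These are illustrative functions of our own, not an AWS API, and the region, account ID, and API ID used in the usage note are made up.

```python
def method_arn(region, account_id, api_id, stage, verb, path):
    """Assemble an execute-api ARN in the format described above:
    arn:aws:execute-api:<region>:<account>:<api-id>/<stage>/<verb>/<path>
    """
    # The path is appended without its leading slash, so "/dashboard/user"
    # becomes ".../GET/dashboard/user".
    resource = "/".join([api_id, stage, verb.upper(), path.lstrip("/")])
    return f"arn:aws:execute-api:{region}:{account_id}:{resource}"


def split_method_arn(arn):
    """Split an execute-api ARN back into its named parts."""
    # Only the first five colons separate ARN segments; the last segment
    # holds the slash-separated API ID, stage, verb and path.
    _, _, service, region, account_id, resource = arn.split(":", 5)
    api_id, stage, verb, path = resource.split("/", 3)
    return {
        "service": service,
        "region": region,
        "account": account_id,
        "api_id": api_id,
        "stage": stage,
        "verb": verb,
        "path": path,
    }
```

For example, `method_arn("us-east-1", "123456789012", "abc123", "prod", "get", "/dashboard/user")` returns `arn:aws:execute-api:us-east-1:123456789012:abc123/prod/GET/dashboard/user`. Everything after the account ID, slashes included, is a single ARN segment, which is exactly where the wildcard problems live.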
That is, to the people who deal with IAM on a daily basis. And it's also not nearly granular enough for the things you need to represent when you're trying to control access to API endpoints, right? It's case sensitive, and that's not a problem, but it doesn't have regular expression matching, right? So think about it. With REST APIs, you usually encode IDs as path parameters, part of the URL, instead of passing them as GET parameters after the question mark. There's no good way in an AWS IAM policy to specify such a URL and make clear that you're only allowing a specific type of value, say a UUID or a number, in the middle of the URL, because IAM policy syntax is very literal and all you have are two wildcards. You have the question mark to signify exactly one character, like the dot in regular expressions, and you have the star, which greedily expands to anything in that one ARN segment, right? If we recall how ARNs work, they have several segments separated by colons, right? So if you have an asterisk anywhere inside this entire last part, with API ID, stage, HTTP verb, resource, et cetera, it can expand to occupy the entire thing. It will not stop at slashes. It will not stop at anything except the end of the string and the colon before this segment, right? This is a huge problem, because it makes it really hard to avoid mistakes and to apply least privilege here. I'm going to let Leo tell you about a few examples he built that showcase this. Leo? Yeah, so suppose these are the endpoints for regular users of our application, not admins or staff users, just regular users who can create an account on your website. As we can see, we have GET, PUT, and DELETE methods, and all of the routes have a dashboard prefix.
So it would make total sense for a developer who wants to be economical in the way they write API Gateway Allow policies to write something like what we have there: our API ID, slash prod for our production stage, slash star to match all of the methods (GET, PUT, and DELETE), slash dashboard for everything these underprivileged users can do related to the dashboard, and slash star again to match, for example, user with username, user login, and so on. It's natural for a developer to write a policy like that. It's very simple, and security and simplicity go together, right? Wrong. Actually, when you write something like that (next slide, please), you allow other routes as well. Suppose we have these two other routes, reserved for admin users with elevated privileges. The first one, admin store order credit card dashboard, charges users' credit cards. The next one, admin dashboard create admin user, lets you create other admin users. The first star will expand to cover not just the POST method but also admin, store, order, and credit card, so a regular user with this policy will also be able to call this method. Next, please. In the second route, the first star expands to POST slash admin, and the last star expands to create admin user. And since a star can match anything, it can also match nothing. So regardless of whether dashboard appears at the beginning, the middle, or the end of your endpoint definition, a user with this kind of policy will be able to call these endpoints. Next, please. Now we're going to look at a video of a deployed stack with exactly these methods: some user methods and some admin methods. We're going to see if it really works or if it's just theoretical. Please, can you play the video?
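A quick way to convince yourself of Leo's point is to model the matching. The sketch below translates a pattern into a regular expression under the semantics described in this talk: `*` matches any run of characters, slashes included; `?` matches exactly one character; everything else is literal and case sensitive. It only models the last ARN segment and is an approximation, not AWS's implementation; the API ID and the admin route names are hypothetical stand-ins for the ones on the slides.

```python
import re


def iam_wildcard_match(pattern, value):
    """Match one ARN segment the way the talk describes IAM wildcards.

    '*' matches any run of characters (including '/' and the empty
    string), '?' matches exactly one character, and everything else is
    a literal, case-sensitive match.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch anchors both ends; DOTALL lets '.' match any character.
    return re.fullmatch(regex, value, re.DOTALL) is not None


# Resource part of the example policy (API ID "abc123" is hypothetical).
POLICY = "abc123/prod/*/dashboard/*"

# Hypothetical routes modelled on the slide: two user and two admin routes.
ROUTES = {
    "abc123/prod/GET/dashboard/user/alice": "user",
    "abc123/prod/PUT/dashboard/user": "user",
    "abc123/prod/POST/admin/credit-card/dashboard/charge": "admin",
    "abc123/prod/POST/admin/dashboard/create-admin-user": "admin",
}

if __name__ == "__main__":
    for route, kind in ROUTES.items():
        print(f"{kind:5} {route}: {iam_wildcard_match(POLICY, route)}")
```

All four routes print `True`. The same model also shows why `?` cannot express a path parameter: `users/?` matches `users/7` but not `users/42`.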
First, we're going to show you the endpoints, but it's pretty much the same thing you just saw: two admin routes and six user routes. Then we're going to look at our authorizer function. It's a pretty standard authorizer function; every serverless developer should be very familiar with it. First things first, we check that we received the authorization token. If we did, we verify it as a JWT. We could use a third-party identity provider, but for the sake of simplicity we just do a verify. After we find that the user has the role of user, we generate this policy with star slash dashboard slash star, just like we mentioned before. That first star is where our flaw is. Now we're going to start sending requests against these routes. First, we're going to GET dashboard user with the username. Let's see if that succeeds. Yeah, it succeeds: 200 OK, as expected. Then the other one, without the username at the end, slash dashboard slash user: of course, it succeeds. Then the PUT methods after that, and they succeed as well. We're not going to test every one of them because we have very little time, but you get the idea. All the user methods succeed, just as expected. But what about the admin routes, the credit card and create admin ones? We make a POST on admin store order credit card slash dashboard, and it also succeeds, with the same token and the same role as the regular user, not an admin. The same goes for admin dashboard create admin user, proving that regardless of whether it's at the beginning, the middle, or the end, if you have star slash dashboard slash star, it will match. Thank you. Thank you, Leo.
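For readers who want to reproduce the demo, here is a rough Python equivalent of the kind of authorizer shown in the video. It is not the code from the video: it verifies an HS256 JWT using only the standard library, the secret and the `role` claim are hypothetical, and expiry checks are left out for brevity. The policy it returns contains the same `*/dashboard/*` flaw.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret for the sketch; a real deployment would load
# it from a secrets store or verify tokens from an identity provider.
SECRET = b"demo-secret"


def _b64url_decode(segment):
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token):
    """Return the claims of a valid HS256 JWT, or None.

    Expiry and audience checks are omitted to keep the sketch short.
    """
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:  # also covers binascii and JSON decoding errors
        return None
    # Pin the algorithm: never accept "none" or anything unexpected.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(SECRET, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    return claims if isinstance(claims, dict) else None


def lambda_handler(event, context):
    claims = verify_hs256(event.get("authorizationToken", ""))
    if claims is None or claims.get("role") != "user":
        # API Gateway turns this specific error message into a 401.
        raise Exception("Unauthorized")
    # Keep "arn:aws:execute-api:<region>:<account>:<api-id>/<stage>".
    api_prefix = "/".join(event["methodArn"].split("/")[:2])
    return {
        "principalId": str(claims.get("sub", "unknown")),
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow",
                # The flaw from the demo: the first "*" also matches "/",
                # so admin routes containing "/dashboard/" are allowed too.
                "Resource": f"{api_prefix}/*/dashboard/*",
            }],
        },
    }
```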
It's important to note that if you had a path parameter, and as an attacker you could choose to put the word dashboard there (let's say there were entities you could create that would later be represented in some other URL), the simple fact that it contained dashboard would make that one ARN allow you to call that endpoint. So you can see how easy it is to mess this up using stars, right? More importantly, it seems that even the API Gateway team at AWS, or at least the team documenting it, had an incorrect understanding of how stars expand in IAM policies, right? They themselves did not seem fully sure how this worked, which is a shame, because diligent developers who read the API Gateway documentation would have been misled by it. This is one example from the old version of the documentation, which has since been fixed; we're going to talk about that in a little bit. It says: what if you want to create a policy that allows access to any HTTP method and any resource on any API, but only on the test stage? With the granularity the policy document has right now, it's impossible to create a policy that does exactly that. But the API Gateway team said you can just write star slash test slash star in that last ARN segment and you'll achieve it. That's incorrect. You are allowing anything on the test stage, but you're also allowing much more. Look at the four numbered examples at the bottom right: what you intended was number one, but you are also allowing numbers two, three, and four, right? So it seems, and you're going to see this more and more in official AWS content, that the people writing it thought stars stopped expanding at slashes, which is not how they work, right? You can see this stated explicitly later on that same documentation page.
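Using the same wildcard model as in Leo's example (an approximation of IAM matching on the last ARN segment, not AWS's implementation), this sketch contrasts what the old documentation intended, "the stage is `test`", with what `*/test/*` actually allows. The resources listed are hypothetical, not the four examples from the slide.

```python
import re


def iam_wildcard_match(pattern, value):
    """'*' matches any run of characters (including '/'); '?' matches one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None


def is_test_stage(resource):
    """What the documentation meant: the stage field is exactly 'test'."""
    # Last ARN segment layout: <api-id>/<stage>/<verb>/<path>
    parts = resource.split("/", 3)
    return len(parts) == 4 and parts[1] == "test"


PATTERN = "*/test/*"

# Hypothetical resources (not the ones on the slide).
CANDIDATES = [
    "abc123/test/GET/pets",               # test stage: intended
    "abc123/prod/GET/test/results",       # prod stage, "test" in the path
    "abc123/prod/DELETE/admin/test/",     # prod stage, admin route
    "xyz789/prod/POST/reports/test/run",  # another API entirely
]

if __name__ == "__main__":
    for resource in CANDIDATES:
        print(resource, "intended:", is_test_stage(resource),
              "allowed:", iam_wildcard_match(PATTERN, resource))
```

Only the first resource is in the test stage, yet the pattern allows all four.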
The API ID can be replaced by an asterisk for all APIs. That's not what it does at all. That asterisk will match everything else as well, including the stage name, HTTP verbs, et cetera. And they do that with every single field, as if you could use an asterisk to replace just that one field, right? Again, as we just saw, that's not how this works. AWS also provided what they call Lambda authorizer blueprints, which is a pretty awesome idea, by the way; kudos to AWS for doing this. On the AWS Labs GitHub account, they wrote a skeleton, a blueprint, of how a Lambda authorizer should work in a variety of languages: JavaScript, Node, Python, Go, Rust, et cetera. Developers can take a shortcut: instead of copying and pasting from Stack Overflow, they can copy and paste from AWS's own GitHub repo. The problem is that the repo had the same incorrect assumption. You can see comments saying you could replace any one of those fields with an asterisk without affecting anything else, and in some of the language versions the default values for API IDs, stages, or methods are asterisks, right? Again, that leads your developers toward making a mistake. Even the AWS Console reinforces that incorrect assumption. For example, this is the console for API Gateway. If you look at a particular method, or method request, you can see it's creating that same ARN format: the API ID, then an asterisk for the stage, then GET, then a path, right? It's replacing just the stage with an asterisk when it shows you what the ARN for this particular operation looks like. The same goes for the Lambda console. If you go to a Lambda that's used as the implementation of an API endpoint, it tells you there's a trigger from API Gateway, and again it gives you something that looks like that execute-api ARN format.
It's the API ID, and then the stage and the method are asterisks. So all around AWS, the API Gateway team, the AWS Console team, and the team that wrote the blueprints seem to have been working under the incorrect assumption that stars expand differently than they actually do. And that is bound to mislead developers, right? So let's look at some actual examples of where this came to pass. Leo? Yeah, now looking at real-world scenarios: it's not easy to find many serverless applications that are open source, but we found one that is pretty interesting in this case. It's an application that distinguishes between guests and admin users. Because there's no way to represent the API's path parameters with, for example, regular expressions, the developers just wrote something like GET slash problems slash star slash submissions and POST slash problems slash star slash submit, thinking it would only allow problem IDs in the middle, for getting and posting submissions. In reality, they are allowing many more routes. For example, if there's an admin route that ends in submissions or submit, like in this example, GET problems foo bar submissions or POST problems foo bar submit, a guest user with that policy will also be able to call it. So this is one of the examples we found. There aren't many open-source serverless projects, but this one is very interesting. Also, as developers, every one of us has said at some point, "well, it works on my machine." That's exactly the case when we develop serverless APIs using the Serverless Framework. According to npm, the serverless-offline plugin has around 134,000 downloads per week, so it's very widely used. It's what I've always used to develop serverless APIs.
And it behaves differently on your machine than when it's deployed to AWS. In this example, we are doing a POST on a route: our stage, dev, slash dashboard, and that's it. In the serverless-offline plugin, which we use to test on our machine, it says 403 Forbidden. You can't do this. But when we deploy the same stack, the same code, to AWS, it allows us to call this method. That's very bad, because the serverless-offline plugin is probably being used in CI/CD validations, and it's leading developers to believe they are safe when in reality they are not safe at all. This is a consequence of IAM policies being very complex and much more familiar to security and ops folks than to developers. And since developers write tools for developers, we end up in a situation like this: we can't reproduce on our local machine the same behavior we get when deployed. Next, please. Yeah. So it's sad. Even developers who try to do the right thing, and test their application thoroughly to make sure things that should be blocked are being blocked, are being hurt by this if they use well-known, good, reliable tools like serverless-offline. We submitted this as an issue, as you can see. We're not sure if they have fixed the problem yet, but we hope they do soon, and if we have time, we'll even try to write a pull request to help with this. We found this in April of this year, and what we did was responsible disclosure. We reached out to AWS and reported the issues we found in the documentation and in the AWS Console. We had a variety of back-and-forth conversations with them. They were very responsive, really professional, and really concerned with doing the right thing. I have nothing but good words to say about them.
And especially, I'd like to thank Mark Ryland for the role he played in helping mediate this conversation. The only thing I would call a little less than stellar is that they decided not to update the AWS Console. They continue to use stars as placeholders for the API stage or API ID in the Lambda console and the API Gateway console, which I wish they hadn't. I wish they had fixed it, but they didn't. In any case, the documentation for API Gateway and IAM, and the Lambda authorizer blueprints, were fixed, and Leo even helped them with the blueprints by submitting a pull request. So this was very collaborative, and we thank them for being so nice to us. The other thing we tried was this: we have something that takes external input and, based on that input, generates a policy that API Gateway is going to execute. If you replace those words with a SQL command that's going to be run by a database, it sounds a lot like SQL injection. So we started thinking, hmm, maybe we can do an IAM policy injection attack. When a Lambda authorizer uses part of the request to build the policy document, even if it's just the Authorization header or a field inside a JSON Web Token, can we escape the quote and insert an entire new statement into that policy that allows asterisk, that allows everything? It turns out we can't. AWS made a choice that Lambda authorizers cannot return strings: the call fails if you try to return a string in that response format. It has to be a native language object, like an object in JavaScript or a dict in Python. So there's very little opportunity to escape the string and change the structure of the document, and that kind of injection attack was pretty thoroughly foiled. We don't know whether that's just a happy coincidence of how Lambdas work, with their wrappers doing the hard work of converting things to JSON.
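Why the injection fails can be illustrated in a few lines. This sketch assumes the Lambda runtime serializes the authorizer's returned object to JSON, which we model here with `json.dumps`; the `build_policy` helper and the payload are hypothetical. Because the policy is built as a native dict, the injected quotes are escaped rather than interpreted.

```python
import json


def build_policy(caller_text):
    """Hypothetical authorizer helper that puts caller-controlled text into
    the policy. The policy is a native dict, not a hand-built JSON string."""
    resource = ("arn:aws:execute-api:us-east-1:123456789012:"
                f"abc123/{caller_text}/GET/pets")
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow",
                "Resource": resource,
            }],
        },
    }


# Classic injection attempt: close the string, then append a statement
# that allows everything.
PAYLOAD = 'prod"}, {"Action": "*", "Effect": "Allow", "Resource": "*'

if __name__ == "__main__":
    # Stand-in for the runtime serializing the returned object to JSON.
    wire = json.dumps(build_policy(PAYLOAD))
    statements = json.loads(wire)["policyDocument"]["Statement"]
    # The quotes were escaped, so there is still exactly one statement.
    print(len(statements))
```

The structure holds, but the caller's text still lands inside the `Resource` string, where any `*` it contains remains a live wildcard.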
In any case, it did the trick. But what about inside that resource string, without escaping it? What if a resource in the policy is built from external data? Let's say we get a user ID inside a JSON Web Token, and we are able to change it to an asterisk. Okay, that might work. That might actually work. But no one's going to do that, right? Oops. It turns out the very AWS blog post that announced Lambda authorizers had exactly that problem. It built that ARN by appending slash users slash the contents of a field from a JWT, a JSON Web Token. Of course, JSON Web Tokens are signed, so if you can tamper with them, you're already at the second or third stage of an attack chain. Still, setting the algorithm to none on JWTs is not exactly new, and there might be other ways to subvert JWTs or create JWTs under your control. If you can, or if you find other ways to inject data that you think will end up in a policy document, try changing it to an asterisk, or appending an asterisk to it, and see how that changes the API's authorization decisions. This might actually work, right? So we're going to close with a few recommendations, on things AWS can do and things everyone else can do. Starting with AWS: they're pretty tied down by their decision to use the IAM policy format, right? It would be a big change for them to add an additional format based on regular expressions, or, even harder I think, to add regular expressions to IAM. But one quick thing they could do is add conditions. Currently, the execute-api:Invoke action does not allow any conditions, even though conditions are a standard part of how IAM policies are evaluated.
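One mitigation for claims that end up in a policy is to allowlist their format before interpolating them. The sketch contrasts the risky pattern with a guarded version; the `[A-Za-z0-9_-]{1,64}` format for user IDs and both helper names are assumptions for illustration.

```python
import re

# Hypothetical allowlist for user IDs; adjust to what your identity
# provider actually issues.
USER_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def unsafe_user_resource(api_prefix, user_id):
    # The risky pattern: a token claim pasted verbatim into the ARN.
    return f"{api_prefix}/GET/users/{user_id}"


def safe_user_resource(api_prefix, user_id):
    # Allowlist the claim so IAM wildcards ('*', '?') and extra '/' can
    # never reach the policy document.
    if not isinstance(user_id, str) or not USER_ID_RE.fullmatch(user_id):
        raise ValueError(f"refusing to build a policy for user id {user_id!r}")
    return f"{api_prefix}/GET/users/{user_id}"
```

With the unsafe helper, a claim of `*` produces `.../GET/users/*`, and a value like `alice*` matches every ID that starts with `alice`. The guarded helper rejects both.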
If you had conditions for the API ID, the stage, the method, and the resource separately, you could do that thing where you allow just the test stage for any API, any method, and any resource. It would look like the right side of the slide. It is currently impossible to do this, or anything that replicates it exactly, without conditions, right? But if they added conditions, it would make this so much more granular, you wouldn't believe it. Also, please, please reconsider the decision not to update the AWS Console: change those placeholders to something like curly brackets, API ID, curly brackets, or anything other than asterisks. You do not want to lead developers into using asterisks when they write the ARNs themselves. Even conditions and this change still don't fix the problem of how to deal with path parameters. That can only be solved if they extend IAM as it exists today, in my understanding with either condition operators that match regular expressions or a new wildcard that doesn't expand beyond slashes, one that's not the question mark or the star, right? One of those two things has to happen. And finally, I know nothing about how AWS works internally, but it seems to me they need more senior security people on those famous two-pizza teams when they're designing how products work, because this is the sort of thing a security person probably would have caught early on while designing the service, right? For AWS customers: if you're using stars in those execute-api ARNs, only use them at the very end, as a trailing slash star that allows a prefix of your path. Anything other than this, and you're treading on dangerous ground, right? You're in quicksand. We've got one minute. Okay. You might be able to use Deny statements to limit the impact of some of those star expansions, so look into that.
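The customer-side rule of thumb (a star only as a trailing `/*`) is easy to check automatically. Here is a minimal linter sketch for policy documents returned by an authorizer; the function names are ours, and it only inspects `Allow` statements whose resources follow the `execute-api` ARN format discussed here.

```python
def risky_wildcards(resource):
    """List problems with the wildcards in one execute-api Resource.

    Rule of thumb from the talk: a '*' is only acceptable as a trailing
    '/*' that allows a path prefix.
    """
    # Only the last ARN segment (api-id/stage/verb/path) is inspected.
    segment = resource.split(":", 5)[-1]
    body = segment[:-2] if segment.endswith("/*") else segment
    problems = []
    if "*" in body:
        problems.append("'*' before the end of the path")
    if "?" in body:
        problems.append("'?' wildcard")
    return problems


def lint_policy(policy_document):
    """Map each risky Allow resource in a policy document to its problems."""
    findings = {}
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        for resource in resources:
            problems = risky_wildcards(resource)
            if problems:
                findings[resource] = problems
    return findings
```

Deny statements are skipped on the assumption that an over-broad Deny fails closed, which is also why Deny statements can help contain star expansions.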
Do not use Lambda authorizers as the only mechanism for authorization decisions. Specifically, since they cannot handle path parameters well, you might need to recheck, in the Lambdas that actually implement the API endpoints, whether the user is authorized. And if you used any Lambda authorizer blueprint code, update it based on the new version. Those are our recommendations. Let's see if you have any questions we might be able to answer. Hope you enjoyed it.
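The last recommendation, rechecking authorization in the Lambdas that implement the endpoints, might look something like the sketch below for a REST API with Lambda proxy integration. The `/orders/{orderId}` route, the UUID format, and the comma-separated `groups` context key are all hypothetical.

```python
import json
import re

# Hypothetical route: GET /orders/{orderId}, where order IDs are UUIDs.
ORDER_ID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def _response(status, body):
    return {"statusCode": status, "body": json.dumps(body)}


def lambda_handler(event, context):
    # With a Lambda proxy integration, values the authorizer put in its
    # "context" appear under requestContext.authorizer.
    authorizer = (event.get("requestContext") or {}).get("authorizer") or {}
    order_id = (event.get("pathParameters") or {}).get("orderId") or ""

    # 1. Re-validate the path parameter: the IAM-style policy could not
    #    express "a UUID goes here".
    if not ORDER_ID_RE.fullmatch(order_id):
        return _response(400, {"error": "invalid order id"})

    # 2. Re-check authorization here instead of trusting the policy alone.
    #    The "groups" context key is a hypothetical comma-separated list.
    groups = str(authorizer.get("groups", "")).split(",")
    if "admin" not in groups:
        return _response(403, {"error": "forbidden"})

    return _response(200, {"orderId": order_id})
```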