Welcome to Authorization in Software, the podcast that explores everything you need to know about authorization. I'm your host, Damian Schenkelman, and in each episode we dive deep into authorization with industry experts as they share their experiences and insights with you. If you're a software developer, or just someone who's interested in the world of authorization in software in general, you are in the right place. Let's get started.

My name is Damian Schenkelman, and today I'm chatting about how Box approaches authorization with John Halfaker, distinguished engineer at Box. Hey John, it's great to have you here.

Thanks for having me, Damian. It's my pleasure.

Could you maybe give our listeners a brief overview of your background and what you're doing at Box nowadays?

Yeah, for sure. My last job was at a faceted search company called Endeca in Boston. We powered sites like Newegg and things like that. Whenever you search for a part or a video card and there's the price range and things down the side, we enabled a lot of that. We did a lot of analytics apps and eventually were acquired by Oracle. I then moved over to Box a decade ago. The first team I joined there was the team that was launching our public APIs for consumption by platform partners and third-party applications. One of my first big efforts there was on our metadata platform. I launched our metadata facility, which lets customers define templates, or whatever you want to call them. Say they want to store a contract with us. They could define the amount, the counterparty name, the expiration date, things like that. As they fill out values, you can use that for faceted search, but also workflow. It goes in a lot of different places. We do this a little bit right now, but hopefully over time it also helps drive authorization decisions. Box is a pre-AWS company. We had our own data centers.
We were evolving from a monolithic architecture towards a more service-based one. I saw how hard it was to get a new service launched at Box, so I moved over to work on one of our internal platforms built on Kubernetes. We were super early adopters of Kubernetes. Interesting story: Box built the initial version of kubectl, which people use a lot today.

I didn't know that. That's an interesting fact.

One of our co-founders actually ended up building that. We were just super bleeding edge. Deployments didn't exist. We were manually managing replica sets, super early on the Kube bandwagon, and went through all the different growing pains. It was a great opportunity to interact with Joe Beda, Craig McLuckie, Brian Grant, Tim Hockin, all those people that started that project, and work with them on how to use it and round out the ecosystem. We built a lot of early components there. It was definitely challenging. From a business standpoint, hopping on the bandwagon a little bit later might have been a better move, but it was super fun as an engineer to work on a project that early in its lifespan. Then after that got stable and fairly well adopted internally, I moved back to working with our core team, which is the backend platform for everything you think about when you think about Box: files, folders, users, groups, enterprises, all the core, high-scale objects. That's where I'm working now, across a very wide variety of efforts, including still interfacing with our internal platform teams as we do big migrations, as representation from the core team. I think that covers all of it. Hopefully your listeners aren't going to sleep yet, but...

It's interesting because you've been through a few different areas of a very large product and platform, right?
In my case, I've done some similar things here at Auth0 and Okta, and that gives a unique perspective whenever you're building authorization capabilities, or any kind of platform capability in general. I think that perspective will be very interesting for anyone listening to the episode, because we'll be able to mix a general product and technology perspective with a more security and authorization perspective.

Totally. What I like to say about Box is that I've been here a decade, but I've had three very different jobs over that time period. And Box has big company problems but sort of small company infrastructure. So anytime we go to build or develop something new, we're always looking at it from a very basic level, all the way down to the platform level, to think about how we can best serve both our internal and external customers. So yeah, it's a fun place to work. I'm sure you have a lot of similar experiences at Okta, just working on very high-impact, basic stuff all the time.

Yeah, one of the great things I like to tell folks on my show is that we get to experiment and figure out, in our case, what the identity industry is going to look like maybe in the next five years, and we are the ones shaping it, considering the impact that we can have.

Yeah, it's very fun as an engineer.

Yeah, definitely. So as always, I like to start with the why, right? And this is why I was saying that I think your context, not just working on authorization but in general across the Box platform, will be useful. Why is authorization important at Box, and what happens if authorization is not working properly?

Yeah, I mean, there are lots of layers to this answer, and we can talk through both. But the high level is, and you have to think about it for a minute, Box is secretly a security company.
So, you know, storing files, managing the metadata, all the facilities to extract value out of files certainly aren't easy. But when customers come to us and buy us, the big blocker for most people moving from their on-premise shared SANs or shared drives is the security risk, right? You're putting your files on the internet. You don't want them to get out there. You don't want the various governments or state actors to get access to them. So security is the key facility that we sell to our customers, and probably the biggest risk to our business is having a big security incident, in terms of losing customer trust. So we spend an enormous amount of time as a company thinking through all the different security aspects. Our security organization tends to break down on two lines: there's application security and there's infrastructure security. Our application security team is very focused on how we ensure that somebody doesn't add a new line of code and make files accessible to anyone that has access to the system, or various other application concerns. How do we build the product securely? Like running pen tests. We work with HackerOne a lot, which has been a really fruitful relationship in terms of getting early pen test results. But there's also a whole other side to that, which is that we're basically a file storage and file productivity platform. How do we give tools to enterprises to help their users not expose things, right? There's always some level of sharing that you want to do with a document, either internally or externally, and allowing your internal users to do that safely and securely is a big deal.
So that goes all the way from individual shared link expiration and passwords up to more advanced stuff like classification policies. We just launched information barriers functionality that helps keep different parts of the enterprise from seeing each other's content. Email uploads, secure file request features, things like that. So yeah, security is a very big deal to us and our customers, and all of those are different types of authorization decisions. And then there are the more low-level, infrastructure-oriented authorization decisions. I'm sure your customers are fairly aware of all these issues: making sure that your IAM roles are locked down and have least-privilege access to various resources. Ensuring that an attacker who somehow manages to get access to one node can't pivot and start moving laterally through the infrastructure is another place where we spend a lot of time, and we use things like micro-segmentation using Calico. And now we've got some local policies down at the infra layer validating that the person making a change should be able to make that change. Yeah, so just layers and layers of authorization decisions floating around here.

Yeah, I was going to say, right, you folks store a bunch of sensitive data on behalf of customers, and like everything else in security, it's all layers and layers and different ways of protecting things. One thing that you mentioned was that, again, there are a lot of things that you can do there. You need to authorize users, you need authorization at different levels. There are multiple ways in which you can share a file: you can share with a user, you can share with temporary time limits, you can upload through an email, things like that.
And then there's also, like you mentioned at the beginning, the notion of having an API and working with partners, which means there are also app permissions. So yeah, a lot of things at a high level, and we'll dig deeper across the show, but how does Box handle authorization? Can you maybe give us an overview of that?

Yeah, so this is something that's a little bit in flux, and obviously, depending on the thing that you're asking about, there are lots of opportunities for authorization checks. But I'm going to talk specifically here about the application's authorization decisions. So if I fire up my web browser and I browse to a folder, and it shows all the items there, and then I upload a file, there are individual authorization checks being done at each step in that UI. Can I view the folder? Can I view the folder's children? Can I view that file? Can I upload a file into that folder? Those are all different authorization checks that we need to do. And given our nature as a web application, and given the complexity of the object model and authorization scheme, we repeat the same checks over and over again on each request. Even if I'm seeing if I can see a user, we have to check: can you see that user through your collaboration graph? Are you in the same enterprise? Can I see a comment, et cetera. And the way this is orchestrated is something that's changing as we break down our monolith, but I'm just going to talk specifically about our monolith for a moment. So the old Box was effectively a classic LAMP stack, way back in the day. And within there we have a kind of active record model that was home-built.
And on each of those, at any point in the code, you can ask: does this user, does this context, have this permission, whatever the verb is, on this object? And there are some checks that are built deep down into the data layer. So any time you try to fetch a file's metadata, just metadata like the name or the size, that kind of core object information, we check if you have view access. And there are a few other places like that. And that check for view access kicks off a call to our permissions policies, which are just more PHP code that basically breaks down into two flows: a grant phase and a revoke phase. And at the end we check, hey, did you get granted view, and is it still there? And if yes, we return true and we return the object. And those policies are honestly probably the single most complex thing within the Box system. It accesses a number of tables, it's very performance sensitive, and it checks, like, oh God.

No, I was going to ask a bit about that, these two ways of evaluating access for the positive and the negative. But since you can do it from anywhere in the code, and all of these things require state, from a code review perspective, a performance perspective, and a security perspective this might be a bit tough to do, right? Like knowing when you should call it, and having to optimize things so that you minimize calls and so on.

Yeah, and this is one of those things that's specifically interesting as we move from the monolith to the microservices universe. Honestly, it's a tiny bit easier. There's a different set of challenges in the monolith universe. The request is largely happening within the monolith, and so we cache, or memoize, multiple calls to the function.
And so adding superfluous checks on the same object isn't too much of a perf issue; it basically short-circuits out. In the services version of the world, but also on the monolith side, you want to do a balancing act between calling it too much and, well, the worst thing that can happen at Box at some level is that you don't call it, right? Going back to the previous thing there. So that is one of the challenges with our monolith. It's a normal monolith: you've got a bunch of library functions, some of which may call the permission check and some of which may not, and you need to ensure that the permission check is happening. But some of those library functions are used in the permissions calculations themselves, and those you don't want calling the permission check. So it gets to be a real struggle reasoning about where the permission checks are done and where they aren't. On the services side of the world, we've landed on a really simple model. We have domain services, they have endpoints, and effectively every one of those endpoints has a permissions policy associated with it. So there's no question there about where it's applied or when it's applied. It's applied very rigorously. With the monolith, there's a lot of weight put on the testing. I mean, obviously we need to do testing in both cases, but the monolith is more error prone, and so you have to really back yourself up with negative tests to ensure that all the permissions policies you expect are being enforced on an endpoint.
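As a rough illustration of the services model just described, where every endpoint carries its own permissions policy, here is a minimal Python sketch. Everything in it (`DomainService`, `register`, `call`, the `OWNERS` table, the `is_denied` helper) is a hypothetical name invented for this example; it is not Box's code and only shows the general shape of the idea, including a negative test.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signatures for illustration, not Box's real interfaces.
Policy = Callable[[str, str], bool]   # (user_id, resource_id) -> allowed?
Handler = Callable[[str, str], str]   # (user_id, resource_id) -> response


class PermissionDenied(Exception):
    """Raised when an endpoint's policy rejects the caller."""


class DomainService:
    def __init__(self) -> None:
        self._endpoints: Dict[str, Tuple[Policy, Handler]] = {}

    def register(self, name: str, policy: Policy, handler: Handler) -> None:
        # Refuse to expose an endpoint that has no policy attached.
        if policy is None:
            raise ValueError(f"endpoint {name!r} needs a permissions policy")
        self._endpoints[name] = (policy, handler)

    def call(self, name: str, user_id: str, resource_id: str) -> str:
        policy, handler = self._endpoints[name]
        # The policy always runs before the handler, so it cannot be skipped.
        if not policy(user_id, resource_id):
            raise PermissionDenied(f"{user_id} may not call {name} on {resource_id}")
        return handler(user_id, resource_id)


OWNERS = {"file-1": "alice"}  # toy data for the example

service = DomainService()
service.register(
    "get_file",
    policy=lambda user, file_id: OWNERS.get(file_id) == user,
    handler=lambda user, file_id: f"contents of {file_id}",
)


def is_denied(user_id: str, file_id: str) -> bool:
    """Negative-test helper: True when the policy blocks the call."""
    try:
        service.call("get_file", user_id, file_id)
    except PermissionDenied:
        return True
    return False
```

In this sketch the only way to reach a handler is through `call`, which is what makes the enforcement point unambiguous; `is_denied("bob", "file-1")` is the kind of negative test mentioned above.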
It's very interesting. I was checking just now, and Box is like 18 years old, starting in 2005, and only now are you starting to, on the one hand, go towards microservices, and there's a whole industry debate always going on about what's the right time and what's not the right time, but also starting to extract policies away from code and into more policy logic.

Yeah, I'll dive back into what we do on some of the policies in a second, but just to answer that: we've been pulling services out. We're taking the strangler pattern, slash peeling an onion, with our service-based approach. Even when I started, we had some initial, very infra-centric services that were already being pulled out. And so at the places where it was easy to extract into services, we did. A lot of the unstructured data handling is already pulled out. A lot of our user-extensible metadata is pulled out. A lot of the clients have had a bunch of their logic pulled out. So we're pulling off the top and bottom of this monolith slowly but surely. What's left in the middle, which we've been tackling for a year or so, is the core business logic, and it's very complicated and sensitive, so it takes longer. So yeah, it's a long-running process. Box has a lot of stuff to do, so we go in and out of dedicating more or less resources to an effort like this. But to dive back for a second to the permissions policies themselves. Taking a file, for example, in the grant flow we check: hey, is there a shared link in the request context? Are you collaborated on this file, as a viewer, as an editor, as whatever? We call them collaborations, but you can think about it like an ACL. Are you collaborated up the folder tree? Because we do this thing called waterfall permissions.
And that's the bulk of the grant phase. There are a few other steps in there that we could talk about, but they're less interesting. And then the revoke phase is much more interesting: is the thing deleted? Are there legal holds? Are you at quota? You can't upload more to this folder if you're at quota. And that gets a lot more complicated. So that's the high level of authorization. I'm curious if there are places you'd love to talk about more, or things that were unclear. I'm used to doing this at the whiteboard. It's a little bit harder just trying with my hands.

Yeah, I get it. Maybe in a future season we can do another episode with video podcasting, which is also becoming a lot more common. I'm curious to learn more about where you are and what you're evolving towards, right? What technology is the service-oriented approach going to use for policies, and how did you pick it?

Yeah, so we've gone in and out on this decision and reconsidered it a few times. But for our services on the floor, we're using an open source XACML engine. I don't know how you actually pronounce that, X-A-C-M-L. We're using Balana, which we have a few developers ramped up on who are able to make local patches to it, in terms of multi-attribute prefetch and things like that. So we have quite a few domain services on the floor that are using that for a lot of basic policy. And as of right now, we're in the middle of pulling the giant PHP permissions policy over to it. And that's been an interesting discussion. I mean, separating the permissions policy from the rest of the code obviously has some really nice benefits. One I previously mentioned is that it's very clear where it's enforced and when it's enforced.
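The two-phase evaluation described earlier in this answer (a grant phase, then a revoke phase, then a check that the permission is still there) can be sketched roughly as follows. This is an illustrative Python sketch, not Box's PHP policy: the `Context` fields, the `ROLE_GRANTS` mapping, and the exact effect of each revoke rule (for example, that a legal hold blocks deletion) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_GRANTS = {
    "viewer": {"view"},
    "editor": {"view", "edit", "upload", "delete"},
}


@dataclass
class Context:
    has_shared_link: bool = False       # a shared link is present in the request
    collab_role: Optional[str] = None   # role from a collaboration, if any
    deleted: bool = False
    legal_hold: bool = False
    at_quota: bool = False


def grant_phase(ctx: Context) -> Set[str]:
    """Collect every permission that some rule grants."""
    granted: Set[str] = set()
    if ctx.has_shared_link:
        granted.add("view")
    if ctx.collab_role is not None:
        granted |= ROLE_GRANTS.get(ctx.collab_role, set())
    return granted


def revoke_phase(ctx: Context, granted: Set[str]) -> Set[str]:
    """Remove permissions that the item's state no longer allows."""
    remaining = set(granted)
    if ctx.deleted:
        remaining.clear()             # assumed: nothing is allowed on a deleted item
    if ctx.legal_hold:
        remaining.discard("delete")   # assumed: a legal hold blocks deletion
    if ctx.at_quota:
        remaining.discard("upload")   # at quota, no more uploads
    return remaining


def can(ctx: Context, permission: str) -> bool:
    # Allowed only if granted in phase one and still present after phase two.
    return permission in revoke_phase(ctx, grant_phase(ctx))
```

For instance, `can(Context(collab_role="editor", at_quota=True), "upload")` evaluates to `False` in this sketch because the quota rule revokes what the editor role granted.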
One of the challenges in monolith land, like I mentioned, is that you've got some things, like the permissions logic itself, that need to run in sort of a God mode, a supervisor mode, in order to fetch the data to see if you can even see the data. And so the services approach creates a really clear boundary, where the permissions policies have a lot of access internally. We switch over to using what we call a domain service token, a version of our internal token that basically indicates that you should bypass the permissions check, because we want to be able to reuse the data access logic from the service to fetch the files, but without inflicting the permissions calculation on it, without creating an infinite loop, basically. So yes, that's one super benefit: it becomes very clear where the permissions are going to be enforced. They're going to be enforced on the endpoint, and the permissions policies themselves have broader access to the data than the normal callers of the endpoint. We looked at a few things. I mean, obviously just home-built policy in Java was one thing. XACML was another thing, and we revisited the decision as things like OPA and others have come out, asking whether those would be better choices. For me, having stared at this problem for a while, I would love some machine-learned approach here that minimizes the latency of the policy by using any features that you have available at request time, either the enterprise ID, user ID, or certain request features. Being able to find a fast path through to the grants would be a wonderful thing. And I was super excited about OPA for a while.
I haven't looked recently at OPA, so I apologize if I'm misstating any of the changes in direction they've made, but they seemed really heavily focused on infrastructure security at that moment, really focused on low data scale, almost like iptables-level, sub-millisecond response times. And we have a lot more complicated policy issues here, where we touch 60-plus tables. It's a large amount of data. You couldn't fit it into the kind of static, small database they have. We have very large data sets, including the files, folders, their collaboration relationships, and things like that. And so it wasn't quite a fit, and I felt like we'd be a use case that they would have liked, but way outside of where they were getting a lot of success and traction at that time. But yeah, there have been concerns. We were handwriting XACML for a while, which obviously no developer enjoys, so we actually have a bunch of libraries written in Java that effectively generate XACML on the other side, so it's closer to a Java-like dev experience, with the IDE and things like that. But the big question as we move the items permissions policies over, and as we move to microservices, is that the repeated checks thing starts to become more of a concern. And there's a parallel project trying to enable GraphQL for some of our internal clients, and obviously, naively, that thing's going to try to hit the same endpoint over and over sometimes. And so the permission recheck thing, figuring out how to do either a request-level cache, which we had for a while, or some other type of caching in order to avoid the penalties, is going to become important.
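To make the request-level cache idea concrete, here is a minimal Python sketch of memoizing permission checks for the lifetime of a single request. The class name `RequestPermissionCache`, the `toy_policy` function, and the `evaluations` counter are hypothetical and exist only for this illustration; a real system would also need to think about invalidation, which this sketch ignores.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (user_id, permission, object_id)


class RequestPermissionCache:
    """Memoizes permission checks for the lifetime of one request."""

    def __init__(self, evaluate: Callable[[str, str, str], bool]) -> None:
        self._evaluate = evaluate            # the expensive policy evaluation
        self._results: Dict[Key, bool] = {}
        self.evaluations = 0                 # counts real policy runs

    def can(self, user_id: str, permission: str, object_id: str) -> bool:
        key = (user_id, permission, object_id)
        if key not in self._results:
            # First time this question is asked in the request: run the policy.
            self.evaluations += 1
            self._results[key] = self._evaluate(user_id, permission, object_id)
        # Repeated questions short-circuit to the stored answer.
        return self._results[key]


def toy_policy(user_id: str, permission: str, object_id: str) -> bool:
    # Stand-in for a real policy: only alice may view.
    return user_id == "alice" and permission == "view"


cache = RequestPermissionCache(toy_policy)
for _ in range(3):
    cache.can("alice", "view", "file-1")
```

After the loop, the policy has run once even though the same check was requested three times, which is the behavior a naive GraphQL resolver that hits the same object repeatedly would benefit from.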
And then ensuring that the underlying policy is, ideally, faster than the thing we currently have hand-tuned in PHP, or at least at parity with it, is important. My secret hope as an architect here is this: the current policy is, like I said, hand-tuned, which is a polite way of saying very, very hard to change, right? Some of these policies have order dependencies on the other parts, and you can't move them around without potentially creating a security issue. And so I would love to just get the policies into a place where they're naturally expressed the way you'd want to express them, without thinking through the perf implications, and then allow the underlying platform to intelligently make decisions about evaluation order and things like that. So, yeah.

Yeah, that makes sense. I think there were a bunch of interesting nuggets there. I'm going to try to see if I got them. Because you already have a set of policies, when you started looking for alternatives, on the one hand you checked out OPA, and found it to be more infrastructure oriented, which I would tend to agree with. I think Rego as a language allows you to express everything, but OPA and the toolset are more thought out for infrastructure when you think about how they handle things like caching and state in general. But at the same time, you already have policies, which means that you need to keep compatibility with them, and maybe you were used to expressing things in a certain way, which you probably want to maintain for security reasons, for simplicity reasons, and so on. So XACML at least made the cut, and then you said, okay, we're going to go with this implementation, which is Balana.
And then from that point on it's about, okay, how do we improve things? So again, you've got the higher level, maybe the DSL library in Java, so that you don't have to write the XACML code. You mentioned also that you're making changes internally to how the library works, which might be, again, improvements to the caching logic and so on, so you have a small fork of Balana that you're tuning for Box, and that's what you ended up doing as part of that migration.

Yep, that sounds right. And I think we're always open, if something revolutionary shows up, to considering other options here.

Yeah, the authorization space is fairly effervescent right now and new things are coming up every day. I know, for example, AWS recently open sourced the Cedar policy language, which is yet another thing and has a few interesting benefits from a static analysis perspective, so there will be an episode on that. There are a bunch of new SaaS companies looking at the space, and open source projects. So yeah, I imagine over the next few years you will be able to either replace or complement parts of what you're building with some of these solutions in the space.

Yeah, that would be great. The key goal right here is getting out of this thicket of permissions logic that we have. We have things that will short-circuit for perf reasons, which creates a lack of commutativity in the policies, or a lot of things that reject all permissions, and things like that. So getting into XACML and having the stuff naturally expressed puts us in a good place to experiment with other technologies more easily than the free-for-all PHP code we basically have.
Yeah, I get it, that makes sense, and I'd like to double click on that, because you mentioned the notion of paths, and earlier we were talking about all the ways in which you can share documents with folks. I think this is a problem that I've been thinking about for a few years now: when you get into granular document sharing, where anyone can view or not view specifically just one file, not the folder, not a set of files, not based on an attribute or something, just that they can either view or edit one file, and when you get into performance at Box scale, you are dealing with complex authorization requirements. So how does authorization work for file sharing? What happens when I share a file with another user and give them maybe read permissions or write permissions? How is that handled internally?

Yeah, so I guess this is a click down in terms of detail from the previous answers. Imagine going to our preview page to look at a file, and you've been collaborated, to use our term. What happens is we dive into that endpoint and eventually hit the permissions logic. I mean, the first thing is that we try to access the file metadata, which triggers the permissions policy, which will look, for that user, at all the collaborations that it's a part of.

So the file metadata includes the collaborations for the file, which would be the ways in which the user can interact with the file?

Yeah, not quite. We have a few tables under the covers: a files table, a folders table, and a collabs table, collab being short for collaboration.
In the collabs table we've got the inviting user, the receiving user, and their role on that thing. And you'd think, oh, just look in the collabs table for the user ID that's trying to access the file and the file ID. Unfortunately, or fortunately, we maintain things very normalized under the covers. So if you were collaborated at a parent folder somewhere, we have to give you access based on the fact that you've got access at that parent folder. That's referred to as waterfall permissions. So basically what ends up happening is that on each request where you're accessing a file or folder, we pull back all the roots, all the folders or files that you have access to. And we keep that capped. We recommend customers stay under like 10,000 of those, but it does require some thought on the customer side about how they want to store their permissions in a normalized way. Then we scan that list to make sure you have access to that file or any of its parents. And actually, we drive a lot of UI on the other side, so rarely do we just want to know if you can view the file, right? We want to know if you can delete it, so we know if we need to show the delete icon or not. We want to know if you can comment on it. We want to know a bunch of things about what you can do with this thing. So in the policies we compute them en masse, since they largely use the same data. We compute en masse what permissions you have on this file: can you comment on it, can you delete it, et cetera. So you don't have to ask the same question multiple times. It's not like asking, hey, can the user view, then can the user comment, then can the user delete. It's more of a, hey, from all of these, tell me which ones the user can do.
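The waterfall lookup and the en masse permission computation described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `PARENT` folder tree, the `ROLE_PERMISSIONS` mapping, and the function names are invented for the example, and the real system works against database tables and caches rather than in-memory dictionaries.

```python
from typing import Dict, Iterator, Optional, Set

# Hypothetical folder tree: item -> parent (None marks the top of the tree).
PARENT: Dict[str, Optional[str]] = {
    "file-1": "folder-b",
    "folder-b": "folder-a",
    "folder-a": None,
}

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"view", "comment"},
    "editor": {"view", "comment", "edit", "delete"},
}


def item_and_ancestors(item_id: str) -> Iterator[str]:
    """Yield the item itself, then each parent up the folder tree."""
    current: Optional[str] = item_id
    while current is not None:
        yield current
        current = PARENT.get(current)


def permissions_on(item_id: str, roots: Dict[str, str]) -> Set[str]:
    """Compute the full permission set in one pass.

    `roots` maps each item the user is collaborated on to their role there.
    Access granted on any ancestor waterfalls down to the item.
    """
    allowed: Set[str] = set()
    for node in item_and_ancestors(item_id):
        role = roots.get(node)
        if role is not None:
            allowed |= ROLE_PERMISSIONS[role]
    return allowed
```

With `roots = {"folder-a": "editor"}`, `permissions_on("file-1", roots)` returns the editor's whole permission set, so a UI could decide about the delete icon and the comment box from one evaluation rather than one call per permission.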
That's basically it. And then that function that I was talking about earlier, can you view it, all that thing does is look to see if you have that permission amongst that array of permissions. This is something that we call bulk or batch permissions. On the Balana side, we have to do something slightly interesting here to fit this in. If you call GET, we can just evaluate the GET on the Balana side, but we do want to be able to bulk calculate the permissions sometimes, because we do return them to our clients: can it be deleted, can it be changed, et cetera.

Yeah, that makes sense. Let's try to mix things up a bit, because you're saying, okay, you can have all of these shares, but you might have these cascading or waterfall permissions, where you might have the ability to view a file not because someone shared the file with you, but maybe because they shared the file's parent or maybe even grandparent folder. Everything is really a graph traversal problem if you look at it, but again, you might have multiple ways in which you reach the file. And then you also have groups, right? There's the notion of, hey, I can be part of a group and the group has the collab. So it seems that there's a bunch of state that you have to pull or go through in the policy, so that the policy is going to arrive at the right decision. I'm curious about a couple of things. One is that you mentioned some recommendations for customers, like keeping things under 10K objects or roots, but I'm not sure if I followed, so I'd be curious how you limit that. I'd also like to understand a bit more how you handle groups with this kind of traversal more broadly, right?

Yep. So you're correct, it is a lot of data, and groups is an interesting question in here.
We make very extensive use of Memcached and Redis along the way through a lot of these calculations. So that thing I mentioned around what collabs am I on and what groups am I in, those pull extensively from cache, and that's probably one of our larger expenses. In terms of groups specifically, groups is a fun one for us. When I joined Box there were no groups, and I was like, this is quite strange for an enterprise software company, since groups would be a key point of scale, giving customers, some of which are very large, a scalable way to manage access to content. We did add it a little while later, and I wouldn't say it's necessarily 100% where I'd like to see it. But as of today, in terms of that specific example we're talking about, we do pull all the groups that you're a member of, and we pull all the collabs for the groups that you're a member of, and we mash that all in. And there's a deep reason why. If you stare at this use case and think about it for a second, it seems absolutely ludicrous to be pulling all these things back each time, pulling all the things you have access to when you know the destination is this one file. Obviously a database index would solve a lot of that for you pretty rapidly. The hidden thing in a lot of this is that we have three or so, and potentially more, experiences that need this. One obvious one is search, where there's an implicit filter. When I type, show me the documents containing my stocks, there's an implicit filter there, right? Show me my stocks in all the documents that I have access to. People don't think about the that I have access to part, and that's sort of the challenging part. And we can talk about the low-level details of how we index that.
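The group handling described above (pull the groups the user is a member of, pull the collabs for those groups, and merge them with the user's own collabs) might look roughly like the following Python sketch. The data structures, the `effective_roots` name, and in particular the `ROLE_RANK` rule for resolving a conflict between a direct role and a group role are assumptions for illustration; the conversation does not say how such conflicts are resolved.

```python
from typing import Dict, Set

# Assumed ordering used to pick the stronger role when two sources disagree.
ROLE_RANK = {"viewer": 1, "editor": 2}

# Toy data for the example.
MEMBERSHIPS: Dict[str, Set[str]] = {"alice": {"eng"}}
USER_COLLABS: Dict[str, Dict[str, str]] = {"alice": {"folder-a": "viewer"}}
GROUP_COLLABS: Dict[str, Dict[str, str]] = {
    "eng": {"folder-a": "editor", "folder-z": "viewer"},
}


def effective_roots(
    user_id: str,
    memberships: Dict[str, Set[str]],
    user_collabs: Dict[str, Dict[str, str]],
    group_collabs: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Merge direct collabs with collabs inherited through groups."""
    # Start from the items the user is collaborated on directly.
    roots = dict(user_collabs.get(user_id, {}))
    # Walk out to each group, then to the collabs on the other side of it.
    for group_id in memberships.get(user_id, set()):
        for item_id, role in group_collabs.get(group_id, {}).items():
            current = roots.get(item_id)
            if current is None or ROLE_RANK[role] > ROLE_RANK[current]:
                roots[item_id] = role
    return roots
```

The merged mapping is the kind of per-user list of roots that a search filter or a waterfall check would then consume, which is also where the cardinality concern comes from: every group adds more entries to it.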
And we also have to do that for metadata query, which is another facility we support. But also the main landing page at Box is this big experience we call our All Files page. It's basically a virtual view of everything you own, everything you have access to, and a few other things. So across these three different experiences, we basically end up having to look at all the roots you have access to each time anyway. So we end up relying on that facility for basically all of our access control, even though if you look at any particular use case there'd probably be some way of optimizing the access there. As for groups specifically, and more interestingly your all-enterprise group, which we've started supporting in certain limited cases: we start from you, walk out to the groups you're a member of, then figure out the collab objects on the other side of that and feed that into that giant machine. Right now we don't really index the groups that have access to a file in our search indexes and in our metadata query indexes. My hope is that over time we would start doing that and then just pass in the group IDs, in order to reduce some of those cardinality challenges. In terms of that 10K limit I talked about before, basically any problem like this at Box comes down to a hard trade-off. Either you store it normalized and have to deal with an expansion, bumping up against that 10K limit of folder IDs that we pass to a search index, or you denormalize it and make it fast, but then potentially bloat the amount of data we have to store, or take an operation that was fast before, like adding a collaborator, which was O(1), and turn it into an operation where you potentially have to reindex everything.
Yeah. And then you get into how long that takes. If it's a few milliseconds, maybe it's fine. If it's going to be three minutes until the change is propagated, you might have issues from a security perspective. Different systems have different trade-offs there. Right, we're just constantly asking which one it is in this case, and thinking through all the implications of normalization versus denormalization. Yeah, I get it. Let's go look at some of the numbers behind this. First, for each authorization decision, what data are you using? We talked about a lot of data. Can you give us an idea of what the performance of authorization decisions typically is? Yeah, absolutely. So, Box in general, and these numbers are going to be nonspecific since I didn't run them by anyone. We have hundreds of billions of documents that we're storing, nearly an exabyte in scale of the actual underlying content, and hundreds of thousands of enterprise customers. And the customer size tends to follow a standard power law, where we have some that are over 100,000 users and then a lot of really a lot smaller ones. I think our average customer size is like 30,000 users or something. In terms of performance, it's highly variable, as you can imagine based on how I described the authorization decisions. Take me, for example, when I go to my All Files page, which is my initial landing page.
And I'm kind of at the harder end of the scale. Just to get the basic items, which is mostly a permissions check on the items, since fetching them back is not that expensive, it's close to a second. Compared to some of those numbers we see from OPA and Rego, that is probably two or three orders of magnitude higher. If you're a smaller customer, or you don't have this complicated permission scheme, you're going to see lower numbers, but still probably in the hundreds of milliseconds range as our minimum there. But we do have, I mean, our CEO, ironically. He's been there for, I don't know, 18 years, and he has 18 years of data accumulated. And so his All Files page experience, which he loves dearly, can take him upwards of three to four seconds to load, which is far from what we'd like it to be. We're obviously working on a lot of projects to drive that number down. But at some level, some of this data that we have to process just takes time to access and run through. In terms of overall throughput, we're dealing with millions of requests per second. That's a lot of requests. And when you compare to some of the public metrics from other frameworks, this is a system that's been in production for a lot of years, XYZ, so it's not an apples-to-apples comparison. Let's just put it like that. Yeah, of course. I mean, when you're serving an end-user app, and I think we're going to maybe talk about integrations in a minute, we have a lot of things that integrate with us, not least of which are our desktop clients, which basically try to maintain some replica of your view of the file system on the local system.
And so, you know, we're pushing events out and they're making calls back, and those drive a tremendous amount of load. We also have third-party products like CASBs, virus scanners, things like that, that are just constantly hitting the APIs, ensuring that the data is protected. So yeah, we have a lot of things going on. And I ran the numbers: we're doing hundreds of thousands of permissions checks per second. For anything that's accessing a file, there's some amount of a permission check going on in there. So it's quite large scale, which makes it fun. It also makes some things feasible. As we're extracting the core stuff, we're doing a lot of parity testing: taking the old policy and the new policy, running them through, comparing the results, and ensuring they're the same. The scale makes those techniques more viable and more fruitful. Yeah, I can imagine. Again, any change you make, you have to maintain the same kind of backwards compatibility, let's say from a result perspective, while at the same time making improvements. So it must be an exhaustive set of checks from multiple perspectives. Yes. You mentioned apps, which we said we would touch on as part of the episode, and we mentioned them at the beginning, where you started working on APIs for some of your partners. I know Box integrates with a lot of other apps. How does authorization work with apps? How does a typical Box partner, an application that uses Box and depends on the Box API, work from an authorization perspective? It's largely an added layer. The API came later, after all the permissions policies had firmed up. So the way this tends to work is, if you're going to build a third-party integration, or even a first-party integration, since even our own teams use this, you provision what we call a service, though application is a better name.
You provision the application and you get your client ID and client secret. From there, as part of that creation process, you specify what type of app this is. We do have a different mode where the apps actually become a user within a particular enterprise, and those are more limited to the enterprises. But for your standard client use case, where it's doing something for a user, you specify which of that user's underlying permissions you need. Are you an admin app, and you're just managing users for the enterprise? Or are you a content processing app, and you need to be able to access their full folder tree? Or are you something that only needs access to a particular file or folder? And obviously, on the back end as part of that process, you can't just publish that to our application repository; it goes through a bunch of internal reviews. Then when we publish it, it's able to be used by people outside of your own local enterprise. As a standard, we allow a few different sets of auth flows, but we use OAuth 2. I know there are other ways people use scopes, and this might be the standard way, but when people use scopes they'll effectively pack that whole precomputed permissions process into the scopes, and then it's just a matter of checking whether the scope has access to the underlying file. Given how dynamic our permissions are, what we actually put in the scope is more that client's permissions on top of the underlying user's permissions. So it's kind of an allow list on the underlying permission. It says this service, when it's accessing a file, can read or write all their files, or their whole folder tree. And so we basically check: hey, you accessed file 12345, so we compute the permissions on it. Does this user even have access to file 12345?
And then when it comes out through the API response handling, we check: does this application have the ability to read or write that user's files? If yes, then we return the result or allow that API client to make that change. This is all done much more generically than how I'm talking about it, but that's the basic flow. Yeah, I get what you mean. The first episode we did of the podcast, back in season one, was with Vittorio Bertocci, who worked with us, and we talked about how, for scenarios like these where authorization is dynamic because there are resources and permissions and lots of them, scopes and embedding things in access tokens is not the way to go. So in Box's case you're saying: we have scopes as a higher-level notion, which might be "we are going to be managing users," but you're not going to say which users, or "I'm going to be reading or writing some folders." And then when you actually have to make the specific call, you just make sure that the access the user consented to encompasses the specific thing. So if you're changing a folder's name, it would be: can they write this folder's name, and did we actually get access so that this token is able to edit folder names, or whatever it is. Exactly. Yeah, that makes sense. And I'm curious: where is the authorization decision made for these API calls? It's all done as part of that underlying permissions policy. I believe, yeah, it's all done in that underlying permissions policy.
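The two checks described above (first compute what the user can do on the object, then treat the application's scope as an allow list over that result) can be sketched as below. This is a rough illustration only. The scope names, the permission names, the HTTP-method mapping, and the function names are all hypothetical and are not Box's actual scopes or API.

```python
# Hypothetical mapping from an application's granted scopes to the file
# permissions that scope is allowed to exercise on the user's behalf.
SCOPE_ALLOWS = {
    "manage_users": frozenset(),  # grants nothing on files
    "read_all_files": frozenset({"view"}),
    "read_write_all_files": frozenset({"view", "edit", "delete"}),
}

# Hypothetical mapping from an API method to the permission it requires.
REQUIRED_PERMISSION = {"GET": "view", "PUT": "edit", "DELETE": "delete"}


def effective_permissions(user_permissions, app_scopes):
    """Intersect the user's own permissions on an object with what the
    application's scopes allow. Unknown scopes allow nothing."""
    allowed = set()
    for scope in app_scopes:
        allowed |= SCOPE_ALLOWS.get(scope, frozenset())
    return set(user_permissions) & allowed


def authorize(method, user_permissions, app_scopes):
    """Allow the call only if the required permission survives both phases:
    the user has it on the object and the user granted it to the application."""
    required = REQUIRED_PERMISSION[method]
    return required in effective_permissions(user_permissions, app_scopes)
```

For example, `authorize("GET", {"view", "edit"}, ["manage_users"])` is false in this sketch even though the user can view the file, because the user never granted file access to that application. The scope stays coarse and static while the per-object permissions are still computed at request time.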
And so, as the last phase, it computes all the permissions that you have, like I said: can you view, can you comment, can you delete, can you edit. Then as a last step, if it was a GET API, it's going to check whether you have view, and it'll take the application scope and use that as kind of an allow list on the underlying permissions. If the scope was, let's say, manage users, but they're trying to access a file via that app, it won't have view on that underlying file even if the user has access, because you didn't grant that access to that application. And it's all happening in the same place. Okay, that makes sense. So essentially you reduce the precomputed set of collabs, as we're calling them, that the user has on the file, and then you say, okay, this API call actually requires this collab. So it goes back to: is this scope the correct one, and does the user have the permission? That makes sense. Yeah, it's just two phases. We compute the user's natural permissions on objects, and then we basically filter out any access the user didn't grant to the application as part of the OAuth 2 flow. Because as part of the OAuth 2 flow, right, we send the user to a screen that says, "Are you okay allowing this application to read or write your files, or manage your users?" So there's part of that OAuth flow that we have to maintain. Yeah, the OAuth consent. Yes, exactly. Okay. John, it's been amazing, man. We did a very deep dive on Box, which is both a complex and large system, and I'm sure the listeners will really appreciate learning how a large, battle-tested production SaaS deals with authorization. I really appreciate your time. Yeah, thanks. Thanks for having me.
And if there are any listeners out there who are working on authorization systems and are interested in more detail, we're always open to conversation with open source developers or companies about what they're building. Excellent. Thanks for the ask. And again, as John was saying, you should reach out to him if you're building something that might be useful for Box. That's it for today's episode of Authorization in Software. Thanks for tuning in and listening to us. If you enjoyed the show, be sure to subscribe to the podcast on your preferred platform so you'll never miss an episode. And if you have any feedback or suggestions for future episodes, feel free to reach out to us on social media. We love hearing from our listeners. Keep building secure software, and we'll catch you on the next episode of Authorization in Software.