Hi, everyone. We're so excited to be here at the first ever CiliumCon and talk to you about how we use Cilium at Bloomberg to build our data sandbox. I'm Anne Zepecki, an engineering team lead in our SF office, and I'm joined today by my colleague Sritej. Can you hear me on the mic? Awesome. Okay, cool.

Well, I think I just want to start by being extremely transparent about what our motivation is for this talk. This week at KubeCon, there are going to be a lot of really interesting and wonderful talks about critical production issues, things that went wrong, facing bottlenecks at scale. We have our fair share of technical challenges that we're going to be talking about today, but we also wanted this talk to be a bit different. We know that a lot of you in the audience here, or maybe on the stream, don't even use Cilium in production yet. Maybe you just walked in off the convention floor from KubeCon itself. And if that's you, the number one question on your mind is just: is Cilium right for me?

So what Anne and I are going to do today is showcase a practical application of Cilium, straight out of the box or integrated with other technologies, and then explain how we're using it at Bloomberg to create valuable business products. With that, the high-level overview of the talk today is going to be: first, what we built, and importantly, how we decided to build it.
The second thing is how we built it, and then finally we're going to talk about some of the things that we learned during the process, which will hopefully be helpful for you as well. With that, I will hand it over to Anne, who's going to talk about what we built.

Awesome. So we're from BQuant, which is a group at Bloomberg that builds a quant analytics platform that runs in the public cloud. We use CNCF technologies like Kubernetes, Helm, Terraform, and of course Cilium, which is why we're here today. Our platform enables quant analysts to build financial research and production workflows using Bloomberg data (and there's a little asterisk there), and we want our clients to be able to build their workflows in a secure way. So Sritej is going to talk a little bit about how we make this possible.

So we needed to create something called a data sandbox. All that means is a way to launch Jupyter workloads that have broader access to data, but a more limited scope on how they can use it, which includes things like data flow and data distribution. So the question is: how do we enforce that users can't easily export protected data from these BQuant workloads? There are a few factors here: there's a contractual element, there's a commercial element, but what we're interested in is the technical component, in which we fortify our network using Cilium.

I want to pause right now, because that was a lot of words, and I like to have a simple slide to break up all the text. But I wanted to remind y'all of the point of the talk today, which is just: is Cilium right for me? Actually, I want to go a bit farther. I want all of y'all to hear the words come out of your own mouths. So on three, I want you to say "is Cilium right for me?" One, two, three: Is Cilium right for me?
Awesome. So I think the important thing to note, though, is the answer could be no. It's true that there's that saying that no one ever gets fired for buying IBM, engineering hours are really expensive, and using a new technology is always risky. But with that risk comes new opportunity, and it's that opportunity you can use to justify using a new technology and putting in the effort and the cost. At Bloomberg, we decided that it was worth it to use Cilium.

We did a comprehensive review of what our customers' use cases look like, as well as the types of risks we could face, and we used that to construct a threat model. With this threat model we were able to factor in things like our contractual elements, and we were able to limit the scope in some areas, so we could focus on automated data egress as well as customer resource access. What we were looking for was a very lightweight solution for a limited-in-scope problem, and that is how we justified Cilium for our use case. Now, it's entirely possible that your company's customers are not going to be our customers, but at the same time their needs may be very similar, and with that you can justify using Cilium as well, because it's a very powerful technology. So with that, I'm going to hand it over to Anne, who's going to talk about exactly how we built this data sandbox.

Awesome, thank you. So we use Cilium as our container networking interface, as some of the other folks who have spoken today do as well, and it's really beneficial. BQuant runs in the public cloud, so we're able to replace the cloud provider CNI with Cilium, and that's one of the reasons why we chose Cilium: because it is a full CNI replacement. But a little bit of story time.
So, we want to control the access out of BQuant workloads. Because the platform runs in the public cloud, we don't want access to the full internet from those workloads, because that poses some level of risk for our clients. So we wanted host-based policies to control what our clients are able to access from these BQuant workloads, and we were able to use Cilium to do this. Then, when it came time to build our data sandbox, this provided another layer on top of that, to build a true data sandbox. We also restrict cluster network access to the specific ports and hostnames required for Jupyter, which is another one of the reasons we're able to use Cilium to build the data sandbox.

So we use L7 network policies, specifically forbidding anything that exports data from this environment, and this prevents data exfiltration to untrusted entities. All of this traffic is RESTful. This is probably the first of many pieces of YAML that you'll see throughout the course of the week. We filter the workloads that are running in sandbox mode (it's a label on our resources), and we're then able to enforce these policies on traffic coming out of those workloads.

We chose L7 for a few reasons. One, it allows us to make decisions based on information at the application layer, rather than at the network or transport layer. The performance cost of doing this ends up being negligible because of eBPF and its magic. Two, a violation of L7 rules doesn't result in a packet drop, which means we're able to surface an error when traffic is not allowed to leave the node; that's not the case for L3 and L4 policies. And finally, with L7 policies and pod annotations, that traffic is proxied through an Envoy instance, which means that L7 traffic targeted by the policies depends on the availability of the Cilium agent itself. Simple, right? Easy: one piece of YAML, a CiliumNetworkPolicy. That's it.
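The policy on the slide isn't reproduced in this transcript, but a minimal CiliumNetworkPolicy in the spirit described above might look like the following. The label, hostname, and allowed methods here are illustrative assumptions, not the actual Bloomberg policy (and for HTTPS traffic this also needs the TLS termination/origination configuration discussed in the talk):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: sandbox-egress          # hypothetical name
spec:
  # Apply only to workloads labeled as running in sandbox mode
  endpointSelector:
    matchLabels:
      sandbox-mode: "enabled"   # illustrative label
  egress:
    - toFQDNs:
        - matchName: "data.example.com"   # placeholder trusted host
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
          rules:
            http:
              # Allow only read-style methods; anything else is rejected
              # by the proxy with an HTTP error rather than a packet drop
              - method: "GET"
              - method: "HEAD"
```

A sketch like this illustrates the "error instead of drop" point: traffic that violates the `http` rules is answered by the Envoy proxy at L7, so the client sees a failure response rather than a silent timeout.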
We built this valuable business feature. It's not quite that easy, though: TLS termination and origination make it a little more complicated. Our BQuant workloads need to trust the traffic that comes back from Cilium, which means that essentially we have to provide private certs and keys that we generate on the cluster and store as secrets. We take the certificates generated by the root CA and append our certificates onto that, and then the Cilium agent pulls these secrets, the private certs and keys, from the cluster to do the TLS termination and origination. We also want to automate the generation and rotation of the certs we generate on the cluster, and we use Helm to do this. There was some lift to set this up, but in the end it allowed us to have a fully comprehensive solution that was secure. I'll pass it over to Sritej to talk about the flow of outgoing traffic from our BQuant workloads.

So, I'm a visual learner, and I know Anne is one too, so let's just run through an example of a user running a BQuant workload. This is going to be our architecture right here. You can see the user space and the kernel space, and you can see that Cilium is deployed as a DaemonSet, with the Cilium agents as well as a Cilium Envoy proxy. Let's make this horizontal to make it even simpler: a user in their workflow is going to access protected data in the sandbox. Now the user, either maliciously or not, tries to export that protected data to the S3 bucket /prod/forbidden. The outbound request goes to the Linux kernel, where at this point the Cilium agent will have compiled eBPF programs that say: if the traffic is going outward, send it to this port.
That port is serving a Cilium Envoy proxy. Now the Cilium agent reads the network policy and configures the Cilium Envoy proxy using the xDS API, and then the Cilium Envoy proxy is the one that makes the decision, doing the TLS inspection to accept or reject the traffic. In this case, it's not going to allow the outgoing traffic to the S3 bucket /prod/forbidden, and the user fails to export the data from the BQuant sandbox. Now, this by itself is already super useful for us, but Anne is going to talk about how we decided to take it a bit further.

Yeah, so we built on the capabilities that we provided with this data sandbox to build intermediary data storage. We found that some of our clients had workflows that required flexible storage to do a variety of things, from storing intermediate models, to handling cached results between runs of jobs, to handling processing for really large files, or stashing generated signals. So what we built on top of this functionality is what we call sandbox storage: dedicated S3 buckets that can be accessed from within the BQuant sandbox. That access comes from exceptions in our Cilium policy and through integration with cloud identity federation, and we'll talk a little more about how that's actually set up.

A little bit of context here: each of our BQuant workloads has a corresponding OIDC token that helps secure access to data. This is issued by our Bloomberg identity provider, and we use STS in AWS, this cloud identity federation, to enforce access control via IAM. Specifically, we're using AssumeRoleWithWebIdentity, which allows us to use OIDC-compliant
tokens to fetch temporary, limited-privilege access tokens that users can then use to access their S3 or other AWS resources. So again, we're controlling access to resources via IAM. We're able to leverage session tags to know which IAM policy to enforce, and as I mentioned before, we have a means of labeling our workloads to know which ones are running in our data sandbox. We're able to use that information in our token as well, specifically in the session tags, to know which IAM policy to enforce when we're accessing data storage from the BQuant sandbox. So the user can access the intermediary storage corresponding to the sandbox, as enforced by both the IAM and Cilium policies.

If we put that all together and take a look at what this looks like overall, you can see that initial call to use our OIDC token with AssumeRoleWithWebIdentity to fetch those limited-privilege AWS credentials for the IAM user with that policy. Then, when a user sends a request to write to that data storage, the traffic is inspected by Cilium. You may have seen, if you were really closely inspecting that YAML from a few slides back, that we have an exception in our Cilium policy to allow access to these particular resources. Once the request passes the check from Cilium, we go on to the AWS side and are granted access to those resources via the IAM policy that the user has, and you can write to your storage.

So there's a lot going on behind the scenes, but what the user ends up seeing is actually quite simple: if the user tries to export protected data to one of their sandbox buckets, it'll work. But if the user tries to export that same protected data to any other S3 bucket, they're going to hit a 403 Forbidden. From their perspective,
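The exception for the sandbox buckets could be sketched as an additional egress rule like the one below. The FQDN pattern is an invented placeholder, not the real bucket endpoint, and a real policy would also need a rule permitting the DNS lookups that `toFQDNs` relies on:

```yaml
# Hypothetical extra egress rule carving out the sandbox storage buckets
# from the otherwise-restrictive sandbox policy.
egress:
  - toFQDNs:
      - matchPattern: "sandbox-*.s3.amazonaws.com"   # placeholder pattern
    toPorts:
      - ports:
          - port: "443"
            protocol: TCP
```

Note that this only opens the network path; whether the write actually succeeds is still decided on the AWS side by the IAM policy attached via the session tags, which is why the two layers together give the 403 behavior described next.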
it's that simple. So we just talked a little bit about how we built it, but now I think it's time to go on to some of the things we were reminded of in the process of building this, which were really useful for us to remember, and hopefully sharing them will be useful for y'all as well in figuring out if Cilium is right for you.

Well, I think the first thing to remember is that you can start small. You don't have to build the perfect product all at once, and it's actually quite valuable to analyze your customers' workflows, see what works for them, and then offer more functionality incrementally. That way you can offer more value to them, and they'll probably pay more for it. The first way that we did this is through data replication across multiple regions, for the sake of disaster recovery. Another way is an approachable user experience. We use Jupyter workflows primarily, and Jupyter is well known within the industry for being very easy to use, and we want to make sure that our user experience is up to that same bar. That goes for everything: from the point where the user logs on to the platform, to the point where they launch a workload, and even within the workload, the ability to access very handy environment variables so that they don't have to think about, well, which bucket do I have access to?
Another thing we should mention is that at Bloomberg, security is our number one priority. What this means is everything from clusters being segregated by customer, to defaulting to a default-deny policy whenever we can, and Anne went into a lot of detail earlier about how we manage cert rotation to avoid some of the intricacies there. We think this is all worth it for the customer, and they end up seeing the value of it.

Another thing we should talk about is comparisons with alternatives. The first thing to say is: shout out to Istio Day, which is happening later as well; I know that we have a sister team presenting a talk there. When we were evaluating options, we decided to go with Cilium not only because of its host-based routing policies, but because of how performant it is with eBPF; having it in the kernel itself is a game-changer. I'm super excited to see a lot more projects starting to implement this, like Calico and Falco, and it seems like there are more week by week. Another thing to mention is the access to a lot of professional support. I think Cilium is really lucky to have a robust market of experts out there who you can pay to do things like troubleshooting and help with debugging, which we've really benefited from. And I think it's also important to mention, I know that Thomas earlier had the picture of not being able to see anything in the snowstorm, but observability is really important to us, and we've definitely gotten a lot of use out of having Hubble on our clusters. I'd love to hand it over to Anne, who can talk more about that.

Yeah, awesome. So we definitely love Hubble. It provides a lot of observability, which is very helpful, not only for application and network visibility as we're looking into what's happening on our clients' clusters, but we've also found
that this is a really beneficial learning tool for people on our team who are new to Cilium or new to the team. Being able to see the network flows and what's happening with the traffic as it comes in and out of our workloads has been a really powerful way to visualize how all of this works together. So I'll show you a little bit of what that looks like. A lot of you are probably familiar with the Hubble UI: we have our service map up here, and you can see source and destination identity, the ports, what happened to particular packets. Very helpful. You can then trace that in more detail and see those network flows to understand them. Sritej showed two examples of successful and failed requests to export data from the sandbox, and you can actually see those here on the screen as well. So it has definitely been a very helpful tool for learning, as well as for debugging any client issues that may come up.

Yeah, I really like Hubble; I was just using it last week. But something I wish it had more of is the ability to filter better. I was manually typing in all of the labels that I wanted, but that's something that we have the opportunity to change.

So I wanted to make one of these last slides just a reminder to everyone to be good open source citizens. At Bloomberg, we rely on a really healthy open source ecosystem, and there are a lot of ways to support open source, all of which we believe in. There are things like sharing use cases and talking about pain points.
I mean, that's a lot of what these talks are going to be about this week. But you can also contribute more actively and directly: you can contribute to developer docs, or you can open up issues and PRs. I know that our teammates definitely have some open issues right now, which I'd love to discuss, but we probably shouldn't. In a different form, you can also help out monetarily by opening up a FOSS contributor fund. At our company, we've found a great amount of success with having this fund for free and open source software. It's an easy way to democratize giving funds to projects that are used by everyone, on a quarterly basis, which makes it really interactive. So with that, I want to hand it over to Anne, who'll close us out.

So we're going to end this with a little bit of a call to action for all of you here. The first one is: think about where Cilium could fit into your needs, and figure out if it is a good solution for you. If you're here, there's a good chance it is, but think about how you can use it to build the functionality that you need. Another point that we've made throughout this talk is: don't be afraid to start small. We introduced Cilium as our CNI initially, and then we were able to build a lot of really valuable functionality on top of it for our data sandbox and for our sandbox storage functionality. So don't be afraid to start small, add later, and keep building, which I think is a good attitude that already exists within the open source community as well. And then finally, contribute back to the open source community; find what works for you. I know I'm working on my first draft PR for Cilium itself.
So that's been exciting for me, and I'm definitely really excited to engage with all of you and be a part of this community. So thank you all for your time and attention today, and we have time for a few questions.

First of all, thank you; you have an amazing way of presenting. I was wondering, if you use something like Cilium, it's always about security, as you've stated, but as you streamline how the data flows and where the packets go, does it also have positive impacts on performance?

Right, I think that definitely influenced our decision to use Cilium as opposed to an alternative. At the point where we made the decision, we were really struck by how non-intrusive the Cilium model was in terms of being able to do what we needed to do with host-based routing, as opposed to setting up sidecars and things like that.

Another question, maybe. In one of your earlier slides you showed the flow. (I'm sorry, I don't think the mic is working. Testing, one two. Better, thank you.) In one of your earlier slides, you showed that the flow goes from Jupyter to an eBPF program, and then that gets forwarded. Yeah, you can go back to the slide before this one; okay, the linear one, this one. This eBPF step: did you develop your own eBPF program, or did that come out of the box with Cilium?

Yeah, we did not need to develop our own eBPF program.
We just used the Cilium network policy, specifically an L7 policy that only allowed GETs, HEADs, and I think OPTIONS, and then put a default deny on everything else. It's also important to note that this was for trusted entities, like, say, S3. For untrusted entities, there's potential to do some trickery with these HTTP methods, but for our purposes that was sufficient. I think one of the benefits of integrating with Cilium is that, because we were able to use so much of this functionality out of the box, it was a pretty easy transition, aside from the part we mentioned with the TLS termination and origination and the certificate management that we needed to build there. Overall, it was more of a seamless integration.

What prevents the user from exfiltrating the data through the web page of the notebook?

So, if I heard the question correctly: the user gets access by running a job, and the job sends data back to the user's browser, so what prevents that avenue for a data leak? I didn't hear the last part. So I think it's: what prevents a user from running a job that starts exporting the data, right? Yeah, okay, I can answer that one. As we mentioned before, there are three parts to our data sandbox, the last of which, and the part that's probably most relevant here, is the technical piece. So no, you can't start writing to arbitrary S3 buckets; you can only write to the ones that are provided as part of the sandbox storage offering that we have. But those other two pieces are the commercial piece and the contractual piece. With our clients, we do have an agreement that you're not supposed to be exporting that. That's less exciting, so we won't talk about it. What's also helpful is that we offer other solutions,
yeah, if they do want to do that. So it's really about knowing what your customer wants to do with the product, and then trying to address it wherever you can.
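As a footnote to the TLS termination and origination mentioned in the talk and in the final answer: in Cilium, this is configured by pointing the L7 policy at Kubernetes secrets holding the certificates. A rough fragment might look like the following; the secret names and namespace are invented for illustration and are not the speakers' actual setup:

```yaml
# Fragment of a CiliumNetworkPolicy egress rule with TLS interception.
# Secret names below are hypothetical.
egress:
  - toPorts:
      - ports:
          - port: "443"
            protocol: TCP
        # Cert/key the Cilium proxy presents to the workload
        # (the cluster-generated cert chain the workloads must trust)
        terminatingTLS:
          secret:
            name: sandbox-internal-cert
            namespace: kube-system
        # CA bundle used to originate a new TLS session to the real destination
        originatingTLS:
          secret:
            name: sandbox-external-ca
            namespace: kube-system
        rules:
          http:
            - method: "GET"
```

With a shape like this, the Cilium agent reads the secrets from the cluster, and the Envoy proxy can inspect HTTPS requests at L7, which is what makes the method-based sandbox rules above applicable to TLS traffic.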