Fair warning, this feels like a bit of a ragtag effort, because we just put this together and I don't think I've seen the entire slide deck; that's how busy and insanely crazy the conference has been for all of us. And apparently we don't have... oh, something's happening. Something's happening. Excellent, thank you. So, with me... well, I guess we should introduce ourselves. My name is Julia Kreger. I'm a senior principal software engineer with Red Hat. With me here is Orran Krieger with Boston University and the MOC Alliance, also John Stumpf with Two Sigma and Trammell Hudson with Lower Layer Labs. Sorry, we're on the last day of the conference; my brain is fried. So, the idea of this session is to give people an idea of the various interconnecting pieces we're trying to achieve, at kind of a high level. And our next session, which will not be recorded but will be in this room, is more geared towards... go ahead. Where are the arrows here? Right here. Ah, thank you. It's more geared towards giving us a chance to talk, discuss, and dig into perceptions and issues, and ultimately try to build some collaborations outside of our existing circles. So, with that, I think we'll go ahead and start with some quick short presentations and go from there. One thing to note: during the second session we've been requested to use the Chatham House Rule, and the long story short is basically that you're free to use the information you receive, but you may identify neither the person nor the affiliation of the speaker who shared it. So everything here is public and recorded, but the next session should be for open discussion. So, okay, we'll just hand this mic around. So, this is just for those that didn't see the keynote, and this will go really fast, because there's a lot of material we'd like to present just as an overview, to initiate the conversation in the second session. So, we're saying we want this fundamental layer in the data center, and John's gonna talk to that, and the goal is that tenants don't have to trust the provider of the data center; the same model we're doing for the data center can actually be applied to the cloud. Where we're at right now is that we've got this core platform defined and implemented with Julia, basically based on OpenStack, and an initial proof of concept is being deployed right now, rolling out hopefully over the summer to 40 racks in our data center, with the goal of extending this over time to the entire Massachusetts data center, and eventually turning it into something that people can deploy internally in different companies and organizations. It's being integrated with a number of large-scale services in that data center, to start using them for extra capacity, and there's a wide set of research and development going on that we're gonna be alluding to, which is where we really want to work with a broader community on these efforts, as well as on a whole bunch of open problems that we have. There are way too many topics we wanna talk about. Basically, John's gonna talk about the business needs, why we're driving this, or why Two Sigma's investing in this. Julia is gonna kind of cover what the provider platform is. I'll review some of the use cases and what it means to use this.
Then Trammell will talk through security, and then quickly we'll talk about disaggregated storage and some of the automation we wanna do around this. If you look at what's in deployment versus what's in development versus what's in research, this kind of covers that. And with that, I'll hand over to John. All right, so hi everybody. What we're talking about here is bare metal provisioning. What we're looking to do is be able to shift compute and storage very easily between tenants. Now, my tenants are all internal to Two Sigma, but they have different security profiles, and we would like to be able to move workloads with those different security profiles around, consuming varying levels of compute and storage. One of our problems is being able to assert certain qualities of the individual machines, and to do that in a fast way. But the main thing here is that tenants will be providing their own keys, because I am a provider within Two Sigma and I am not trusted enough to see some of the tenants' information. Right now those tenants are managed in a separate way, and therefore in a more bespoke way, as opposed to the industrial-strength way my team manages things for the bulk of the company. So what I'm trying to do is expand my reach and my capabilities by doing this. So some of our requirements here. First, reduce the trusted computing base. We need to ensure that a server is in a known state; this is a commitment I want to be able to make to my tenants, that they can inspect a server and know to their satisfaction that this box is what they're expecting to receive, and we're gonna be doing that through hardware-rooted attestation. What we want to be able to do is measure the devices, the firmware, the boot loader, all the non-volatile memory, but not all of this is possible today, or we haven't figured out how to do it properly yet. The other thing we wanna do is prevent the loss of data confidentiality. We want to encrypt the entire disk, so we don't leave any space available to put something that I'd have to worry about. I operate all the physical assets. One of the things I mentioned in my talk is that I have to worry when we don't follow our own procedures and a disk leaves the building that hasn't been properly wiped, or a server leaves the building that hasn't been properly wiped. These happen infrequently, but it's enough. I mean, I put people on planes to recover those devices and fly them back. It's very intrusive, and that's how seriously we take data loss. We do not want that to happen. So I wanna avoid the problem altogether: I wanna make sure that a disk is entirely encrypted, so that if I lose it, I don't care. The cost of the device is the least of my worries, whether it's a server or a disk. Just for a sec, can you raise your hand if you attended the keynote, if you saw the keynote introduction to this? Okay, good. Then I know I can move faster. Okay, thank you for that. So one of the things that we wanna do here: I'd like to be providing all of the firmware to the devices, to all of the components. Now, today that's not possible. So the first thing that we're gonna be looking to do is, how can I measure the firmware? How can I make sure that the firmware on that device is the one that I am okay with, that I've tested, that I've done something with? That's still a problem. But ultimately, we want it where the tenant can do that. So the tenant can show up and say, well, I don't like that version of firmware,
I didn't test it, I need this other version, and I can deliver it as long as I know what card you have. That's kind of where we'd like to go. So that's an even further goal, and we don't know how to do that yet. Some hopeful side effects: I'd like to get rid of GRUB. I'd like to get rid of UEFI, of any proprietary BIOS, and of the closed-source BMCs. In short, I want open hardware, and I want it from lots of people, not just one company or two companies. I want choice, and I don't want to do it the old way. So what this means is that the tenant will reliably know what they're getting, that all their data at rest or in motion is encrypted with their own keys, and that all the volatile memory can be reset. That's basically where I want to go here. And by doing this, we're defining a platform for a provider, me as the provider, and a toolkit for the tenant to be able to do all of this attestation. And with that, we have to impose some requirements, standards about capabilities, on the devices that would be put into the servers, or on the servers themselves. This fundamental platform that we're looking to build is going to be used for any number of use cases. We want to handle a huge hardware compatibility list, and I'd like to pull in any device. I mean, a network switch is just a computer with a lot of ports, right? A storage appliance is just a computer with a lot of disks. Why can't it apply to everything? And we can eventually get away from OEM lock-in; beyond that short list of vendors that can support me now in an open way, I can get away from lock-in. It shouldn't be the differentiating factor; it should be table stakes, as we say. So hopefully we're gonna be moving down that path, and more vendors will support open hardware. Trammell's gonna talk about some of the interesting things that we've done to date and where our research is right now. And I'm looking for more participation. We've already made a connection on storage, and I saw some great presentations here that overlap with the work that we're doing on confidential computing with containers. It all looks very interesting, so I'm very encouraged. What we're hoping is that we'll have more discussion here. That's what we're really looking for: to partner. This is all open source; everything that we're doing is gonna be published. That's what we want. So thank you. All right, so we're just gonna blast through these, and I don't think there'll really be much of a chance for a break given the number of topics. Julia may not know she was gonna do the next part, but this is actually Mainn's to start off with, and then I just want you to kind of handle the questions, because you're gonna figure out how to do this. Hi, my name is Tzu-Mainn Chen and I'm a principal software engineer at Red Hat. I've been working on ESI for just about three years, and today I'm going to quickly talk a little about the history of ESI and the work the team has done to get to where it is right now. In the beginning, ESI was an idea: a bare metal cloud designed to be used by multiple tenants. It was developed within the MOC through two projects: HIL, which stands for Hardware Isolation Layer, and BMI, which stands for Bare Metal Imaging. These projects created custom scripts and tooling that allowed users to do a variety of tasks with these nodes: create private networks, attach nodes to networks, and perform various node operations — control the power state, do image-based provisioning, boot from a volume, and more.
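Those same tasks, expressed against today's openstacksdk for concreteness (a hedged sketch only — HIL and BMI predated this tooling; names like "tenant-net" and "node-01" are invented, and method signatures should be checked against your SDK release):

```python
# Hedged sketch of the node/network tasks listed above, using openstacksdk.
# The cloud name, network name, and node name are invented for illustration.
import openstack

conn = openstack.connect(cloud="esi")  # a clouds.yaml entry we assume exists

# Create a private tenant network (Neutron).
net = conn.network.create_network(name="tenant-net")

# Attach a bare metal node to that network (Ironic + Neutron): create a
# port on the network and plug it into the node as a VIF.
node = conn.baremetal.get_node("node-01")
port = conn.network.create_port(network_id=net.id)
conn.baremetal.attach_vif_to_node(node, port.id)

# Control the node's power state (Ironic).
conn.baremetal.set_node_power_state(node, "power on")
```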
These projects did what they had to do very well, but every piece of functionality had to be developed from scratch. This made the code difficult to extend and to maintain. At some point, we took a step back and came to a realization: all the functionality that ESI required could be fulfilled through the use of OpenStack. Almost. OpenStack allowed users to create private networks with Neutron. It allowed users to attach nodes to networks through Ironic and Neutron configured with the networking-ansible ML2 driver. It allowed users to control the power state, do image-based provisioning, and boot from a volume with Ironic, Glance, and Cinder. It allowed almost everything we needed, except for leasing the node. At the time, Ironic had no concept of multi-tenancy and could only be used by admin users who had full API access to every node in Ironic's inventory. So what could we do? Was there a way to bridge the gap? The answer was a definite yes. The upstream Ironic community is one of the most welcoming ones I've had the pleasure of working with. We approached them with the possibility of extending Ironic with node multi-tenancy, and they were extremely receptive. They helped guide us through the process of implementing the feature upstream. The ESI team developed the code, but only with considerable assistance and advice from the community, who helped us write a spec, discussed the finer points of design during weekly IRC meetings, and spent time giving us feedback on our pull requests. When the feature was merged, the ESI team had exactly what it was looking for. Nodes now had owner and lessee fields, which could be set to an OpenStack project ID. Each role had different levels of API access, configurable through the use of a custom policy file. For example, a policy file might say that both owners and lessees are able to control the power state of a node that they own or lease, but only owners are able to edit their nodes, while lessees are blocked from doing so. Custom policy files were needed because, by default, Ironic still wanted to restrict API access to admins in order to maintain backwards compatibility. But not too long after Ironic node multi-tenancy was merged, upstream Ironic started to plan the implementation of secure RBAC, through an effort led by Julia Kreger. She updated the Ironic code while also giving a heads-up to the ESI team, so that we could give feedback and review the changes to make sure they wouldn't conflict with our requirements. As a result, recent releases of Ironic have sane defaults that allow appropriate API access to owners and lessees. And of course, that access remains configurable. That took care of the needed upstream OpenStack development. But as the ESI team tested the changes in our development environments, we realized that there was space for additional tooling. First, there's python-esiclient. We discovered that there were commonly used workflows — for example, attaching a node to a network, or booting from a volume — that required the use of multiple OpenStack CLI commands, and the process could be unwieldy and error-prone. python-esiclient extends the OpenStack CLI and condenses these workflows into one-line commands. We also developed ESI Leap, a leasing service designed to be used on top of OpenStack. Manually assigning and unassigning lessees can be time-consuming, so ESI Leap allows owners to offer up their free nodes for a given time period. Lessees can see these offers and create a lease for a duration within that time period. When the lease begins, ESI Leap automatically sets the node's lessee field; when it ends, it clears that field, removing the lessee's access to the node.
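Purely as an illustrative sketch of that lease lifecycle (the class and method names here are invented, not ESI Leap's actual API):

```python
# Hypothetical sketch of an ESI-Leap-style lease lifecycle (not the real API).
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Node:
    uuid: str
    owner: str                    # OpenStack project ID of the owner
    lessee: Optional[str] = None  # set while a lease is active


@dataclass
class Lease:
    node: Node
    lessee: str
    start: datetime
    end: datetime

    def apply(self, now: datetime) -> None:
        # When the lease begins, grant access by setting the lessee field.
        if self.start <= now < self.end and self.node.lessee != self.lessee:
            self.node.lessee = self.lessee
        # When it ends, clear the field, removing the lessee's access.
        elif now >= self.end and self.node.lessee == self.lessee:
            self.node.lessee = None
```

In the real service, a periodic task would walk the active leases and apply transitions like these, with Ironic's policy checks enforcing what a lessee can actually do.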
Then there are the ESI functional tests. These can be run to confirm that an ESI installation has the correct configuration and will work as expected. We're using these to test the ESI deployment within the Mass Open Cloud. Now, it's great that ESI allows users to run OpenStack commands, but can they actually use these nodes to do what they want? We explored this question by testing a few use cases. First, we went through a few kinds of provisioning. OpenStack provides tools for both image-based provisioning and boot-from-volume, through Metalsmith and Cinder, and they worked as expected for leased nodes. But interestingly, the more important use case was provisioning through external provisioning services. That's because many interested parties — for example, the Netto HPC cluster — already have their own provisioning systems and have no desire to switch. So we verified that leased nodes could be provisioned in such a way, and it worked as expected. All we needed to do was attach the node to the external provisioning network and then let the external provisioning service do its thing. Our next test was a little more complicated: installing OpenShift on the leased nodes. For this, we used the OpenShift assisted installer, which uses a GUI-based wizard to make the process simple. The whole experience was very straightforward. We used the OpenStack CLI to put our nodes onto a public network that could talk to the OpenShift assisted installer, used Ironic to configure them to use an image that the wizard created for us, and then booted them up. The OpenShift assisted installer took care of everything else. Great. We also did some Keylime attestation testing. We created a private network and put some leased nodes onto that network. Then we installed Keylime on one node and used Keylime commands to test the others. Again, straightforward and easy. We've documented all the steps needed to duplicate these integrations on the ESI website. Generally, what we found is that the only ESI-specific actions the user really needs to take are attaching a leased node to the desired network and booting the node appropriately. After that, things just kind of work as expected. We've recently deployed a production instance of ESI in the MOC. It's passed our functional tests, and we're about to start usage testing with the initial set of users. On the MOC side, we'll be testing that students can create volumes in Cinder and use them with leased nodes, allowing them to persist their work across leases. CloudLab is looking for something different: they have scripts that add nodes into their testbed cluster, and we'll be helping them extend those scripts to work with ESI. So far, it looks like the scripts just need to be able to control the nodes' power state and attach them to the correct network, and that should be simple enough. And that's where ESI is right now. You can find further information on our website, and you can reach us on the OFTC IRC network in the MOC channel. Julia, did you want to add anything to that? Just... that's good. I really have nothing to add, but I guess we'll do questions at the end, if that works. So that's the core platform that actually allocates the infrastructure. It's all built on OpenStack, and it's fairly production today.
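Circling back to the custom policy files Mainn mentioned, here's a hedged sketch of what such an override might look like. The rule strings follow Ironic's policy conventions, but treat the exact names as illustrative rather than authoritative:

```python
# Sketch of generating an Ironic policy override file of the kind described
# above. The keys mirror Ironic's "baremetal:..." naming convention, but
# check the Ironic docs for the authoritative names in your release.
import yaml  # requires PyYAML

policy = {
    # Both the owner and the lessee of a node may control its power state.
    "baremetal:node:set_power_state": "rule:is_node_owner or rule:is_node_lessee",
    # Only the owner may edit the node itself; lessees are blocked.
    "baremetal:node:update": "rule:is_node_owner",
}

with open("policy.yaml", "w") as f:
    yaml.safe_dump(policy, f, default_flow_style=False)
```

With secure RBAC, recent Ironic releases ship defaults along these lines, so overrides like this are only needed when you want non-default behavior.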
So the question is, what are the use cases of this? What do you layer on top of it? It turns out that most of the things we're doing in the data center kind of work out of the box. If you take an HPC environment that runs on Ethernet, of course, the networking doesn't change once it's set up. All these environments tend to be resilient to nodes failing, going out of the inventory, and coming back, so manual operations can take nodes in or out of the inventory — just disable the nodes — and so running Slurm on top of it, or OpenShift on top of it, or even OpenStack on top of it actually works reasonably well if you do things in kind of a manual way. The challenge is basically the inventory: anything that thinks it owns the inventory of computers means you have to duplicate that inventory into both environments and say a node is there or it isn't. That's the kind of thing that will evolve over time. There are challenges like, for example, OpenStack will by default spread VMs across all the possible computers, so you want to pack things or migrate them; there are manual operations to make it efficient if you want high elasticity. But it sort of works. The major focus is actually the Open Cloud Testbed. The Open Cloud Testbed uses the CloudLab software. CloudLab is used by thousands of researchers across the United States to do deterministic, repeatable experiments, and the Open Cloud Testbed is the newest layer, the newest service standing this up. What it basically does is provide a mechanism to deploy researchers' experiments, which include a whole image of an operating system. It rapidly moves that image onto the computer, installs it, and then gets out of the way after it's provisioned, setting up networks both within the data center and across multiple data centers — slices that are dedicated, with dedicated bandwidth — so researchers can do deterministic experiments. Possibly people from this community have also used it; it's being used by a number of open source projects as well. So the OCT is the newest national testbed, and part of its goals, to state them briefly, is to deploy CloudLab, as we've done, inside our data center, but deployed next to the MOC, a production service, so researchers have access to users and can get at the cloud metadata. That's something we've talked about providing both to the open source community and to researchers: they can expose experimental services on top of the infrastructure to a real user community, the domain users from the MOC Alliance. They can access cloud data sets, and further, by doing what we just described, they can use ESI to actually move infrastructure to expand the testbed or shrink it. If you remember the keynote, we talked about how the day before the Symposium on Operating Systems Principles deadline, the infrastructure was 100% used, and the next day it was 0% used. So, knowing what the deadlines are, we can shift computers back and forth between these different environments. Right now we have a CloudLab installation on about 200 servers, I think, and separately we're standing up ESI. The goal is to move all the infrastructure, including the 40 racks I talked about for other use cases, on top of ESI, and we're building drivers in. CloudLab has drivers to deal with different hardware, much like Ironic does. It has drivers to deal with a whole bunch of different switches, much like Neutron does. It's something specific; it's not something this community would want to use, but it is incredibly powerful for the use case it has. So we're adding new drivers: ESI becomes basically a driver that gets added in for when CloudLab wants to do operations on hardware, and a driver for when it wants to do operations on the network manager. Given that, once we move infrastructure — manually for now; I'll talk later about automating that — from, say, production use cases to the testbed, CloudLab will just be able to use that infrastructure and deploy what it wants on top of it.
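Purely to illustrate the shape of that driver boundary — this interface is hypothetical, invented for this sketch, and is neither CloudLab's nor ESI's actual code:

```python
# Hypothetical sketch of the driver boundary described above: CloudLab
# already drives hardware and switches through per-device drivers, so ESI
# can slot in as just another driver. All interface names are invented.
from abc import ABC, abstractmethod


class HardwareDriver(ABC):
    @abstractmethod
    def power_on(self, node_id: str) -> None: ...

    @abstractmethod
    def attach_network(self, node_id: str, network_id: str) -> None: ...


class ESIDriver(HardwareDriver):
    """Fulfills the testbed's hardware/network operations via ESI's APIs."""

    def __init__(self, conn):
        self.conn = conn  # e.g., an openstacksdk connection to the ESI cloud

    def power_on(self, node_id: str) -> None:
        node = self.conn.baremetal.get_node(node_id)
        self.conn.baremetal.set_node_power_state(node, "power on")

    def attach_network(self, node_id: str, network_id: str) -> None:
        node = self.conn.baremetal.get_node(node_id)
        port = self.conn.network.create_port(network_id=network_id)
        self.conn.baremetal.attach_vif_to_node(node, port.id)
```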
All right, I think Trammell is next, with security. Should we answer quick questions? So, as I mentioned in the keynote yesterday, we have a lot of things working in terms of security, doing attestation and making sure things are working correctly on the machines. Here's a quick demo of just booting a single node. We have an attestation server running in the little window, and we're gonna boot — in this case it's a virtual node, because the real ones take minutes to boot. So this is fetching, via PXE, a reproducibly built Linux kernel and an initrd that are signed with the platform key. This tool then generates a TPM quote and sends that quote over to the attestation server, which replies with a disk encryption key that is unsealed and then handed off to the chain loader, which passes it to the Windows boot loader. So after a second or two we have Windows 10, which has actually been booted from Linux — which is kind of a neat capability, to be able to boot into a non-Linux, but otherwise EFI, operating system. So the pieces there: we have secure boot working to establish the static root of trust. We have a reproducibly built Linux environment that can talk to the TPM, in this case through the UEFI protocols. It talks over the network to a tenant-controlled remote attestation server that allows the tenant to decide if the system's in a good state. And this is somewhat agnostic to what the actual attestation mechanism is; the demo video used safeboot's attestation, but Keylime is also a good one. The keys transported back to the client are hardware-sealed, so that they can only be unsealed by that TPM on that boot. This prevents replay attacks, and it makes it a lot harder for someone, even a local admin, to try to gain access to these secrets. And then, as I mentioned, when we boot into the real OS, we can kexec into a Linux, or, with the new capability, we can chain-load into a UEFI operating system like Windows 10, passing the BitLocker keys through the RAM disk. We can go into that in some more detail; we're glossing over a lot of details in how that chain-load process and the attestation work, but I'm happy to answer questions in the second session on that. Okay. So again, this was blazingly fast, but just to repeat it: Trammell actually has Linux kexec-ing Windows, which blows my mind.
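To sketch the exchange Trammell just demoed, in toy form — the message format and checks here are invented; a real deployment verifies a signed TPM 2.0 quote via something like safeboot or Keylime:

```python
# Toy model of tenant-side attestation: release the disk key only when the
# node's measured boot state matches the tenant's known-good values.
# Illustrative only; a real server verifies a signed TPM 2.0 quote instead.
import hashlib
import hmac
import os

# Tenant's expected PCR-style measurements (digests of firmware, kernel, ...).
GOLDEN = {
    0: hashlib.sha256(b"platform-firmware-v1").hexdigest(),
    4: hashlib.sha256(b"reproducible-linux-kernel").hexdigest(),
}

DISK_KEY = os.urandom(32)  # the LUKS/BitLocker key the tenant escrows


def attest(quote: dict) -> bytes | None:
    """Return the disk key iff every golden PCR matches the quoted value."""
    for index, expected in GOLDEN.items():
        # compare_digest avoids timing side channels on the comparison
        if not hmac.compare_digest(expected, quote.get(index, "")):
            return None  # refuse: system not in a tenant-approved state
    # In reality, the reply is sealed to the TPM so only this boot can use it.
    return DISK_KEY


# A node quoting the right measurements gets the key; anything else is refused.
assert attest({0: GOLDEN[0], 4: GOLDEN[4]}) == DISK_KEY
assert attest({0: GOLDEN[0], 4: "tampered"}) is None
```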
All right. At a fundamental level, for all of this to work efficiently, it requires disaggregated storage, or some form of making sure that the disk doesn't have to be scrubbed between tenants. You could scrub your disks, but that can take many hours, and we're trying to move computers rapidly between different domains. An easy alternative, if you have a highly orchestrated workload, is to use the local disk for ephemeral storage, encrypt it, and not care about it when the machine is handed over to something else; there's a whole bunch of use cases that solves. But the thing we've been focused on a lot is disaggregating the storage, making sure that the storage doesn't have to be local. What we did for the initial research work is we basically exposed iSCSI: we booted nodes from iSCSI, and under the covers we tied that to a big distributed Ceph cluster, so that all the blocks were actually coming from an RBD in the Ceph cluster. And that worked; it actually was kind of nice. If you take a normal provisioning system, to get the image onto the system you first have to boot into a deploy environment, then transfer the image over (my kids always call during a meeting), then you go through a second POST, and then you actually boot from the disk — so it's about 25 minutes for an installation in a normal environment. That's fine the first time, maybe, but if you now reallocate the machine and then allocate it back, you go through the exact same process. With diskless provisioning, like I just talked about, over iSCSI, once you've established that this is the image for that machine, then if you allocate it to some other person and allocate it back, you just have to boot from the distributed virtual disk. So that's actually a nice story: a dramatic reduction in re-provisioning time. It turns out that a lot of this is actually the POST time of the computer. So with Trammell's help, we actually built our own BIOS for a few machines — it was actually quite difficult to install — and for those machines, with a Linux-based BIOS, the POST time went from three minutes, I guess it was, to 30 seconds. So you're getting re-provisioning times that are equivalent to virtual machine times. And remember, we want to move rapidly between different workloads. This is an example of booting a unikernel on one of those machines: you can get the installation of a whole trusted environment for a particular workload (it doesn't, in this case, have to chain-boot off the disk; we just blast the image over there), and we can get a computer in a trusted environment running a particular workload on bare metal in about 35 seconds. So these are the kinds of use cases we'd love to push towards, in some extreme cases. Oh, the other point to mention about this Linux-based BIOS: it's reproducibly built, so the client can look at the source code and then see, after attestation, that this is the image that corresponds to what they wanted on there, no matter what the previous tenant did. The problem with all this is virtual disk performance. That was fine when we were doing this research, because you could basically throw a whole bunch of disks on the outside at the problem, versus a single local disk. But there's a problem here: writes are quite slow. This is actually showing, as I spun up more and more VMs on a physical node doing random writes to a cluster of 56 spindles using RBD over Ceph, what the utilization of each of the drives is on the vertical axis, and on the horizontal axis, IOPS — how many write operations we actually succeeded in getting for this random-write workload. So it's an extreme workload. And the reason this happens is that with RBD, the client's write goes to an OSD, gets replicated, each replica writes the metadata, and only then do we finally acknowledge the operation to the client, which doesn't continue until all of that happens — or at least it blocks at the next sync operation. So what you're getting is basically a write amplification of about 12 times for every write, distributed across the disks.
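One plausible back-of-the-envelope accounting for that factor — the decomposition below is our assumption, not from the talk:

```python
# Back-of-envelope for the ~12x write amplification mentioned above.
# Assumptions are illustrative: 3-way RBD replication, each replica
# persisting a journal entry plus the data, for both data and metadata.
replicas = 3          # the client's write fans out to three OSDs
journal_and_data = 2  # each OSD writes its journal, then the data
metadata_factor = 2   # metadata updates roughly double the physical writes

amplification = replicas * journal_and_data * metadata_factor
print(amplification)  # 12 physical writes per logical client write
```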
That was actually okay, for the most part, when we had disks on both the client and the server. But we're moving into a world now where we have SSDs, and in a distributed environment we can't compete with locally attached SSDs. So we've been trying to work a lot on how we actually get disaggregated storage that's as efficient as a locally attached SSD. The approach that we've come up with — this is still research; this is where we wanna go — is log-structured virtual disks (LSVD), just published last month at EuroSys. Basically, it exploits the locally attached SSD as a read cache and as a small log whose contents get pushed out on top of S3 storage. This is the current performance: it goes up to about 50,000 IOPS for the same number of VMs, in this case with the drives at only about 10% utilization. To briefly state why this works: you're logging information to the local disk, then aggregating it into big operations that are sent out over S3 storage with erasure coding. So you're using efficient storage at the back end, with large write operations rather than local ones, and you're using the local SSD as the cache. The end performance, comparing to, say, bcache, which is a well-known local cache: we end up completing a bit faster, so a random-write experiment runs at about 280 megabytes per second versus 200 for bcache. That's basically because the logging is faster: we're doing the writes sequentially, combining metadata and data. With LSVD we're writing out to S3 as aggressively as we can, so we get the bandwidth to that side, about 173 megabytes per second, and it finishes a few seconds after the random-write experiment does. With bcache, it's writing out random writes afterwards, so it's only able to sustain about 15 megabytes per second, for the same reason. So you're talking about an enormous window of time during which any crash would actually lose you your data, whereas in the case of LSVD, you never lose your data. So we have an approach now that we're excited about, that preserves the advantages of disaggregated storage while enabling more: you actually have a copy today, using S3 for DR and HA; full consistency if you don't lose the SSD, prefix consistency if you do. And this is now being upstreamed, so it's kind of exciting. So this is a path for disaggregated storage, and that was what we thought we had, till we got to this conference.
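In schematic form, the write path works roughly like this — an illustrative sketch only; all names are invented, a dict stands in for S3, and the real design is in the EuroSys paper:

```python
# Schematic of a log-structured virtual disk write path (illustrative only):
# small random writes land in a sequential local log (fast SSD writes),
# then get flushed as large objects to a remote S3-like backing store.
FLUSH_BYTES = 4 * 1024 * 1024  # flush the log as ~4 MiB objects (assumed)


class LogStructuredDisk:
    def __init__(self):
        self.log = []        # sequential local-SSD log: (offset, data) pairs
        self.log_bytes = 0
        self.backend = {}    # stands in for the S3 object store
        self.seq = 0

    def write(self, offset: int, data: bytes) -> None:
        # Random writes become sequential appends to the local log.
        self.log.append((offset, data))
        self.log_bytes += len(data)
        if self.log_bytes >= FLUSH_BYTES:
            self.flush()

    def flush(self) -> None:
        # One large, efficient backend write replaces many small random ones;
        # the backend can erasure-code each object cheaply at this size.
        key = f"volume/segment-{self.seq:08d}"
        self.backend[key] = self.log
        self.seq += 1
        self.log, self.log_bytes = [], 0

    def read(self, offset: int):
        # Newest-first search: the local log acts as the cache, then segments.
        for off, data in reversed(self.log):
            if off == offset:
                return data
        for key in sorted(self.backend, reverse=True):
            for off, data in reversed(self.backend[key]):
                if off == offset:
                    return data
        return None
```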
And Muli, would you mind coming up? So actually, this is part of the reason we came here: it turns out there's actually a solution that this community has, an alternative that's viable right now. I still think my solution is better, but this actually works right now and can be deployed right now. Thank you very much, Orran. So it was a real pleasure to run into Orran and the team here. We started talking — we like to talk — and it turns out that we came up with pretty much the same vision independently a few years back, except we did it as a product, as a startup. I'm Muli Ben-Yehuda, one of the founders and the chief scientist of Lightbits, and what Lightbits provides is software-defined disaggregated storage. We provide this using NVMe over TCP; for those of you who are familiar with NVMe over Fabrics, NVMe over TCP is NVMe over Fabrics, just over TCP/IP networks. It's a standard, it's upstream in Linux, and there's support in all operating systems and hypervisors, including VMware. We started by doing exactly the same sorts of things as LSVD, batching small writes into large sequential writes and writing them to NVMe SSDs. And we extended this with clustering, so that you are protected from both drive failures and storage server failures. And then we added all sorts of enterprise data services, including secure multi-tenancy, RBAC (role-based access control), encryption, and so on. This is in production today at multiple cloud providers and private clouds, and we're really looking forward to integrating it into the MOC and seeing what we can do together. So this was awesome, because we actually have an answer in the short term. Julia reminds me we're already into the break session, so very briefly: we have different workloads, and we'd like to combine them together, even though they're different clusters, using bare metal, which lets the computers get the same SLA. How do we incent people to do that? My graduate students have talked to these different groups. It turns out the Slurm cluster we have does HTC; it's using the computers at 90% utilization and has effectively unlimited demand. They're willing to put all their hardware into a common pool if, most of the time, they get more computers than they've actually paid for. The OpenStack environment we have is interactive. Frankly, we're willing to put our computers into this pool if, when we need a burst, which is once in a while, we can get one; most of the time we underutilize our hardware, because we have to make sure we can handle the bursts, and so we're putting infrastructure in there most of the time just to handle peak utilization. There's a whole bunch of other projects you can see; most of this slide is empty because I knew we didn't have time to talk about it. In other words, there are incentives for each of these groups — so how do we actually get them to agree to share an environment? This was a kind of cool one: it turns out we were talking to a local Air Force base. They run their gear at one tenth of one percent utilization, to deal with national emergencies — which have never happened. I'm very willing to share hardware with them and get it subsidized, because if there's ever any national emergency, I'm willing to give them hardware anyhow. So we built this marketplace model where the different environments have trading agents; the HPC environment is willing to give up servers whenever somebody's willing to pay more for them, and so there's basically an internal monetary model that allows people to exchange servers, to drive the economics. That's the idea of that.
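A toy sketch of that marketplace idea — trading agents with internal prices, where a server moves whenever a bidder values it above a holder's asking price. Entirely illustrative; the agents, prices, and numbers here are invented for the example:

```python
# Toy internal marketplace: each environment's agent posts an asking price
# for servers it holds and a bid for extra capacity; servers move whenever
# a bid exceeds an ask. All names and numbers are invented for illustration.
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    servers: int
    want: int   # extra servers this agent currently needs
    ask: float  # price at which it will give up a server
    bid: float  # price it will pay per extra server


def trade(agents: list) -> None:
    # Keep moving servers while someone values one above a holder's ask.
    while True:
        sellers = [a for a in agents if a.servers > 0]
        buyers = [a for a in agents if a.want > 0]
        if not sellers or not buyers:
            return
        seller = min(sellers, key=lambda a: a.ask)
        buyer = max(buyers, key=lambda a: a.bid)
        if buyer is seller or buyer.bid <= seller.ask:
            return  # no mutually beneficial trade remains
        seller.servers -= 1
        buyer.servers += 1
        buyer.want -= 1


# The HTC cluster sells cheap (high utilization, flexible deadlines); the
# interactive cloud bids high during a burst.
agents = [
    Agent("slurm-htc", servers=90, want=0, ask=1.0, bid=0.5),
    Agent("openstack", servers=10, want=8, ask=5.0, bid=3.0),
]
trade(agents)
print({a.name: a.servers for a in agents})  # {'slurm-htc': 82, 'openstack': 18}
```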
And the last thing sort of ties in — I think we're now past the break already. Okay, so this is what we knew Trammell was going to get to anyhow; basically, his part of it could go on forever. So these are all the open issues on security. Unfortunately, security does end up taking a lot of our discussions, because we could just say that what we know is broken is everything. But there are lots of different attacks that we want to be concerned about, and we went over a bunch of them very briefly the other day. John's really concerned about leaking confidential information to other tenants; covert channels and side channels are some of the newer ones that people are starting to get really worried about; and we are also worried about physical attacks, though, as I mentioned, those aren't really in scope yet. Some of the areas we're looking into, where we are actively researching, have to do with the devices in the system, especially as we start to move away from the x86 that we think we have a pretty good handle on right now. But even on the x86 side, we still need a better story about how we handle updates. Can the tenants install their own updates? Can the tenants install their own firmware? As Orran mentioned, we've done some work with custom firmware for some of these machines, and we've managed to reduce the boot times dramatically. We also think we have a better security story, sometimes just due to the smaller TCB that comes with the custom firmware. There are lots of other devices in the system that we're concerned about that are currently not participating in the attestation. We'd really like them to have a secure-boot sort of capability, so that they can attest to the x86, and the main CPU can then include that in its attestation. And of course, devices that are DMA-capable worry us a lot more than ones that are not, but pretty much anything on any of the buses has the potential to mess with the system. Local disks and persistent memory also have firmware that we're concerned about, largely for these covert-channel possibilities, where slack space could be used for trading extra data. And we also need to think about how we re-establish keys — through the data rather than through the firmware. Yeah. The BMC is a particular area of research for me. They are way too powerful; they're connected to everything in the system, and I'm really concerned about their trustworthiness. They have a huge amount of complexity, which you can see in the size of the Redfish and DMTF specs, that really makes me question sometimes whether we need all of it, compared to, again, a smaller TCB built with open source code that we can actually validate and audit. FPGAs: lots of unknowns; I'm not sure how we secure those at this point. And then the other area that I'm very excited about, as John mentioned, is confidential computing and enclaves. On paper, it looks like they solve all of our problems of trust, but we really need to do a better job of characterizing them against lots of attacks. Previous enclaves have fallen to side channels, and we're basically now putting all of our eggs into the security of the platform support processors, so this is a concern. These points, I think, restate a lot of what we've said, but we're also looking at how we can ensure that all of these systems we're building work with existing standards, can be supported in deployment, and won't have the various OS vendors break them underneath us. It's a lot of open areas of research, and I'd love to talk more about it with anybody who has thoughts on it. So, is it possible to stop the recording now, or has it already stopped? Sorry?
We should probably just wrap up. Okay, so the assumption is we're now in the second session? I think... can I just jump in? Just to repeat: we have this platform; it works; it solves some use cases that are really valuable. We think it opens the door to all these other things, to enabling hardware that's more secure, so that we can do more and more of this. We're nowhere near that, but we're solving some use cases, we want to expand beyond that, and that's where we want the community. For those that have attended, thank you, and sorry for running a bit over. We're going to switch gears now to an interactive discussion, which will not be recorded, for obvious reasons, and we'll be honoring the Chatham House Rule at the request of some of the speakers and attendees. So can you confirm, can you turn it off? I'm going to go confirm now. Okay. So, what we're looking for: well, first, ask whatever questions you have, and we can provide whatever answers and details we can about what we're doing. But part of what we're looking for is whether there are things you can suggest to us, or work you'd like to partner on, or whatever. That's what we can do.