Thank you very much for coming to my session. In this session, I will talk about the RackHD CPI and how to use BOSH on top of bare metal machines. The agenda today: I am going to give a very brief overview of BOSH and the RackHD CPI and describe the use cases it is supposed to satisfy, and then I will describe how it works and show a video demo of it.

Before we begin, I would like to say a couple of words about myself. My name is Victor Fong. I come from the EMC Cloud Foundry Dojo, which is located in Cambridge, Massachusetts. We have two teams. Our first team is in San Francisco, and they work very closely with Pivotal in the Pivotal office on Howard Street. In the past, they have contributed to UAA, Gorouter, Cloud Controller, the CLI, BOSH, BOSH infrastructure, and very recently, Diego Persistence. Then we have a team in Cambridge as well, and each engineer on the Cambridge team has been through the six-week Dojo program in San Francisco. After we learned everything, we wanted to replicate what we have in San Francisco in Cambridge. The reason we want to do that is because we want to improve Cloud Foundry as a platform, and the way we want to do it is by getting a lot of smart people to come contribute to Cloud Foundry. That's why this Dojo was set up. We have a blog site for the EMC Dojo, dojoblog.emc.com; feel free to check it out. We have a lot of exciting news and also a lot of very useful documentation, tutorials, and examples that you can use. Lastly, my Twitter account is Victor K. Fong. Feel free to follow me on Twitter if you're interested in the work I will be doing in the future. After this talk, I will tweet out the links to the documentation, tutorials, and examples that I have used throughout this talk, so feel free to follow me for that.

As I said before, the RackHD CPI is developed in the Cambridge Dojo. After going through the Dojo program at Pivotal, the most important thing that we learned is the development methodology: pair programming, test-driven development, and continuous integration. We try to follow all of those guidelines when we develop the RackHD CPI. We believe that two heads are better than one, so we are constantly doing pair programming in our environment, where two engineers sit together at the same workstation, trying to solve the same problem at the same time. Through constant rotation, each engineer on our team gets context on the entire project, not just the small piece of context they might get on a traditional team. We also practice test-driven development, which is very different from traditional development-then-testing. We first write the test for the expected behavior and execute the test to get a red state. At that point, since we have not yet changed the behavior, that red state is a "good failure." Once we get a good failure, we change the behavior by changing the implementation, and then we get a green state. This cycle happens every day throughout the course of development, and at the end we get a very comprehensive test suite and a code base with very high test coverage. We use all of those tests in our continuous integration pipeline.
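As a rough illustration of that red/green loop, assuming a Go code base tested with Ginkgo (an assumption about tooling; the talk does not say which test runner the RackHD CPI uses), the cycle looks something like this:

```bash
# Sketch only: the red/green TDD loop described above.
# Assumes Ginkgo as the test runner, which is not confirmed in the talk.
ginkgo -r    # run right after writing the new test: expect a "good failure" (red)
# ...change the implementation so it satisfies the new test...
ginkgo -r    # run again: the whole suite should now pass (green)
```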
As you can see, there are five stages in our pipeline. The first stage is the unit tests and the integration tests, which execute very quickly. The second stage is the lifecycle test, which exercises the CPI directly, without a BOSH Director: in that test we invoke the CPI functions directly and check that we get back the expected behavior. After that finishes, in about two minutes, we deploy the BOSH Director in the third stage. In this stage we take the latest release of the BOSH Director and the latest release of the RackHD CPI and combine them into a deployment. After that stage is finished, we have a functional BOSH Director with the RackHD CPI that we can run the BOSH acceptance tests against. The BOSH acceptance tests cover two stemcells, the Ubuntu stemcell and the CentOS stemcell, which is why the fourth stage is broken down into two parts. Lastly, after the acceptance tests are finished, we can actually ship that by tagging a version in our GitHub account, so that our users can use our release with confidence that the code has been tested thoroughly.

I would not pretend to be an expert on BOSH. The whole BOSH team is sitting right there, so Maria is probably laughing at me right now. Anyway, I will say a few words for those people who are not familiar with BOSH. BOSH is a tool to automate deployment, health monitoring, upgrading, scaling, and cleanup in a production environment. In a production environment, if any of the machines goes down or if software is malfunctioning, BOSH will detect that, spin up a new machine, and deploy the software on that machine so that your environment stays highly available. It's also a very good tool for CI/CD, because you can just upload a new release to the BOSH Director and the Director will take machines down one at a time to deploy your new software for you.

BOSH actually has no idea what infrastructure it is talking to. All of that is abstracted out by a cloud provider interface, also known as a CPI. The CPI is responsible for communicating with the infrastructure tier, and in this case that brings us to the RackHD CPI, which talks directly to bare metal machines. It allows the user to make full use of bare metal machines without a virtualization tier, and it provides the same capabilities of BOSH on bare metal machines, including deployment, health monitoring, upgrading, scaling, and cleanup. It's also a great tool for doing CI/CD on bare metal machines.
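To make that CPI abstraction a little more concrete, here is a minimal sketch of how a BOSH Director drives an external CPI in general: it executes the CPI binary, passes a JSON request on stdin, and reads a JSON response back. The request shape follows the generic external CPI contract; the binary path and the argument values below are illustrative assumptions, not the literal RackHD CPI interface.

```bash
# Sketch only: how a BOSH Director calls an external CPI in general.
# The binary path and argument values are illustrative assumptions,
# not the literal RackHD CPI interface.
cat > create_vm_request.json <<'EOF'
{
  "method": "create_vm",
  "arguments": [
    "agent-id-1234",
    "stemcell-cid-5678",
    {},
    {"default": {"type": "dynamic"}},
    [],
    {}
  ],
  "context": {"director_uuid": "some-director-uuid"}
}
EOF

# The Director pipes the request to the CPI executable and reads back a JSON
# response; for the RackHD CPI, the returned "VM CID" identifies a bare metal node.
/var/vcap/jobs/rackhd_cpi/bin/cpi < create_vm_request.json
```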
There are two main use cases for this. The first use case is, of course, to run existing BOSH releases on bare metal. What does that do for us? It eliminates the virtualization tier. That's definitely one less thing for you to buy if you have to pay for a virtualization license, and it's also one less thing to deploy, maintain, scale, and upgrade, so in a production environment it saves you time and effort. Your software will now be running directly on physical hardware instead of virtualized hardware, meaning it should get additional performance because it's using real hardware, and at the same time your machine no longer has to run the virtualization tier, which saves memory and computing power.

The second use case is actually the more interesting one in my mind. In the past, BOSH has been very good with everything above the virtualization tier: it would create VMs and deploy the OS and software on top of the VMs. But what if we could now use BOSH to automate the deployment of the underlying tier as well? That would bring us close to a fully automated data center. The networking solutions are already software defined; the notion of the software-defined data center came out a couple of years ago, and solutions already exist on the market. And once we have software, it is totally possible to automate all of it. Imagine a data center where you can just plug physical machines into power and basic networking, and after that there is a magic button you click that deploys the whole networking structure for you and sets up all the routing and firewalls. The next step, of course, is to set up software-defined storage. There is existing software-defined storage out there, including EMC Isilon and ScaleIO, so you would have a magic button to deploy all of that onto every single physical node in your data center. If one of those nodes goes down, BOSH spins up a new machine for you, and the software-defined storage is then responsible for replicating itself across all the nodes and rebalancing itself, so that all the data is persistent and redundant. Last but not least, virtualization is definitely software defined. Imagine if you had ESXi or OpenStack, and you could hit an easy button and have BOSH deploy all of that onto a lot of physical servers. If a machine goes down in the middle of the night, you don't even have to be awake to solve that problem: BOSH will create a new machine and install the exact same software on it for you. And if you have to upgrade any of that software, BOSH can be the CI/CD tool for you as well. Say you have to upgrade ScaleIO: BOSH will take down a machine, install ScaleIO on it, and bring it back up, one machine at a time, so that your data center stays highly available while you are doing the CI/CD. That's a major benefit.

So that's all very good, and the RackHD CPI opens the door to that automated data center. But how does it really work? In the background, the RackHD CPI uses a technology called RackHD, an open source technology created by EMC Code. In the past it was an EMC product called OnRack; it was open sourced at the beginning of this year, and you can obtain the release and the source code from github.com/RackHD. RackHD provides functionality for automating hardware management and orchestration. It does so with a client-server model over a RESTful API, and the RackHD CPI uses this API to tell RackHD what to do; RackHD then carries out those tasks through customizable workflows. There are a couple of example workflows: you can have RackHD upgrade the firmware on your existing bare metal machines, install security patches, or provision new machines for you.
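To give a feel for what driving RackHD over that RESTful API looks like, here is a sketch using curl. The endpoint paths and the workflow name are assumptions made for illustration; the RackHD documentation on GitHub describes the actual API.

```bash
# Sketch only: talking to RackHD's REST API by hand. The endpoints and the
# workflow name below are assumptions for illustration; check the docs at
# github.com/RackHD for the real API.
RACKHD=http://rackhd-server:8080

# List the nodes RackHD has discovered.
curl -s "$RACKHD/api/2.0/nodes"

# Ask RackHD to run an OS-install workflow against one of those nodes.
curl -s -X POST "$RACKHD/api/2.0/nodes/<node-id>/workflows" \
  -H 'Content-Type: application/json' \
  -d '{"name": "Graph.InstallUbuntu"}'
```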
To explain what it looks like, we have a very simple environment here, with just one network switch. In the beginning, we have a functional RackHD server and three bare metal machines attached to it. As you turn on the machines, they send out PXE signals, and RackHD picks up those signals and stores the metadata of those nodes in its database. After that, RackHD is fully functional and ready for your input. At this point a user can come in and tell RackHD to install Ubuntu on Node 1, and RackHD can do that.

So how does BOSH come in? Let's say we have the same environment, with the RackHD server and the physical nodes. As the nodes send out their PXE signals, the RackHD server picks them up and stores them in its database, and now the user can come in and deploy a BOSH release, which creates a BOSH Director somewhere. You can totally install the BOSH Director inside one of the nodes, but let's just say the BOSH Director is installed somewhere else, attached to the same network. At this point, the user can upload stemcells. A stemcell gets uploaded to the BOSH Director, and the CPI is also responsible for uploading the stemcell to the RackHD server, so that the image is persisted on the RackHD server as well. Now the user can upload a release; for example, if they want to install Redis on one of the computers, they can upload the Redis release. At this point, the user can do a bosh deploy. The BOSH Director then tells the RackHD server to install the image on one of the machines, and as soon as the image boots up, the BOSH agent running inside the image attempts to communicate with the BOSH Director. The BOSH Director at this point tells it to install Redis on the computer. After this, the deployment is finished and the Redis server is functional on the bare metal machine.

So at runtime, what happens if that bare metal machine goes down for whatever reason, whether it's a hardware failure or a software failure? BOSH detects that, because it keeps a constant heartbeat with the agent; if that heartbeat goes away, BOSH assumes the machine is gone and tries to recreate a new machine. This resurrection cycle works by telling RackHD to install the image on one of the nodes again, and as soon as the image boots up, the BOSH agent inside that image communicates with the BOSH Director again, and the BOSH Director tells it to install Redis.

What happens if the user wants to upgrade a release? The user just uploads a new release to the BOSH Director and does a bosh deploy. The BOSH Director tells RackHD to deprovision, and the RackHD server removes everything on the node; then the BOSH Director again tells the RackHD server to install the new stemcell onto the node, and after that the new version of the release is installed on the node. So this is how BOSH can achieve CI/CD in a bare metal environment.
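In terms of the old-style BOSH CLI in use at the time, the flow just described boils down to a handful of commands along these lines; the file names, addresses, and manifest are placeholders rather than artifacts from the talk.

```bash
# Sketch only: the operator-side flow described above, using old-style BOSH
# CLI commands. File names, the address, and the manifest are placeholders.
bosh target https://<rackhd-bosh-director>:25555
bosh upload stemcell bosh-stemcell-ubuntu-trusty.tgz   # the CPI also pushes the image to the RackHD server
bosh upload release redis-release.tgz
bosh deployment redis-on-bare-metal.yml
bosh deploy   # the Director asks RackHD to image a node, then the agent installs Redis on it
```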
So that's all very good, but how do we run Cloud Foundry in a bare metal environment? There are three choices here, and I'm going to describe each one of them. The first choice is to deploy each component of Cloud Foundry onto its own bare metal machine, meaning that the Gorouter would have its own machine, the Cloud Controller would have its own machine, the UAA would have its own machine, and so on. But that would not make very good use of your hardware, because your release might not have enough traffic to require all the computing power of the machine. In that case it's really a waste of hardware to install each component on its own computer. The second option is to co-locate a lot of those components onto one machine, and you can certainly do that, but then you lose the resource segregation offered to you by the virtualization tier: if one component is consuming a lot of resources, all the other components running on that physical machine will be starved. And if one of those components fails, BOSH redeploys the whole machine for you, so it's not very efficient either.

The best choice is a hybrid environment, with both a virtualized environment and a bare metal environment. The computing units in Cloud Foundry are really just managing containers: whether it's the DEA runners or the Diego cells, they are just managing a container environment for you. If we run those components in a virtualized environment, we have two tiers of virtualization: first the virtualization tier provided by the infrastructure as a service, and then the Diego cell or the DEA runner itself creates the container runtime for you. That's not very efficient. So the best-case scenario is that we put those computing units on bare metal machines, but at the same time keep the rest of the components in a virtualized environment.

This is what I set out to do. In the beginning, we have two networks, and we have three nodes connected to the private network; the network on the bottom is the public network. At this point we have the RackHD server ready, and we can deploy a vSphere BOSH Director. That vSphere BOSH Director can be used to deploy a Cloud Foundry deployment in your vSphere environment. At this point Cloud Foundry is fully functional and you can push applications to it, but the containers will be managed in virtual machines inside the vSphere environment. So next we create a RackHD BOSH Director, and this BOSH Director can talk to the RackHD server because they are all sitting on the same network. After that, the RackHD BOSH Director can deploy runners onto the bare metal machines. Since they are on the same network, the runners can communicate with Cloud Foundry, and Cloud Foundry will use the computing power from those runners to execute the Cloud Foundry applications that you want to run.
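The runners that the RackHD BOSH Director places on those bare metal nodes are described in an ordinary BOSH deployment manifest. Here is a minimal, made-up fragment just to show the shape of it; every name and value is a placeholder, not the manifest used in the demo later.

```bash
# Sketch only: a fragment of an old-style BOSH deployment manifest for the
# bare metal runners, written out here as a heredoc. Every name and value is
# a placeholder, not the actual manifest from the demo.
cat > runners.yml <<'EOF'
name: bare-metal-runners
jobs:
- name: runner
  instances: 1          # bump this number to claim another bare metal node
  networks:
  - name: bare-metal-net
EOF

bosh deployment runners.yml
bosh deploy
```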
So this is our environment. It's just a very tiny box; we call it the orange box, and it has 10 Intel nodes in it, with an internal switch, so each of those 10 Intel nodes can be used as a stand-alone bare metal machine. It's not very powerful, but it's something to test with.

Just to recap: the RackHD CPI is an open source project, and it's a combined effort between the CF Foundation, Pivotal, and EMC. In fact, Katie Miles flew over to Cambridge for two weeks to help us get started, and Dimitri, the PM of BOSH, is actually the PM of this project as well. The project is open sourced; you can find it on github.com under cloudfoundry-incubator as the bosh-rackhd-cpi-release. The purpose of this CPI is to bridge BOSH with bare metal machines by using RackHD. It supports the Ubuntu and CentOS stemcells, and it provides CI/CD for bare metal machines. I think one day we will have a fully automated data center by using BOSH. What's coming soon is that my team is also working on a project called Diego Persistence, which will allow Cloud Foundry applications to talk to a persistence tier. I think Ted and Paul have already talked about that in a previous talk, and Brian Gallagher will be showing a live demo of it in his keynote tomorrow, so please watch out for that.

With that, let's move on to a live demo. Well, not a live demo, a video demo. I really wanted to show a live demo, but my environment, the orange box, is actually stuck behind the EMC firewall, so all I can do is record a video, I guess.

Before we begin, we have two nodes; I only turned on two nodes in the orange box. These two nodes communicate with RackHD through the AMT protocol, and the status of both of those nodes is currently "available," meaning they are ready to be used. At this point I already have a Cloud Foundry environment installed in my vSphere, so I am just going to target the vSphere environment and show you what I have. I'm entering a username and password at this point, and then the command is bosh vms, which should show me all the deployments. The first one is the RackHD BOSH Director, which is actually also running in the vSphere environment. The second deployment is the Cloud Foundry deployment. Look at how simple it is: it doesn't even have a runner. And the last one is the Concourse that I run the pipeline in; let's not talk about that. So this is the Cloud Foundry environment.

Now I am going to bosh target my RackHD BOSH Director, which, as you saw, is also running on vSphere. I am able to log into it, and at this point I do a bosh vms to show you that I have nothing running in that environment. Then I move over to my manifest and change the runner instance number to one, and now I can do a bosh deploy against that RackHD BOSH Director. Actually, I have to set the deployment file first, and then I can do a bosh deploy. BOSH is smart enough to know that I am changing the number of instances from zero to one and just asks me to confirm. So here it is. Now, at this point, the CPI is already talking to the RackHD server, and the RackHD server is running a reserve-node workflow on the first node. The reserve workflow just tells other users that this node is taken, and you can see that the status has changed to "reserved." Now it's running the provision workflow, meaning the stemcell is being deployed to the node, and a CID is given by the BOSH Director to uniquely identify the node. After that it's just deploying, and then the agent should be up and talking to the BOSH Director and installing the software. So at this point, the runner should be fully functional.

All right, the next part is to do a cf push. I have a very simple snake game written in HTML5; it's just using a static file buildpack. If you want to push your own application, feel free to come by the EMC booth and we will help you push your first CF application if you have not done so in the past.
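For reference, the push and the scaling steps in the rest of the demo come down to a few cf CLI commands along these lines; the app name is a placeholder I've made up, and the instance counts just mirror the ones mentioned in the talk.

```bash
# Sketch only: the cf CLI steps used in the rest of the demo. The app name is
# a placeholder; the instance counts mirror the ones mentioned in the talk.
cf push snake-game          # push the HTML5 game with a static buildpack
cf scale snake-game -i 5    # scale out to 5 instances
cf scale snake-game -i 30   # keep scaling to find the hardware limit
cf scale snake-game -i 40
cf app snake-game           # check which instances are running or crashed
```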
So at this point, it's just creating the container. The container is now up and running, and it's running in the RackHD environment. I'm just copying the link. So the container has been created in the orange box, and now I'm ready to scale this application out to five instances. I just want to take a look at the runtime environment: you can see that all five instances are now running in the orange box. But I really want to see the limit of my hardware, so I want to scale out to more instances. I tried 30 at this point, and we can see that it's doing something while it's starting the containers, but it seems like 30 might not be enough to hit the limit of my hardware, so I think I'm going to scale out more. Now I scale out to 40, and I think finally we're going to see some failures. So one node in the orange box is powerful enough to run 31 instances of this application.

Now I can go into my manifest, change the number of instances from one to two, and do a bosh deploy again. BOSH is smart enough to know that I have changed it from one to two, and then it just repeats the whole thing: running the reserve-node workflow to change the status of the node in the RackHD environment, then the provision workflow, which puts the stemcell on top of the physical machine. After that, BOSH gives it a CID to uniquely identify it, and once the machine has booted up with the new image, the BOSH agent communicates with the BOSH Director to install the new software, which is the runner. So at this point it's installing the job, and then the second runner should also be up and running, and I should be able to look at the status and scale out to more instances.

All right, so if I do cf app to look at the status, all the jobs that were failing before should now be running. One crashed, but the rest are running. Now I can scale out to more instances, 50 for example, and look at the status. That might not be enough to test the limits of my hardware, and I just want to see if the game is still functional; at this point it still is. Looking at the status again, you can see that the job that crashed last time has already been resurrected by Cloud Foundry. Now I'm ready to scale out to more instances, 70 maybe? Yeah, so at this point I can come to the conclusion that each one of my nodes can take care of 31 jobs, so with the two nodes combined I have around 62 jobs running at the same time, and the rest just don't get started because I don't have enough computing power for them.

Anyway, that is the end of my talk. I understand it's lunchtime already and you must be hungry, so feel free to take off. Thank you very much for coming to my session.