Okay, so welcome everybody to the Bits-Service talk. My name is Simon Moser. I'm the PM for the incubation project called Bits-Service, and the goal of this talk is really to introduce everybody here in the room and in the community to what the Bits-Service is, what we intend to do, why we do it, and all these kinds of things. We put together a little agenda and basically divided it up into three pieces. The first piece is we're going to explain to you what the Bits-Service is and why we think we need it. The second block will be about what we actually did in the last couple of months, what challenges we faced, and what lessons we learned while conducting the project. And last but not least, we're going to explain why this might or might not be important for you: how does it affect you, how would it help you, what are we planning on doing next, and so on and so forth. We'll conclude with a Q&A, obviously. So let's get started on the Bits-Service. What is the Bits-Service? First and foremost, the Bits-Service is a true community incubation project scoped around bits. Right now we are running as probably one of the very few teams in the Cloud Foundry community that is really truly equally divided between employees from IBM and employees from Pivotal. When I say it's scoped around bits, what do I really mean by that? What are bits; what is it that I'm talking about? I'm talking about things like application artifacts, compiled droplets, buildpacks, packages, and also all of the caching and other functionality related to these kinds of artifacts that lives in the Cloud Controller today.
So, to make it a little more transparent what that means, let us first explain where Cloud Foundry uses bits today, and we're going to use the example of pushing an app to set the context for everybody.

Hey, my name is Steve and I work for Pivotal; these are my colleagues from IBM. As Simon just said, pushing an app is one particular use case of a Cloud Foundry workflow where bits, or blobs, really matter, and it might be worth looking behind the curtain, at a high level, at what's happening when you do a cf push, when you push your application. I don't want to bore you too much, so I kept it focused on bits here; there are a couple of pieces missing in this sequence diagram. But basically, the first thing that happens when you type cf push is that your CLI, which is that thread on the left side of the sequence diagram labeled CF, makes an HTTP POST request to an endpoint called /v2/apps. That endpoint is served by what we call the Cloud Controller, or CC; it's the middle component in the diagram, and it's a pretty important and central component in a Cloud Foundry deployment.
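As a hedged sketch of that first call (the payload fields and the token below are illustrative placeholders, not the full /v2/apps schema), building the request in Ruby might look like this:

```ruby
require "net/http"
require "json"

# Build the kind of request the CLI sends to the Cloud Controller.
# The body fields are illustrative; the real /v2/apps payload has more.
req = Net::HTTP::Post.new("/v2/apps")
req["Authorization"] = "bearer <token>"   # placeholder OAuth token
req["Content-Type"]  = "application/json"
req.body = JSON.generate(name: "my-app", space_guid: "some-space-guid")

puts req.method  # prints "POST"
puts req.path    # prints "/v2/apps"
```

No network call is made here; the point is only the shape of the request the Cloud Controller's API serves.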
It's basically the component that serves all of the API endpoints, so whenever you interact with your CF environment, you're most likely going through the Cloud Controller. It serves up the API, and it also maintains the data model that represents your particular CF environment: all the apps, the spaces, the orgs, the services, the service bindings, and so on. All of that is maintained and stored by the Cloud Controller, so it does quite a bit. Anyway, let's get back to that POST. The POST happens, and the Cloud Controller creates a bunch of database entries; amongst others, there will be a row in a table called apps, and that row represents your app. The Cloud Controller is going to assign a globally unique ID, or GUID, to your app, and it's going to pass that back to the CLI, so the CLI now has a way to address that app going forward. The next big piece that happens with regard to bits is all about what we call resource matching. Before I tell you what that is, you need to know that the Cloud Controller keeps a global cache of all application artifacts, meaning all the files that your apps consist of, and it keeps that cache across all orgs and spaces. Basically, for every app that has been pushed to your Cloud Foundry environment, all the files that those apps consist of live in this cache, with a unique entry for each file. Why do we have such a cache? Well, like any other cache out there, it prevents us from doing things twice or multiple times; in this case, we want to avoid having to upload the same file over and over again. Think of an organization with multiple developers, or even thousands of developers, who all either push the same app or push apps that rely on common files; you don't want to upload those files over and over again. That's why we have that cache. In order to use the cache, the CLI has to have a way to identify which files are currently in the cache and which files are missing, because the missing ones it obviously has to upload. That process is called resource matching. To initiate the resource matching, the CLI makes a PUT request, that second arrow going from the CF component to CC, to the /v2/resource_match endpoint, and it provides a list of fingerprints. There is a fingerprint for each file that your app consists of, and you can think of fingerprints as hashes that identify the bits the file consists of, an MD5 hash or something like that. That list is passed on to the Cloud Controller, and the Cloud Controller then checks with a third component, which we call the blob store, whether or not a certain file is in the cache, using the fingerprint to identify the file. I'll tell you more about the blob store in another slide coming up, but basically we check for each file: we make a HEAD request to the blob store, and the blob store tells us either 200, I have the file, or 404, I don't know that file. For the files that are currently missing, the Cloud Controller assembles a list of missing fingerprints and passes it back to the CLI. Now the CLI knows which files need to be uploaded, so it takes all those files, zips them up, and uploads them to the Cloud Controller in the second PUT request here. As part of that PUT request it also sends along the global list of fingerprints, so now the Cloud Controller knows which files the app consists of, which files are new, and which files it already has in the cache. It gets the files it already has cached from the blob store, and it stores the new files it just received into the blob store, because we want to reuse them in future requests. At the end of that process, at the end of the two loops, it sends back a 201 to the CLI.
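To make resource matching concrete, here is a minimal, self-contained sketch. It is a simplification: the fingerprint fields and the server-side check are assumptions for illustration (MD5 as mentioned above, and the Cloud Controller side reduced to a plain lookup against the set of cached hashes):

```ruby
require "digest"
require "json"
require "tmpdir"

# Compute a fingerprint for each local file of the app (hash + size).
def fingerprints(paths)
  paths.map do |path|
    { "hash" => Digest::MD5.file(path).hexdigest, "size" => File.size(path) }
  end
end

# Stand-in for the Cloud Controller side of /v2/resource_match: given the
# hashes already in the global cache, return only the fingerprints that are
# missing and therefore have to be uploaded by the CLI.
def resource_match(fprints, cached_hashes)
  fprints.reject { |fp| cached_hashes.include?(fp["hash"]) }
end

Dir.mktmpdir do |dir|
  a = File.join(dir, "a.txt"); File.write(a, "already cached")
  b = File.join(dir, "b.txt"); File.write(b, "brand new file")

  fps    = fingerprints([a, b])
  cached = [Digest::MD5.hexdigest("already cached")]  # a.txt is in the cache

  missing = resource_match(fps, cached)
  puts JSON.generate(missing)   # only b.txt's fingerprint: it must be uploaded
end
```

The real endpoint exchanges a richer payload, but the shape of the exchange, fingerprints out, missing fingerprints back, is the same.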
The CLI can then go on and do whatever it needs to do. In an asynchronous way, using a background job, the Cloud Controller will take all of the files your app consists of and assemble what we call a package, and that package is then also uploaded to the blob store. Why do we do that; why do we keep the cache and then also a package? Well, the package is the piece that represents your app. That piece is later downloaded by either the DEA or Diego when you want to stage and start your app, because we need those files, and we need to compile them using the buildpack to come up with a droplet that we can then run. So that's basically the flow. Now you have an idea of one particular example of where we use bits, and this new Bits-Service deals with these bits. But before I tell you more about the Bits-Service, let's take a look at what this blob store component on the right side actually is, and that highly depends on your deployment of Cloud Foundry. You might have a particular BOSH job, a VM in your deployment, where you store these files on a local disk; that's what we mean by local disk here. While that is feasible, and some people do it, it's maybe not the most scalable solution out there, and that's why you will more often see people use external object stores, something like AWS S3 or OpenStack Swift, and there are many more options. The reason we can support and provide all these options for the object stores is that we use a little library called Fog; I don't know who came up with that name, but it is essentially an abstraction layer that abstracts the APIs of all these blob stores. That's why we can offer support for S3 as well as OpenStack. I think Mark, my colleague from IBM, will tell you a little bit more about Fog later.
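To give a feel for what such an abstraction layer buys you, here is a deliberately tiny, hypothetical blob store interface with a local-disk backend. This is not Fog's actual API, just a sketch of the idea that every backend (local disk, S3, Swift, and so on) exposes the same few operations:

```ruby
require "fileutils"
require "tmpdir"

# A deliberately tiny blob store interface. In an abstraction layer like Fog,
# every backend would expose the same operations; only the implementation
# behind them changes.
class LocalDiskBlobstore
  def initialize(root)
    @root = root
  end

  def put(key, data)
    path = path_for(key)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, data)
  end

  def get(key)
    File.read(path_for(key))
  end

  def exist?(key)
    File.exist?(path_for(key))
  end

  def delete(key)
    FileUtils.rm_f(path_for(key))
  end

  private

  def path_for(key)
    File.join(@root, key)
  end
end

Dir.mktmpdir do |root|
  store = LocalDiskBlobstore.new(root)
  store.put("droplets/abc123", "compiled droplet bytes")
  puts store.exist?("droplets/abc123")  # prints true
  puts store.get("droplets/abc123")     # prints the stored bytes
end
```

An S3-backed class with the same four methods could be swapped in without callers noticing; that substitutability is the whole point of the abstraction.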
Right now it's important to know that the blob store is most likely something external.

All right, so now I've told you how we use app bits, what the flow looks like, and where we store them. You might still wonder what this Bits-Service is, then, because all of the above is what we do nowadays, without the Bits-Service. Well, at its core, the Bits-Service is a new component in your deployment. It's a new BOSH job, or you might want to look at it as a new service; that's why it's called Bits-Service. It encapsulates all the bits-related functionality in one piece, in one new component. It will do things like dealing with the uploads and downloads of your app bits, your app resources; it will do the resource matching I told you about; and it will handle packages, droplets, buildpacks, everything that basically needs to be uploaded and downloaded all the time. We want to encapsulate all of that in a new service. To help you form a picture of this thing, it's probably also good to know what it is not. It's not a competitor to S3, and it's not a competitor to OpenStack Swift; we are not building a new object store here. Rather, we are building something that abstracts. Now, I told you about this Fog thing, so why do we need another abstraction layer here? Well, Fog, first of all, is a library, not a service by itself, and Fog also tries to be very generic. What we want is something that's highly specific to our Cloud Foundry use case, and we want this component to be streamlined and highly performant for exactly that. That's why we want this new service. So we are not trying to build a new object store, and we are not trying to be generic and give you an API that you can use for your own app or whatever. So that's the new service. And why do we need it? I mean, this stuff has been working for a couple of years now; you can push apps and everything works fine, right? You might ask why we need this new component, this added complexity. Well, at the core of the answer is the fact that the Cloud Controller is a bit of a monolith nowadays. It does a lot of things: as I told you before, it keeps track of all the entities in your CF world, so it maintains the data model, it serves the API, it handles the bits, and it does a lot of other things. In the context of bits and blobs, the problem with that is mostly concurrency and scalability. To understand that, you have to know that the Cloud Controller itself is a Ruby app, and every request that comes in is handled by a separate thread, and that thread comes out of a thread pool. It's a predefined pool and you cannot grow it; you have X threads and that's it, so use them, and use them wisely. The problem with bits-related requests is that they're potentially long-running: uploading big files, downloading big files, or handling lots of files might take a while, and during that time you're basically locking out other requests that you just cannot handle, because you're out of threads. The only way to scale out of that right now is to scale the Cloud Controller, to increase the number of Cloud Controller instances, in order to increase the throughput of the Cloud Controller API as a whole. So we thought: hey, if we can take all these potentially long-running requests and deal with them in a separate component, then we have some leverage; we can scale just that one particular component if there's a bottleneck there, and we can increase the overall throughput. That's what it's all about: scalability and concurrency, that's really the main point here. But there are obviously a couple of other advantages. One of them is that the Cloud Controller code is kind of complex, because it does a lot of things.
By splitting this out into a new service with a separate code base, we hope to maintain that code a little more easily, to increase maintainability, and potentially to allow parallel development tracks on these two code bases. And with that I'll hand over to Mark, who will tell you a little bit more about how we got to where we are today, and where we are today.

So now I'm going to take you through how we approached the Bits-Service and the challenges we faced while doing that. Right now the Cloud Controller handles five resources: three are entity-related, namely droplets, packages, and buildpacks, and the other two are caches, the app bits cache that was mentioned and the buildpack cache. We just picked one and started implementing it in a separate service, the Bits-Service, implementing all the REST verbs like PUT, POST, GET, and DELETE. It's not a full REST API; it's just the operations that the bits handling in Cloud Foundry needs. Since it's implemented in Ruby, we put it behind a web server, nginx, which does most of the raw bits handling in the flow. Once we had this up and running, we went into the Cloud Controller and basically changed it: instead of going to its own blob store implementation, it now talks to the Bits-Service instead and does all of its bits handling through this new API we've implemented. After we had done that for one resource, we just repeated it for the next resource, until we had done it for all five resources. Then we added more breadth: we started with the local file system, added support for S3, and then added support for OpenStack. On the right-hand side you can see the part of our pipeline that is really testing the Bits-Service against all three storage implementations in separate lanes; it's just a different release we build there and test against the real thing, which is quite nice to have in a small component where you can do this actual testing.
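A toy dispatcher for one resource type gives an idea of that verb surface. The status codes and the in-memory store here are illustrative sketches, not the real Bits-Service implementation (whose exact verb set also differs per resource):

```ruby
# Toy handler for one Bits-Service resource type ("packages"), showing the
# kind of verb surface the service implements. Illustration only.
class PackagesEndpoint
  def initialize
    @store = {}   # guid => raw bits
  end

  # Returns [status, body] pairs in the spirit of an HTTP handler.
  def call(verb, guid, body = nil)
    case verb
    when :put    then @store[guid] = body;            [201, ""]
    when :get    then @store.key?(guid) ? [200, @store[guid]] : [404, ""]
    when :head   then @store.key?(guid) ? [200, ""]  : [404, ""]
    when :delete then @store.delete(guid) ? [204, ""] : [404, ""]
    else [405, ""]
    end
  end
end

ep = PackagesEndpoint.new
puts ep.call(:put, "abc", "zipped package bits").first   # prints 201
puts ep.call(:head, "abc").first                         # prints 200
puts ep.call(:delete, "abc").first                       # prints 204
puts ep.call(:get, "abc").first                          # prints 404
```

In the real deployment, nginx sits in front of handlers like these and streams the raw bits, so the Ruby process never buffers whole uploads.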
We started with the most common use case these days, the v2 API of the Cloud Controller with the DEAs as the compute implementation, and then we moved on to do the same thing for Diego, where some internal endpoints differ. In the past weeks we also covered v3, which was kind of interesting at some points because it's not completely done yet, but we got it covered. So, when we look at how the bits flow today, in the cf push example we had above, the client uploads bits to the CC, which is what you see at the top, and if the storage backend of your production deployment is S3, the CC will relay all the bits to the actual blob store, uploading them again to S3, and when that's done it will let the CLI know: well, we're done. If you look at the bottom of the chart, for tomorrow we will simply remove the CC from that flow, so the CLI, or any other internal client like the DEA or Diego, will talk directly to the Bits-Service, which, as Steve said, can be scaled independently. If you have a large deployment with one million customers on it, and you have a lot of uploads and downloads and everything going on, you can just scale these instances, and you don't need to scale your Cloud Controller for the actual bits handling. The only reason the CC is still in that picture is that after the upload is done, we still have to let the CC know: well, the file is there, and that's where you can find it. Right now, with the development we're doing, we're kind of in the middle, which is temporary, where both the CC and the Bits-Service are in the path. What we have right now is that the CC still receives all the uploads and relays them to the Bits-Service, and the Bits-Service then relays them to the storage implementation, the blob store. That's how we make sure we're really covering all the paths, but it will go away; we're not permanently adding something to the flow in between, it's just until the end state, where we can bypass the CC.

So, the challenges we faced. I think the biggest challenge was, or rather is, Fog, because on the one hand it's the library that makes a lot of things easier when dealing with the storage backends, but on the other hand it has implementations for a lot of different storage providers, and the way you configure them is leaked outside, into the deployment manifest, where you can put in whatever you want for your storage implementation. So for us as a team, when we simply want to answer the question of which storage backend implementations we need to cover, we can't answer that question today, because everyone can pick whatever they want. Or the question of which features of those storage backends we need to support, or need to cover in our tests: we can't answer that either, because today everyone can use whatever knobs and configuration options are there. This missing abstraction of what Cloud Foundry itself supports is a real challenge for testing and for providing feature equality, and I guess that's what's supposed to go away. Another thing was that there is no really carved-out API for blobs. Some endpoints are visible on the Cloud Controller, but other parts of the bits handling, which now become visible on the Bits-Service, are not surfaced as an API on the Cloud Controller at all. For instance, if you delete an app, it will also go and delete the package and the droplet, but that's nothing that is surfaced on the API today; that happens only in the Cloud Controller code, and this hidden API will become surfaced with the Bits-Service. One thing that was quite interesting there is that all of the five resources are handled differently: take the key generation for the resources, where some keys are only sharded, while some are shards of other entities' keys with a stack name or other things appended. Everything is different there; there's no clear structure today.
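To illustrate what "sharded" keys mean here, a simplified sketch; these exact schemes are examples for illustration, not the precise Cloud Controller rules:

```ruby
# Simplified examples of sharded blob store key schemes. The real key
# generation differs per resource and is more involved.

# Plain sharding: two two-character shards derived from the GUID, so that
# no single blob store "directory" accumulates millions of entries.
def sharded_key(guid)
  [guid[0, 2], guid[2, 2], guid].join("/")
end

# A resource whose key is derived from another entity's GUID plus a
# qualifier, e.g. a cached artifact tied to a stack name.
def qualified_key(guid, stack)
  "#{sharded_key(guid)}/#{stack}"
end

puts sharded_key("abcd1234")                  # prints "ab/cd/abcd1234"
puts qualified_key("abcd1234", "cflinuxfs2")  # prints "ab/cd/abcd1234/cflinuxfs2"
```

The point of the talk's complaint is exactly that each of the five resources uses its own variant of schemes like these, with no common structure.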
Another real challenge was that, while we were doing all this, the Cloud Controller team, the CAPI team, was continuing to work. For instance, they replaced the local blob store NFS implementation with WebDAV, which on the one hand was a really cool thing, because they also cleaned up some code and introduced some abstractions, like blob stores and things like that; but on the other hand we needed to catch up with all the WebDAV changes and port our code forward to them. That was challenging: working on the same thing from two different angles. And I think the last, but also biggest, challenge is that in the state we're in right now, we don't want to break any contracts with any other components; we just want to keep everything working. So we're not making any changes to the CLI or Diego or anything else to make things more efficient or to clean things up. That may come as soon as we're done with the Bits-Service and it is integrated with the Cloud Controller, but so far we really try to avoid any breaking changes to existing components. And with that I'll hand it back to Simon to take you through the rest.

Okay, so let me try to do the last part of the presentation and tell you what we learned. There are basically two kinds of things that we learned. The first is a cultural aspect, something that we learned from working together, and the most interesting piece about this is that company cultures are quite different. In reality, this being probably one of the projects where we tried to do this for the first time for real, we had a lot of friction in the beginning about adjusting our working modes to get going; that took a while. It's not impossible to overcome, but you really need to work through a bunch of things to adapt your style of working and get everything up and running. The second really interesting lesson we learned is that, since we're located in Germany and in the UK, we do remote pairing quite a lot of the time. Most of the time someone from Germany is pairing with someone in the UK; of course there are days where people pair locally, but most of the time we actually do remote pairing, to a high degree, and it actually can work. So for whoever has been asking whether this is feasible: it actually is. I didn't think so in the beginning, but it worked out not so badly. And the third thing is that, given we are in Europe and a large part of the Cloud Foundry team is sitting here in San Francisco, we always have to bridge a nine-hour time difference, which makes any synchronization that's required sometimes really hard, but that worked out as well. Coming to the technical side of things, I think the main lesson we learned is that you can split the CC into microservices; I think that's the thing to take away from this effort, from the work we've been doing. It is hard, though; it's not easy, because first and foremost, if you attempt an effort like we did, each of the resources works differently, and particularly around the v2 API the functionality is really distributed across many places, because a lot of that stuff has grown a little organically, and you have to rip things out in places where you wouldn't expect to have to rip things out. We had a discussion in one of the retros where we said that if we had done this on v3, it would probably have been easier, because it would have been a much cleaner REST model; but as a matter of fact we needed to do it on v2, so we had to learn that the hard way.

So let me quickly explain how the Bits-Service affects you and how it helps you; it's the slide that probably many of you have been waiting for. Let's start with this: if you are a Cloud Foundry operator, running your own Cloud Foundry instance today, how would the Bits-Service help you? The first thing is that you would be able to scale the Cloud Controller and the Bits-Service independently for these types of operations, and thereby be able to operate your Cloud Foundry in a different way. The price you pay for this additional flexibility is that there is going to be one more VM, or more VMs, that you need to take care of; you need to monitor them, and so on and so forth. And for everybody out there who's using anything other than S3 or Swift as the actual backend: please come and talk to me. The reason I say this is that we are in the process of maybe removing Fog from the Bits-Service configuration, but we don't want to break anybody. We would like to have S3 and Swift as the supported backends, and we might want to add one or two others, but please talk to me if you are using anything other than S3 or Swift as your backend. If you are an application developer, you're hopefully going to notice the existence of the Bits-Service by the Cloud Controller getting more responsive, because we're going to offload all the time-consuming bits operations into their own service, and that will result in more efficient handling of the bits, which hopefully you'll see as faster push times, faster uploads, that sort of thing. And if you are a Cloud Foundry developer, working on any of the other Cloud Foundry components, you are finally getting a clean API to code against for all your bits operations; so when you want to upload or download a package, thinking of Diego, thinking of any other components, that's something you get out of this.

What are we planning to do next? The first and immediate next step is that we're going to release the Bits-Service and hopefully make it a default in cf-release. Currently the incubation team is working on a private fork, but we are very close to merging that fork back into cf-release itself. We have to do a few more things, like a little more operational statistics and a bunch of housekeeping chores, before we're finally happy, but that's one of the things we want to add. We'd also like to become independent of Fog, like I just said, which might affect one or the other of you, so please come and see me if you are affected by this. Then we would like to implement more efficient resource matching: two spikes, two prototypes, have been done, one for Java applications and one for other applications, and we have a bunch of additional thoughts on how we can make resource matching more efficient, so that's certainly something that will give additional performance gains, and we'd like to tackle that as well. Then we would probably like to reimplement the Bits-Service in Golang, also for performance reasons. And there was already one request from the community asking whether we could add additional functionality, like backing up the whole blob store in one shot, so that's certainly something to think about. With that, I'm going to go over to the Q&A, and maybe some of you have other ideas to talk about. The presentation will be downloadable, and it has a bunch of links to the project if you want to take a look at it: you can look at the CI pipeline, at Tracker, and at Git if you want to take a look at the code. Thank you very much for your attention, and the floor is open for questions, if there are any.

No, we haven't; so the short answer is no, we haven't. The reason is that today a Docker image is transparent to the Cloud Controller, so what would be the use case where you'd want to expose it directly? I mean, if you compile it into a droplet at the end of the day, or something like that, we could think about doing something along those lines. It's a good idea. More questions?

We have been hoping that we can do most of it through redirects, so that the CLI will just go to the CC and the CC will send a redirect, but that might not work for all cases. So yes, I've started discussions about changing the CLI.

I'm curious why bits was implemented as a new VM versus a CF system app. Scalability. Well, it's a platform component: it lives in BOSH, and it's a service which is BOSH-deployed. You can co-locate it on VMs; it doesn't have to be on its own VM. But it's a platform service, so it lives in BOSH, not as an app. Okay, any more questions? More people are coming in, so you're all late.