Hello and welcome to this first session in the Cloud Foundry Days. My name is Jan. I'm working for SAP in the BOSH OpenStack CPI team, although we're going to work more on BOSH as well. And we do deploy Cloud Foundry on OpenStack. At least according to the survey, we're not the only ones, so it's a prominent workload. It was the second most behind Kubernetes. This morning I saw that there might be a newer survey where it is in second place together with OpenShift, but it's a prominent workload. And so interoperability is important for all those users. What I mean when I say interoperability is: I have a certain workload and I want to run it on different OpenStacks, be it different versions, different distributions, or even just different configurations.

So what is Cloud Foundry? From the end user's point of view, it's a rather simple cloud application platform. You can just push your code and it's going to be running. From the other perspective, it's a rather complex distributed system with heterogeneous components that interact with each other.

This is all managed by BOSH in the case of Cloud Foundry, at least in most deployments. BOSH is a tool to manage distributed systems in general: creating the VMs, deploying the software that should be running there, creating disks and attaching them to the VMs, monitoring VMs and resurrecting them if they are failing. That's all done by BOSH, and BOSH is multi-cloud by default. It has a cloud provider interface, of which there are many implementations. One of them is the BOSH OpenStack CPI, and that's exactly what I'm working on with my team.

So as a short wrap-up of what I'm going to talk about: interoperability is hard.
It's tough not only providing it, but also verifying that things are as you expect them. What we see from the OpenStack community, like the DefCore recommendations and RefStack, is just not enough. I mean, "OpenStack Powered Platform" is far from being "you can run Cloud Foundry there". Really, really far. That's why we came up with the Cloud Foundry OpenStack Validator, which is a CF-specific interoperability test suite that tries to use the OpenStack in question just as BOSH, the CPI, and Cloud Foundry would. So we create VMs with the CPI and do all that lifecycle stuff: attaching disks, detaching them, taking snapshots, all these operations. In addition, there are some more requirements, like the interconnectivity between VMs. And it's easy to extend, so if you have a Cloud Foundry product that is based on open source Cloud Foundry, you can write extensions for the validator.

So you might ask, what's the problem? Interoperability, isn't that built into OpenStack? There are the DefCore recommendations and RefStack, a test suite that actually tests these recommendations. But it turns out that, at least until recently, I think neither Neutron nor Cinder nor Glance was actually part of the recommendations that you needed to meet to get the brand "I'm an OpenStack Powered Platform".
I think that changed recently. There is a new JSON file in the DefCore repository that actually includes some Neutron stuff and has some Glance and Cinder stuff. But that was not the case until recently.

The second approach to interoperability, which we saw for the first time at the Barcelona Summit, was the Interop Challenge, where 16 people were on stage deploying the same workload to 16 different OpenStacks. The problem there was that it was a rather small web application, so nothing compared to a complex distributed system like Cloud Foundry with special requirements in this area. And it uses a Python library called shade that actually hides all those incompatibilities and interop problems by having a cloud config that tells the library where it has to apply the compatibility switches. So it's going to work for exactly 16 OpenStacks, and that's a big problem. You can use that for public clouds, maybe, because then there will be a provider of that. But as soon as you install OpenStack on premise, who's going to manage these configuration files?

In short, 16 people doing the same workload on 16 OpenStacks is a really good thing. Don't get me wrong, it's a step forward for the OpenStack community. But it does by no means imply that I can run my workload on my OpenStack.

So basically, if you either deploy Cloud Foundry on OpenStack or provide an OpenStack that should run Cloud Foundry as a workload, that's the question that you want to answer: does it run on this OpenStack? As the BOSH OpenStack CPI team, we were approached like, "Here's a new OpenStack, can we run it there?" And from our point of view, the sad reality is there is no interop.
There are different versions and different distributions, and even different configurations can make things break. So for us it really meant, for each new OpenStack installation, testing it from the ground up, which we did by actually running our continuous integration pipeline against it. That meant using Terraform to prepare the project with network and keys and stuff like that, then installing BOSH, then installing Concourse, which is our CI system that is deployed with BOSH, and then setting our pipelines and having them run. And most often we had them fail. So it's a long list of manual steps. Each of them can fail for all kinds of reasons, and it's often hard to see what the actual problem is and what a possible fix for that problem is.

And even if we were done with that, it turns out there's this little document on cloudfoundry.org. It's not meant to be readable on the slide; it's just to show that I think it's 18 pages long, what you have to do to test if Cloud Foundry runs. There are things like: VMs have to be able to talk to each other, and Cloud Foundry needs larger disks than you might have in a CI environment. Things like that are in that document. And it's even worse, because that document actually states, well, I can tell you that it's not going to work, but if you've really completed all these steps, yeah, it might work. It might not. You don't really know.

So who here has successfully installed Cloud Foundry on OpenStack? How long did it take? Huh, okay. Yeah, I know why. You would say it's because you're not using BOSH anyway. What were the errors looking like? Were they helpful?

So that's our experience as well.
It's a long process. All of these steps will fail, because they can fail for different reasons. You'll have to really dig deep, and then when you've found the problem and solved it, you'll just start over again. So from our perspective, we've seen that RefStack and the Interop Challenge are just not enough. They are good projects, but they are no guarantee that Cloud Foundry will run.

I don't know. Maybe they know OpenStack really well, or they chose the right OpenStack installation. I don't say that this can't happen, but it's not our experience. So either you have a brilliant OpenStack team or a brilliant Cloud Foundry team, whatever. Maybe I should state that we are hiring. Anyway, our experience is that RefStack and the Interop Challenge are a good thing, but they don't work for us to guarantee that Cloud Foundry is going to work.

So we came up with this as an alternative: a small command line application. There's some preparation work you have to do for the OpenStack project: you need to have a project there, a network, a reserved floating IP, some stuff, but that's documented. And there is a rather small configuration file with about 10 mandatory values. Half of them are things like OpenStack credentials, network IPs, things like that. It's not a lot. And then you can just run it and see if everything is fine.

The principle that we apply is what I said already in the beginning.
We really use the CPI to interact with OpenStack and do the lifecycle that BOSH would normally do. In addition, on VMs that we fire up, we do some more sophisticated tests, like can they actually reach each other network-wise, and things like that. The API rate limit is one of the checkpoints in the documentation on how you check if the OpenStack is ready to run. We have some pre-checks that work on the API level as well, like are the security groups set and things like that. Testing network connectivity of VMs, like can they reach the internet, all kinds of things.

But now coming to the maybe more interesting point. I mean, seeing that green is really cool. We didn't have that often in the first run. Most of the time we had problems. So the question is, if it goes wrong, what went wrong? How can you fix it? And so I brought some example outputs for the case where you have problems.

One of the things we need to check is that BOSH needs to access VMs via SSH. That's for providing bosh ssh for you, so you can SSH into the machines. We actually have two checks there. There's a small pre-check just checking the security group, for the easy failure where you just don't have a security group allowing that traffic. But we are then SSHing into a VM and actually trying to reach another VM on port 22. We do that with netcat, and in case of failure we just report that output. This is a bit long. I actually filed a bug on our project for that, because SSH unfortunately logs its warning on standard error, so we print it here. We should actually get rid of that, because we know what it's going to print. But it's netcat that says it can't connect to that VM.
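
The second SSH check described above boils down to a TCP connection attempt on port 22 from one VM to another. A minimal Python sketch of that idea follows. It is not the validator's code (the talk describes SSHing into a VM and running netcat there), and the function name can_connect is hypothetical.

```python
import socket
from typing import Tuple


def can_connect(host: str, port: int = 22, timeout: float = 3.0) -> Tuple[bool, str]:
    """Attempt a plain TCP connection, roughly what a netcat probe does.

    Returns (ok, message) so that a failure carries feedback for the
    operator instead of just a boolean.
    """
    try:
        # create_connection resolves the host and opens the TCP connection;
        # the "with" block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"{host}:{port} is reachable"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable networks.
        return False, f"cannot connect to {host}:{port}: {exc}"


if __name__ == "__main__":
    ok, message = can_connect("127.0.0.1", 22)
    print(message)
```

Returning a message next to the result mirrors the point made in the talk: a failing check is only useful if it says what went wrong.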
That's the basic thing that you see there.

Cloud Foundry VMs have to be able to reach the internet; buildpacks, for example, need to download stuff from somewhere. So we actually do a staged check, so to say. First we try an nslookup to see if DNS actually works, and that's the failure I brought with me. We try to reach DNS with the configured servers, and if that fails we provide you with an error. There's another unfortunate bug here, so this is a little bit fake, because it's not fixed yet, I guess. Unfortunately, nslookup logs its error to standard out, so we don't print it. So I actually added the last line that says what output nslookup would have.

Here we have an even harder one. You can believe that distributed systems need to agree upon time. This is something that can go wrong after you successfully deployed Cloud Foundry, because after some time the system goes out of sync. That's something we have seen, because we need to use internal NTP servers, because we can't reach out on 53. So that was one of the problems that we've actually seen live. And this is really a hard one: if that goes wrong, figuring out what went wrong a couple of weeks ago, so to say, is a really tough one.

So you already see a theme, I think: there are lots of network tests. I really hoped to bring the MTU test with me, but that has just made it onto the backlog; it's not implemented yet. I guess lots of you have seen MTU problems. We're going to write something that actually tries to send packets over the network, figures out what the MTU is, and checks whether it matches the configured one.
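
To make the MTU concern concrete, here is a small Python sketch of the arithmetic behind tunnel overhead. It is only an illustration, not the planned validator test. The helper names are hypothetical, and the overhead values (42 bytes for GRE, 50 bytes for VXLAN over IPv4) are commonly cited figures that are assumptions here and should be verified for your own network.

```python
# Hypothetical helpers; not part of the CF OpenStack Validator.

# Assumed per-packet encapsulation overhead in bytes over an IPv4 underlay.
# Commonly cited values; verify them for your own network setup.
ASSUMED_OVERHEAD = {"gre": 42, "vxlan": 50}


def required_underlay_mtu(instance_mtu: int, overhead: int) -> int:
    """MTU the physical network must carry so VMs still see instance_mtu."""
    return instance_mtu + overhead


def resulting_instance_mtu(underlay_mtu: int, overhead: int) -> int:
    """MTU left for the VM if the underlay MTU stays unchanged."""
    return underlay_mtu - overhead


def mtu_matches(measured_mtu: int, configured_mtu: int = 1500) -> bool:
    """True if the measured path MTU is at least the configured one."""
    return measured_mtu >= configured_mtu


if __name__ == "__main__":
    gre = ASSUMED_OVERHEAD["gre"]
    print(required_underlay_mtu(1500, gre))    # underlay MTU needed for 1500 in the VM
    print(resulting_instance_mtu(1500, gre))   # VM MTU if the underlay stays at 1500
    print(mtu_matches(resulting_instance_mtu(1500, gre)))
```

Under these assumed numbers, a GRE tunnel on an unchanged 1500-byte underlay leaves less than 1500 bytes for the VM, which is the situation where either the underlay or the Cloud Foundry components have to be reconfigured.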
I mean, the recommendation for Cloud Foundry is 1500. You could change that in the validator YAML if you know what you're doing, because then you have to configure lots of Cloud Foundry components. Or you have to actually change OpenStack to give you that: if you use GRE tunnels or something like that, you have to increase the packet size on the outside of the tunnel to actually have 1500 on the VM. True, either one is good. But I think I would recommend actually changing the OpenStack, because you never know if there's a component in there where you missed the configuration, or that is not configurable, or whatever. So I would actually give the VMs 1500, but you can change that. And it's not yet implemented. That's on top of our backlog as one of the next tests that we introduce here, but it's not there.

That's another thing from these 18 pages of documentation: Cloud Foundry needs large disks. One of the typical scenarios where you get an error there is if you try to deploy Cloud Foundry on DevStack, because there you have to change the configuration to really get larger disks.
So that's why we actually placed that hint there. I think we actually had two or three bugs for that.

And then there are other things, like the Cloud Controller needs a blob store. You might use Swift for that, and there's a whole lot of things that you can do wrong there. You need to configure an X-Account-Meta-Temp-URL-Key. If you don't do that, or use a different one, you'll get a 401 like you see here. Another reason for failure can be that the Swift proxy service is actually not configured to serve the URL. That's something else that you could do wrong if you use that as the blob store.

Now our promise is: if the CF OpenStack Validator passes, you can expect that Cloud Foundry can be deployed on that OpenStack. If that's not the case, I would consider that a bug in the validator. It should actually show you that. And if it doesn't pass, it should give you output that is actionable feedback. You should then know what you should change in your OpenStack, or what you should do differently, to make it pass. If that's not the case, open an issue on our project for better error messages or whatever things you found. Or if you find an OpenStack system where the validator doesn't pass but you can deploy Cloud Foundry successfully, that's just the same.
Those are all issues on our side.

Now, most of the vendors don't really have just open source Cloud Foundry, but have products based on open source Cloud Foundry. So we made the validator extensible, so you can rather easily plug in extensions to test things that are not necessary for open source Cloud Foundry but maybe for your product. That could be different things. The Swift blob store check that you've seen is actually an extension, because you have the choice of S3 or Swift, so it's an optional component. Other things we are checking for are certain flavors that should exist and be configured in a certain way, like our product expects it. We use them in our BOSH manifests, so they should be present. We expect certain quotas to be in place, and we have a configurable extension for that. So you actually give it a separate file: these are the flavors I expect and these are the cores that should be in that flavor. Or even metadata: if you need hardware randomness, you can specify that there should be a flavor in place that is configured for hardware randomness. And we have another extension that checks accessibility of external components. We have our own enterprise GitHub that we need to reach, and it's not public, so there has to be some network configuration. So we have an extension for that.

What we are planning to do is to cope with non-functional requirements like performance. I mean, this is only planned; we have not fixed yet how we are going to do that. But checking things like the disk I/O that you get, and that it's according to what you would expect for running your Cloud Foundry. Or what else is a good example?
There are security recommendations that we're going to place in there. For example, if you're using Swift as a blob store, you should probably have a user that is only allowed to use Swift and is not allowed to fire up VMs or shut them down. We'll probably do that as extensions, so it's a recommendation; if you want to do it differently, then just don't turn on the extension.

And you can actually write your own extensions. We provide an API for you so that you can easily call the OpenStack APIs via fog-openstack, which is the Ruby library that we use, or you could use the CPI to do things. And we have resource tracking implemented: you can add resources to that tracker, so after the test run everything is going to be cleaned up. VMs that are left in place because, I don't know, the test broke before you shut them down: we are actually going to cope with that. So there are APIs that you can use to easily create extensions, so that you get an answer to that question: is your Cloud Foundry based product running on your customer's OpenStack?

So, coming close to a conclusion. We've seen that RefStack as the least common denominator is not enough for guaranteeing that Cloud Foundry runs. The Interop Challenge in its current state isn't either. And I've shown you a tool that we actually use to get an answer to the two questions that are there: will Cloud Foundry run on my OpenStack, and will my Cloud Foundry based product run on my OpenStack?

Now, is that how interop checks should be? Definitely not.
I mean, the way we are doing it is: try things out, catch any errors, try to figure out what went wrong, and provide usable feedback. So that's definitely not the end of the story. I would rather spend work on BOSH or on Cloud Foundry or other topics instead of writing this validator. That was out of necessity.

There's an interesting project in the OpenStack community called Oaktree, which is based on the shade library that I've talked about, the library that's used for the Interop Challenges. Their goal is to provide a gRPC endpoint and, in the end, something that actually gets the required capabilities from an OpenStack. So you could run it against any OpenStack that provides its capabilities. Writing a validator like we've done would then just be checking: does it support creating large disks? Check. And all the other things would not require actual testing, because they would cope, like the shade library does now for 16 OpenStacks, with all these interoperability problems. So that's a really interesting project that we are going to follow closely to see how it goes. Because of the gRPC approach, we could actually use that: generate a Ruby client that we can use instead of fog-openstack, or change fog-openstack to actually use it. Whatever would work there.

So that's it from my side. Thank you very much for listening. I still have some time left for questions that you might have. If you don't get to ask your question, there is my name, my Twitter handle, my GitHub handle, and there are the two projects that we actually maintain: the validator and the OpenStack CPI. So again, thank you. And if you have questions, there are two microphones in the room, so that the recording also has the questions.

I'm learning Cloud Foundry myself. I started learning Cloud Foundry a little while ago.
I find it very interesting. But there is one question I have. I understand that your validator has OpenStack-specific commands, right?

Yeah.

Okay. I was wondering, I mean, I find this program interesting, and I wonder if it could be infrastructure agnostic, like saying we don't care where we want to deploy Cloud Foundry, we just want to test that we have the proper infrastructure. Just an idea.

Yeah, I mean, we have tests that actually use the IaaS APIs, so those would have to be moved out of the core test suite, maybe. But we use the same interface to the CPI that BOSH uses, so we shell out to the CPI like BOSH does. So it should be as easy as BOSH can be used against different infrastructures. It's just that we have not yet seen the use case for doing that on AWS. From our perspective, AWS is rather stable, and if Cloud Foundry works there, it's going to continue to work there. But it could be changed to actually allow different CPIs, yes. As I said, there are some API calls, and there's an API that gives you access to OpenStack, so things would have to change there. And you would probably have to provide the APIs for that specific IaaS that you're going to target. Any specific IaaS you had in mind? It should be possible. Yeah, I mean, I'm actually interested in that myself, because I've got a side project to create a Kubernetes CPI, and that would be really interesting to actually check, because that's a similar situation. You might deploy Kubernetes somewhere and might have to check if it's configured correctly. So, yes.

We have VMware Integrated OpenStack as a separate instance and we have PCF separately, and we are planning to have both together. So what would be your advice? Can we integrate that, or can we have both as separate instances with the applications talking? Ours is a large distributed application. So I would like to know; we are in a planning stage.
Yeah, I'm not sure I got the question correctly.

We have VMware Integrated OpenStack and PCF. Both are running separately, but the applications are talking.

Okay.

So we are thinking about moving PCF to run on top of VMware Integrated OpenStack. Is it advisable to do that? Have you seen any limitations like that?

I haven't used that. We are mostly running SUSE Cloud, so we have a dedicated OpenStack that is not running on VMware. So no, I don't really have advice.

Hmm. It should be possible, right, if it's an OpenStack integrated with VMware?

Yeah, sure, if you're actually talking the OpenStack API. So if you have a VMware that is playing OpenStack, you could just run the validator and see if it's green. If not, you hopefully get good errors that help you figure out what the problem is. If not, as I said, open an issue, because that would be interesting for us as well: things that go wrong and how to figure out what exactly went wrong. Sure. You're very welcome.

First you should go to one of the microphones, probably, so that this gets recorded.

Sorry. The validator as a diagnostics tool, in working environments that are exhibiting problems, to try and discover what the underlying issue might be?

No, we actually run that regularly in our OpenStack environments. We have a CI pipeline in all kinds of OpenStack environments, and they are all running it. I mean, they have a dedicated project, and they're actually not spinning up so many VMs, so that shouldn't be a problem.

Okay, so it should be safe to run that in a production environment, you mean?

Yeah, yeah, we run that on an OpenStack where customer things are running.

Does Cloud Foundry support the native Swift API for the blob store? Do we have to have any S3 API installed on top of Swift?

As far as I remember, Cloud Foundry does support native Swift.
Yeah, it does. It's BOSH that doesn't, so you can have an S3-compatible Swift as a BOSH blob store there; you don't have another choice. But Cloud Foundry does support Swift natively.

Actually, we are installing Pivotal's Cloud Foundry on top of OpenStack, and they are saying that they don't support the direct Swift API.

They say they don't? Yeah, the Cloud Controller actually has an implementation for a Swift blob store that is using fog APIs.

The problem here is that the S3 API is a third-party component, it's not well integrated with the OpenStack Foundation, and there are some support issues regarding installing the S3 API on top of Swift. So is there any plan or roadmap that you guys are going to directly support the Swift API?

I guess you have to ask that question to Pivotal, because as far as I know it's possible to use Swift without a compatibility layer.

Actually, sorry for the interruption. For writing that test, we checked out the Cloud Controller code to see what they are actually doing with Swift, and they were using it natively using fog APIs.

They don't modify anything and they're directly using the BOSH CPI? I'm not sure, I'm just asking.

I mean, as I said, if that's about the BOSH blob store, then that might be right, because BOSH doesn't support Swift natively. But from what I've seen in code, the Cloud Controller should really support that.

Yeah, it could be BOSH.

Yeah, sure. As far as I know, we actually use, I don't know, either S3 directly or an S3-compatible Swift for BOSH. But we do use Swift as a blob store for our Cloud Foundry.

All right, thank you. If there are no further questions, then thank you all again for attending.