I'm Ian Huston, and we're presenting the "Snakes in a Buildpack" talk. This talk is about how we added conda support to the Python buildpack, and it focuses on the storyline and the process behind how we solved a concrete problem by adding a new feature to Cloud Foundry.
Cool. Yeah, so I had a problem. I'm a data scientist at Pivotal Labs, so I work with our clients to build applications that use machine learning and predictive analytics tools, and I use a lot of Python. I was using a lot of the rich ecosystem of Python packages, including OpenCV, scikit-learn, TensorFlow, that kind of thing, and now I wanted to deploy my application, and I wanted to use Cloud Foundry. One problem is that a lot of these packages have significant C and Fortran dependencies, and building them from source, which is what the official buildpack was doing, can take hours. It's the sort of thing that takes my MacBook a whole day. So it's hard to do that in the time you get for staging an app, and I needed a solution: I needed to be able to deploy these apps.
So, why would you want to do this?
Well, one of the really important things for me as a data scientist is that my models, my predictive analytics, don't just end up in a PowerPoint somewhere or as a report; they're actually part of a product. We need to deploy these models for them to generate business value. I think that's the most important thing.
So were there no solutions available at all, or what?
Yeah, I built a little solution. I made a community buildpack: I learned what a buildpack is, that there are three script files you have to write, and I made one. It uses conda, a binary package manager for Python, which gets around the problem of having to build all these things from source with complicated C and Fortran dependencies and having that toolchain available. Using conda, which was made by a company called Continuum, I was able to build this buildpack, and it worked for my few use cases. It was great. Unfortunately, it wasn't very well built. I wrote it for one thing, it didn't have many tests, and it didn't really have a pipeline to build it. And I basically didn't have enough time to maintain it: I have a full-time job, so this wasn't something I could do as part of that.
You didn't want to spend extra time writing free software for everybody?
Well, I did, and I released it to the community, but then I didn't have the time. There were a few other things we could have used. There's a Heroku buildpack by Kenneth Reitz, the well-known Python contributor, but that hasn't been maintained in a while, and it didn't work first time around with Cloud Foundry; it had a lot of Heroku-specific stuff in it, so we just couldn't use it. Continuum released their own buildpack a few years ago, but again, it hasn't had many updates.
The Python community has also moved on a little. Conda came out of a time when Guido van Rossum, the benevolent dictator for life of Python, told the Python scientific community that they couldn't solve the problems they were having with all these Fortran and C dependencies in pip and PyPI, the standard way to install packages, and told them to go off and build their own thing. Now pip has caught up a little with some binary package management, but it doesn't deal with anything that's not Python, so you can't install something like OpenCV and add the Python bindings. There's a good talk by Jake VanderPlas that goes through all of that.
The other thing people talked about was Docker.
Oh, Docker. That's the hot new tech right now, right? So why don't we just push up a Docker container?
Yeah, we could have done that, and it gives you control, but it also gives you a lot more responsibility. The way I think about it is: I'm a data scientist. I don't want to worry about operating-system-layer stuff, I don't want to worry about Heartbleed and OpenSSL library versions, and a buildpack lets me focus on the bits I want to do.
Cool. A lot of people have said that it's very easy to bring your own dependencies in a Docker container to get a baseline app working, but it's a lot more difficult to secure those dependencies and keep them updated. If you're going to push a data science app or run an experiment in a production manner, you're going to want that security and those updates, and that's where you might want to use the buildpack. If you're more interested in the differences between buildpacks and containers for staging and pushing, James Meyers and Jen Spinney from HPE are giving a very in-depth talk on this tomorrow, so check that out.
Cool. So, James, you work on the buildpacks team, right? The buildpacks team was kind enough to help me out with this. What did you have to do, or what did you start working on?
Yeah, sure. At Pivotal, when we deal with an unknown feature, or unknowns in general, we have these things called investigations, or charters, for exploring the new thing and gaining knowledge about it.
So we did a few experiments. The first experiment was just to use Miniconda in the CF Python buildpack. That was basically running the Miniconda install in the compile script and seeing how that worked, what changes were needed, and what knowledge it drove out for us.
What are the differences between Miniconda, conda, and Anaconda? It's actually very confusing. It all seems the same; it's kind of badly named.
Yeah, exactly. But they're actually very different. Conda is the actual package manager, so the equivalent in Ruby would be Bundler or RubyGems, or in Java maybe Maven. Miniconda is the package manager plus everything you need to actually run it; in the Bundler example, that would be Ruby and whatever gems Bundler depends on, all packaged together. And Anaconda is Miniconda (the package manager and everything you need to install it) plus a whole large slew of data science packages that come with it.
So that's what we gained from that experiment. The next experiment was to add all the low-level dependencies, like Fortran and various C libraries and C bindings, to the rootfs, the root file system that all containers run on. This turned out to be a bad idea, because these are very large dependencies, and if we added them to the rootfs we'd be adding them to everybody's apps: even if you're not using the Python buildpack, you'd have them on the root file system of every container.
The last thing we tried was to vendor Anaconda itself, the entire distribution, into the cached version of the Python buildpack. This didn't work out because Anaconda is very, very large; the sheer number of data science packages it comes with is huge. If we had pursued this solution, anyone using the Python buildpack, even if they're not interested in doing any kind of data science, would have these dependencies built in.
So what did we do for our solution? We decided to port Ian's (actually very well-written) conda buildpack into the Python buildpack as a separate code path. This way, users who want their data science packages get those packages and their dependencies in their app containers, and those who aren't interested in those dependencies don't carry the extra file size and overhead.
That's great. I know you used the code I'd been using before, but there were a few fixes you had to make. What did you have to do to make it a bit more stable and a bit more maintainable?
Yeah. Once we got basic conda support into the buildpack, we released it, used it, had some users try it, got valuable feedback, and implemented these fixes. For example, Miniconda was always installing Python 3, even if you specified Python 2 as the version you wanted; we fixed that. The Miniconda progress bar also interacts with cf staging logs in an interesting way: if the progress bar continually ticks and updates as it would in a terminal, it results in massive, massive logs.
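A progress bar typically redraws itself in place by writing carriage returns, so a log collector that records every redraw can end up with a flood of near-identical messages. The team's fix was simply to suppress this output during staging. As a different technique, shown here only for illustration and not what the buildpack does, a minimal Python sketch could collapse such output to the final state of each line before logging it; the function name is hypothetical.

```python
from typing import List


def collapse_progress(raw: str) -> List[str]:
    r"""Reduce redrawn progress-bar output to one line per logical line.

    Terminal progress bars usually redraw in place by writing a carriage
    return ("\r") and overwriting the current line. This keeps only the last
    non-empty segment of each newline-terminated line.
    """
    collapsed = []
    for line in raw.split("\n"):
        # Each "\r" starts a redraw; empty segments come from trailing "\r"s.
        segments = [s for s in line.split("\r") if s.strip()]
        if segments:
            collapsed.append(segments[-1])
    return collapsed


raw_output = "numpy 10%\rnumpy 50%\rnumpy 100%\nscipy 20%\rscipy 100%\r\nDone\n"
for log_line in collapse_progress(raw_output):
    print(log_line)
```

Run as written, this prints three lines (numpy 100%, scipy 100%, Done) instead of every intermediate redraw.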
What you see on that slide is just 1% of the progress. That much output was actually overloading Doppler, and we were losing messages, so we decided to just suppress that output. For staging, you really just care about the end product: that all your dependencies get downloaded.
The other thing we fixed was caching. If you pushed your app, Miniconda installed all these dependencies, and then you pushed your app again, it would re-download those dependencies because it wasn't using the app cache. So we got it to put your packages into the app cache, so that on a repush you're not re-downloading them.
The last thing was that Miniconda was actually breaking our Python buildpack builds, because Ian's buildpack was pulling the latest version of conda. For a quick solution that may be what you want, but for reproducibility we wanted to lock it to a version. Every time a new version of conda or Miniconda came out, it would break our builds: they would slip in some tiny feature that broke our sample apps, and a build that was green the day before was now red.
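Two of these fixes, honoring a requested Python 2 and locking Miniconda to a known version, both come down to choosing exactly which installer to download. Miniconda publishes separate "Miniconda2" and "Miniconda3" installers, so a minimal sketch might build a pinned download URL from the requested Python version. The base URL, the default to Python 3, the function name, and the version string are all assumptions for illustration, not the values the buildpack actually uses.

```python
from typing import Optional

# Hypothetical values for illustration only; the talk does not say which
# Miniconda version the buildpack pinned.
PINNED_MINICONDA_VERSION = "4.3.11"
MINICONDA_BASE_URL = "https://repo.continuum.io/miniconda"  # assumed mirror


def miniconda_installer_url(python_version: Optional[str],
                            miniconda_version: str = PINNED_MINICONDA_VERSION) -> str:
    """Build the download URL for a pinned Miniconda installer.

    The requested Python major version selects Miniconda2 or Miniconda3, and
    a fixed miniconda_version (instead of "latest") keeps staging reproducible.
    """
    major = "3"  # assumption: default to Python 3 when nothing is requested
    if python_version:
        major = python_version.strip().split(".")[0]
    if major not in ("2", "3"):
        raise ValueError(f"unsupported Python version: {python_version!r}")
    filename = f"Miniconda{major}-{miniconda_version}-Linux-x86_64.sh"
    return f"{MINICONDA_BASE_URL}/{filename}"


print(miniconda_installer_url("2.7"))
```

The downloaded script can then be run non-interactively with its -b (batch mode) and -p PREFIX options. Bumping the pinned version becomes a deliberate change that can be tested, rather than something that happens whenever a new release appears upstream.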
So we locked it to the latest version at the time.
This is great. Your team was able to help out, and we've started using this with clients on projects, in a variety of different scenarios using machine learning and predictive analytics. A few of these are production scenarios. One is for a delivery company, predicting the time to delivery so that someone can get updates as their package is arriving. Another is with a car maker in Europe, improving their supply chain: they're able to predict which parts they need to order from their suppliers, to reduce the cost and the time needed for those supplies to reach them. And we're working with another car maker to deliver warnings to drivers when there's bad weather on the roads. My colleague Dat is going to talk about that in the next session: how this Internet of Things work, using the new Python buildpack, is effectively enabling people to save lives on the roads. I think you had a scenario as well where you got some use out of this?
So I actually got to use it in my personal life, from the hobbyist perspective. Right now I'm doing a master's at Georgia Tech through the OMSCS program, and I'm taking a machine learning class; this came up in one of our homework assignments.
We had to apply, I think it was, five supervised learning models to two datasets, so that's a total of 10 experiments. Being a millennial, I wanted to watch Netflix while doing those experiments. But running them on my MacBook Pro, which is about four years old and kind of dinky now, I basically couldn't do anything else on my laptop: I couldn't use Chrome, I couldn't do anything, I couldn't even pull up the video player. So I thought, how do I multitask and watch Netflix while doing my homework? The solution was PWS, Pivotal Web Services, which is Pivotal's hosted instance of Cloud Foundry. With PWS I could just wrap my experiments in a pretty minimalistic Flask app, push it up, and let PWS do all the work of downloading the dependencies and actually running the experiments. Meanwhile, my own computer had the memory to watch the entire first season of Stranger Things, which I highly recommend; it's very good. So I did my homework, multitasked, watched Netflix, and was happy, all because of conda in the Python buildpack.
That sounds very useful. We're going to have a quick look at what it looks like.
Yeah, let's do a demo.
Let's do a demo. We're just going to walk you through the output, since we didn't want to chance the conference Wi-Fi. Do you want to talk us through this a little bit?
So we start off with a little app, and I can go and have a look at the environment.yml file that you give to your app. It's basically the list of Python dependencies, and it's what conda uses to understand what it has to install.
Yeah. If we take a look at this, what are a few of the packages that you'd say wouldn't really have been possible before conda was in the Python buildpack?
I think the main ones here are pandas, which is a data frame analysis tool, and scikit-learn, which is kind of the standard Python machine learning library. Those have been the big problem ones; they're from the PyData space.
So if you have this environment.yml file, the buildpack automatically detects it; is that right?
Yeah, exactly. Normally the Python buildpack uses a requirements.txt to indicate that you have a Python app, and it goes down the pip installation code path. The environment.yml triggers a completely separate one.
And you can see here, we just pushed this really simple app. It's just a Flask app, and you can see it uses the conda code path in the buildpack: it installs the Python environment, installs Miniconda, and installs a few different things.
Yeah, a few interesting things you'll see here. After we've installed these packages, it cleans up the original tarballs. And one of the dependencies, where is it... yeah, this one is about 120 megabytes. If you go a little further down, I think there's a Fortran one, libgfortran. That's something you wouldn't want, Fortran on everybody's root file system.
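The detection rule described above (environment.yml selects the conda path, requirements.txt selects the pip path) can be sketched as a small Python function. This is a simplified illustration, not the buildpack's actual detect script: both function names are hypothetical, and giving environment.yml precedence when both files are present is an assumption.

```python
import os
from typing import Iterable


def choose_install_path(app_files: Iterable[str]) -> str:
    """Pick the dependency code path from the file names in an app's root.

    Returns "conda" when environment.yml is present, "pip" when
    requirements.txt is present, and "none" otherwise. Checking
    environment.yml first is an assumption made for this sketch.
    """
    names = set(app_files)
    if "environment.yml" in names:
        return "conda"
    if "requirements.txt" in names:
        return "pip"
    return "none"


def choose_install_path_for_dir(build_dir: str) -> str:
    """Apply the same rule to a directory on disk, such as a staging build dir."""
    return choose_install_path(os.listdir(build_dir))


print(choose_install_path(["app.py", "environment.yml"]))
```

Keeping the rule a pure function of file names makes it easy to test without touching the file system; the directory wrapper only lists the app root.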
That's not really necessary unless you're a data scientist, obviously.
Yeah. Okay, so this works in the usual way; it just looks like part of your normal cf workflow. And we can go and have a look at what the app does. The app I'm pushing is a really simple sentiment analysis app. It's a model that was pre-trained somewhere else on a large corpus of text, and it tries to determine whether something has a positive sentiment, a negative sentiment, or something in the middle, a neutral sentiment. Now it's running, up on PWS: you send it some words, and it tells you the sentiment of those words. We've gone from a machine learning system that was developed on someone's laptop, and maybe trained somewhere else on a big data system, to deploying it in exactly the same environment on Cloud Foundry. We're able to reproduce that environment using the environment.yml file and deploy it really simply.
In case you can't see the sentences: the first one is "This app is awesome and in the cloud", which is hopefully a good sentiment; the middle one is "Today is Tuesday"; and the last one is "I'm so mad and angry". If we send this data to the server, what we get back is the sentiment score, where higher numbers mean a more positive, friendlier sentiment. The top one gets about 0.82, a high sentiment, as expected; the middle one is fairly neutral at about 0.46; and the bottom one is pretty negative at 0.14. This is a really simple service, but we can change the data we send in. Here we're saying "This app is awful when it's in the cloud", and the sentiment goes down to about 0.18. So this is a live service: you can just start sending it data and it responds, and we've been able to do that really simply, without having to worry about
all the other underlying structures.
Okay, so I need to get back to Chrome somehow.
So what we're going to talk about next is what's in the future for conda and the Python buildpack, because the feature is definitely not complete. Or can features ever really be complete? Yeah, they can; I don't know why I said that.
Okay, so the first thing is running this functionality in air-gapped environments. A lot of corporations run their CF installations in completely airtight environments, with no calls to the outside network. Right now conda in the Python buildpack can't do that, because conda doesn't support vendoring packages the way Bundler for Ruby can with bundle package. It's a hard problem for them to solve: with really low-level, big dependencies like Fortran, C libraries, and C bindings, it's very difficult to package them together with the Python packages on top and install them so that all the file paths and all the bindings are correct. I think they are working on it, but it's definitely a tough one to solve, so we're not going to be able to support it in the buildpack until they support it.
I think there is a way of basically running your own local repo: if you run a repo on a different server, you could get the packages from there.
Yeah, but that's out of the scope of just the buildpack itself. Another related thing is vendoring Miniconda into the cached buildpack, so that you wouldn't have to download Miniconda during staging.
It would already have Miniconda inside. We have work for that in our backlog, but it's not super high priority, because one of the biggest reasons people use the cached buildpack is for firewalled, air-gapped environments, and since we can't support those yet, it's not super high priority.
Something that's a little higher priority is reducing the size of the final droplet. With our sample apps that use Miniconda in the Python buildpack, we were seeing some ridiculous disk sizes for the final droplet. We'd push a sample app, it would fail for some reason, and we'd look at the staging logs. I forget the exact cf error message, but it basically says you've run out of disk space for your droplet. We realized this is because you now have all these large dependencies on your droplet, and maybe there's a way to include only the ones you need, but we don't know enough about Miniconda to say that with confidence. So we're doing an investigation to find out whether we can reduce the size of the final droplet.
And the last thing is just your feedback. Most of the fixes on that fixes slide were implemented in response to users saying "I don't think this is right", or "this probably needs a fix", or "this would be a better user experience". For example, when we were pushing the sample app, we noticed that it doesn't print out a clean list of all the conda packages you just installed, so that's something we're going to want to fix. It's y'all using the functionality and giving us feedback that will drive out more work.
Where can people get it? Where is it?
Oh, well, it's already available.
It's in the official CF Python buildpack as of buildpack 156. So, since all of you keep your Cloud Foundry installations super up to date and all your buildpacks up to date, you should already have this functionality.
Great. You talked a little there about how people can submit bug fixes, but what if you want to contribute something else? I'm a Pivotal employee, so I was able to just go to the buildpacks team and say, "Can you help me with this?" How does someone who's not at Pivotal get involved with something like this?
Yeah, so the answer is: join Pivotal as an employee and talk to the right people.
Oh, okay. Is there another way of doing it?
Well, yes, there's an open source way to do it. For the Cloud Foundry ecosystem, we have the cf-dev mailing list. It's very active, with a lot of discussion, and a lot of features come out of someone saying, "Hey, I have this use case" or "I have this problem, how do I fix it?" Maybe the functionality isn't there yet, so a discussion opens up on what that functionality should be and its details. For more real-time communication, we have the open source Cloud Foundry Slack, which has channels for pretty much every Cloud Foundry team and for various topics; buildpacks is on there as #buildpacks. So you can definitely open up a discussion there as well, get more real-time feedback, and have a lot more people jump in. You can also do the CF Dojo program, a six-week program where you basically gain the skills to contribute to Cloud Foundry full-time. I think you would still need to be at a company that puts full-time developers on Cloud Foundry. So, for example, just because you have gone through a Dojo program,
you can't just expect to submit PRs off the bat with no preceding discussion and have them merged in. You'll notice that opening a PR isn't on this list, because if you open a PR for a massive new feature that hasn't been discussed at all yet, odds are that implementation isn't going to be accepted. It is a good way to open up discussion, but we'd rather you open up the discussion first, instead of spending all that time implementing your PR and then being sad that it doesn't get merged in exactly as it is.
Cool. There's also the feature narrative. This is a more formal way to get your feature into CF, where you write about your very concrete use cases and open up the discussion on how to solve those problems: what the feature would actually look like and how it would be implemented. At that point you might start talking about changes to the Cloud Controller API, changes to the buildpacks, and so on. But the most important thing here is concrete use cases. If Ian had just come to us and said, "I really want to do data science with the Python buildpack", we wouldn't have been able to go anywhere with that without driving out more specific, concrete use cases, like the ones he talked about with the transportation companies. Concrete use cases are ones people can empathize with. Other developers and other companies look at them and say, "Oh, hey, we have something kind of similar to that", and then they jump into the discussion and talk about their own use cases, and it's a lot more productive.
Great. Well, thanks, James, and your team, for helping me out with that, and thanks, everyone, for listening to our talk. You can find the code for the buildpack on the Cloud Foundry GitHub. Come ask us questions now; we'll take questions here, but also on Twitter, I think.
Yeah, those are our GitHub and Twitter handles. Cool, so thanks, everyone. Any questions about the Python buildpack, or about contributing to Cloud Foundry in general?
[Audience question]
Sure. So when you say external dependencies, are you talking about the lower-level bindings, or the packages? Sure. The way we do a lot of our testing for the buildpacks is through high-level integration tests and sample apps. For the conda support, for example, we had three or four very explicit Python apps that use Miniconda, and we test pushing those apps and assert certain behavior on them. We have a testing framework called Machete that wraps around things like the logs, HTTP requests, and so on. As for the actual dependencies we're pulling in, like libgfortran, we don't test those, because that's outside our scope. We trust that the libgfortran maintainers are testing their code, and furthermore, these things aren't being pulled in for every CF user. If a CF user says, "I want this package that uses libgfortran", they have to understand that maintenance aspect as well.
It's fair to say as well that the conda package manager has a whole suite of tests, so we're trusting that level. And because the user can specify any package to bring in, there's no way we can check all of those things.
Yeah. An interesting tidbit about that, actually: even though Continuum is, in a way, the company that backs Miniconda and conda, conda itself, the package manager, is fully open source, and most of the conda packages are open source too. Continuum mainly maintains the Anaconda distribution, so they're the ones who ensure that when you install Anaconda you get this suite of specific data packages, the latest updates, and all those other things. But you can check out the conda package manager code yourself.
Any other questions? Nope? Okay. Thanks for your time. Thanks, everyone.