Hello, everybody. Thanks for coming. Today, Rich and I will be talking about some of the data-related aspects of bringing applications onto Cloud Foundry. I'd like to start by introducing myself, and then Rich will introduce himself. My name is Roman, and I work at Pivotal as Director of Open Source Strategy, but today's talk has nothing to do with that vendor relationship. Today we're talking to you as members of an organization called ODPi, and I'll explain what it is and what it does. Everything we're going to cover has to do with the two foundations, ODPi and the Cloud Foundry Foundation, figuring out some interesting things together; it doesn't relate to any vendor promises or product-level concerns. With that, I'll let Rich introduce himself. Hi, my name is Rich Pelavin. I'm co-founder and CTO of a startup called Reactor8, and our focus is on configuration and automation. I've been working in this area for over 20 years, first on the network side: ten years on network provisioning, then the last ten on virtualization and the server and application side. The work I do is also informed by a PhD in the area of artificial intelligence planning, which tries to figure out how to generate plans or workflows given a desired state. In recent years, describing end state has become the dominant mode of configuration management, so the AI work is very applicable to what we do. Awesome, thank you. So before I try to explain what ODPi is, let me decode one of the buzzwords in the slide title. When we talk about data-driven applications, what do we really mean? Cloud Foundry is awesome technology for traditional application deployment, but the applications themselves are changing.
Where we used to deal with traditional three-tier applications, with your database, your front end, and so on, what we're seeing more and more today is that applications are becoming fundamentally dependent on data models that are built either offline or, increasingly, in real time using streaming and big data analytics. The application's behavior becomes completely conditioned on how well those data models are maintained and what insight you can get from them. It's a very different way of gaining additional insight about your customers, or about the industrial internet of things, or whatever it is your application is managing. So there's an additional set of concerns that you as an application developer have to bring to the table; you cannot abstract them away the way you abstract your relational database. With that, the agenda: we'll talk about ODPi, then a data application case study, the prototype that Rich and his team built, and finally a few lessons learned and a wrap-up. So let's talk about ODPi. At a very high level, ODPi is trying to be to big data, and especially to data-science-driven architecture, what the Cloud Foundry Foundation is to Cloud Foundry and this next-generation platform-as-a-service architecture. It's a shared industry effort under the umbrella of the Linux Foundation, the same way the Cloud Foundry Foundation is, that's trying to figure out the path of big data technologies toward a more consistent platform for the enterprise.
And of course, these days that means cloud as well as on-prem. What's interesting about us is that the platform is governed in a hybrid way. What I mean is that in the Cloud Foundry Foundation's case, both the platform aspects and the development of individual projects, say Diego, belong to the same entity. In ODPi's case, those two are separate. The projects we leverage are all developed as individual projects under the Apache Software Foundation's management and governance model, and what we do at ODPi is be opinionated, in a shared vendor context, about how those projects should be put together into a consistent platform: what the standards are, and which use cases drive the platform's evolution. So we essentially provide a forcing function and a lot of use cases to the individual projects, while the projects remain free to go implement whatever else they want. Think of us, in a way, as a meta-vendor. You have your Cloudera, you have your Hortonworks, and now you have ODPi, where a bunch of vendors come together and say: this big data platform needs to look this way, and here's the set of use cases we'll tackle together. We're expanding the platform to include the usual who's-who of Apache projects. Today, with release one, we're just standardizing on Hadoop and Ambari, but these are all projects we're definitely looking at. And finally, not to take too much time explaining what ODPi is, it's a very well represented organization in terms of membership; we've nearly doubled the membership since 2015.
And I'm really happy to report that within less than a year of the organization's formal existence, we already came up with a runtime specification for how the big data platform needs to interact with its applications, along with a reference implementation of that specification. Today that's a very traditional view of how you deliver a big data solution to a customer: you have your RPM and Debian packages, and you have Ambari or some other orchestration solution to roll out a big data cluster. It's a very traditional way; it's there, but it's a baseline. What's interesting to us is how we can tackle, again from this meta-vendor perspective, the connection between the big data side and the cloud side. The ultimate holy grail for us is to be able to power data-driven applications on Cloud Foundry. But first steps first: we have to figure out how we can exist within the same substrate, and that substrate, of course, means BOSH. The rest of the presentation is basically us trying to figure out what's meaningful, what's working, and what's not in bringing the ODPi reference implementation onto the Cloud Foundry substrate. I'll hand it over to Rich. Yeah, it's on. Okay, I'm going to describe a case study and a prototype we recently did that looks at using BOSH and Bigtop to do a sample big data deployment. As the sample we chose a Spark cluster, Spark with HDFS and a Zeppelin front end, and we started with AWS as the reference cloud platform. Along with trying to figure out how we could use BOSH and seeing what the gaps are, we also wanted to work toward one-click deployment.
The idea is that a field engineer, for example, who wants to get a Spark cluster up, rather than following the Amazon instructions, spinning up a director, and going through a number of steps, could essentially just push a button, put in their Amazon credentials, and have a running Spark and Zeppelin cluster. ODPi is based on Bigtop, so the idea was to understand requirements we could bring back into ODPi. And as people here know, BOSH provides the abstraction layer, so although we did this initially on AWS, the idea would be to easily carry it over to all the different infrastructure-as-a-service platforms BOSH supports. This being a prototype, the idea was also to leverage off-the-shelf open source tools, and we'll go into that. As for leveraging Bigtop: Bigtop is an Apache project that, among other things, provides packaging for the major big data services. Along with packaging, it also provides configuration logic that, and this is what's very nice, is very tightly coupled with the packages. We think there's a big advantage to having configuration logic and packaging tightly coupled, and we view the recent emphasis on immutable infrastructure and containers as recognition that there's a blurry line between packaging, assets, and configuration management. As for leveraging BOSH: my background is in DevOps and automation, but I came to BOSH very new, so part of this analysis was asking, for a newcomer who's familiar with the tools in this area, what is the learning curve, and what are the challenges and strengths of BOSH? At the end there are some conclusion slides about what was easy with BOSH, what I think is difficult, and where I think there are opportunities for focus. The thing we clearly liked about BOSH is that it gives us this infrastructure-as-a-service abstraction.
Robust node deployment, meaning we don't have to interact directly with the cloud controllers or hypervisors, and the fact that if nodes go down, BOSH automatically brings them back up. The other thing we really liked was the BOSH user experience. The purpose here is to focus not on app deployment but on dynamically configuring infrastructure, and we liked the way the BOSH CLI functions to achieve that objective. Now, as for the gaps: the topologies we've tended to focus on are big data topologies and others that have complex interactions between the nodes or the service daemons, whereas what looks like BOSH's strength is a very specific topology, the case where you have a horizontally scalable, homogeneous service you want to roll out. What's key there is the built-in canary strategy, which makes sure that if you make a mistake, you don't roll the same mistake out to 10,000 nodes. That makes things very easy when the canary strategy fits, but a lot of the topologies we looked at have much more complex relationships: complex security configurations, complex high availability. Even very basic configuration of HDFS, having the data nodes find the name node, didn't lend itself to this. If a name node fails, that's very different from a data node failing. So that's one of the challenges we had to get around. Similarly, if a node fails, BOSH brings that node back up, but in the topologies we're looking at, sometimes you don't just have to bring that node up; there are connected services or nodes that have to be brought up as well. Lastly, we had to represent more complex relationships between the configurations. I see that BOSH has new features starting to address that; they call them links.
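To make the canary strategy concrete, here is a small Python sketch. The `update` dict mirrors the fields a BOSH deployment manifest's `update` block uses; the rollout loop itself is an illustrative model of the behavior described above, not BOSH's actual implementation, and the instance names and health check are hypothetical.

```python
# Sketch of BOSH-style canary rollout over a homogeneous job.
# The `update` block mirrors real BOSH manifest fields; the loop
# below is only a model of the described behavior.

update = {
    "canaries": 1,          # instances updated (and verified) first
    "max_in_flight": 4,     # batch size after the canaries pass
    "canary_watch_time": "30000-60000",   # ms to watch a canary
    "update_watch_time": "30000-60000",   # ms to watch each batch
}

def rollout(instances, update, healthy):
    """Update canaries first; stop before touching the rest if one fails."""
    n = update["canaries"]
    canaries, rest = instances[:n], instances[n:]
    updated = []
    for inst in canaries:
        if not healthy(inst):
            return updated, False      # the mistake stops at the canary
        updated.append(inst)
    batch = update["max_in_flight"]
    for i in range(0, len(rest), batch):
        group = rest[i:i + batch]
        if not all(healthy(inst) for inst in group):
            return updated, False
        updated.extend(group)
    return updated, True

nodes = [f"worker/{i}" for i in range(10)]
# A bad release that fails its health check never spreads past the canary:
done, ok = rollout(nodes, update, healthy=lambda inst: False)
print(len(done), ok)   # 0 False
```

This is exactly the case where the strategy shines: one bad canary protects the other nine nodes. The point made in the talk is that heterogeneous topologies such as HDFS, where a name node and a data node must be treated differently, don't reduce to this single homogeneous loop.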
So that's a start in the right direction, and I think some of the work we've done could inform how links evolve and help make it really easy to express the relationships between nodes and configurations. Another key challenge we found, and I'll come back to this, is the lack of support for standard Linux packaging. The technologies I was very familiar with, things like Puppet, Chef, and Ansible, all work with standard packaging, so you can use packages off the shelf to deploy your cluster: there are steps that do the installation, and then steps that do configuration and server start. Now, reading the description of BOSH, I understand the high-level goal: you want to make sure the bits you know get onto the nodes, so you want predictability and a lack of uncertainty. Many deployments that go out to package managers fail because you lose connectivity, et cetera. In the concluding slides we argue that the goal of getting predictable configuration and packages onto the nodes is clearly key, but the way BOSH solves it is just one of several ways to solve it, and we have some suggestions for how BOSH could treat its approach as one implementation of predictability and extend out to actually support standard package management. Okay, now I'll go into the prototype architecture and the approach. The approach for the orchestration we need draws on what's standard practice in service provider provisioning. What's interesting is that if you look at the history of provisioning and configuration management, twenty years ago the service providers, the network folks, were way ahead of the server and application folks. They had to automate things to bring up their five million subscribers, for both the core network and DSL, cable, et cetera.
So they developed a very sophisticated approach to the OSS and how they layered systems, and one of the key takeaways that I think is still critical in the app space and the data center space is this whole notion of layering. Service providers view an OSS as having, at one level, an element layer responsible for dealing with the network elements on the network side, or with nodes on the server side, one by one. We view BOSH as really providing that: a nice, almost programmatic abstraction for dealing with these nodes. But what's very key in service provider networks is that they think not just about elements but about holistic services. The analog in the data center world is the distributed applications you want to treat as one unit. You want to provision Spark as a black-box service; you'd ideally like to add capacity, SLA, security, or high availability without having to worry about low-level element-layer coordination. So the way they design their OSS systems is an element layer with a service layer on top that works with service-level abstractions and talks to the element layer, freeing itself from dealing with the different types of individual elements. That, I think, is extremely applicable in the application world. The other issue we wanted to solve is that we wanted an end-to-end orchestrator that could not only provision things once the BOSH director is up, but could actually spin up an Amazon network, spin up BOSH itself, automatically use bosh-init to generate a manifest, and coordinate with the subnetwork it provisioned on Amazon.
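To make the "discover or provision" behavior concrete, here is a minimal Python sketch of the decision logic the talk describes. The real prototype talked to the EC2 API; `CloudStub` here is a stand-in for that client, and all method names, CIDRs, and subnet IDs are illustrative, not AWS calls.

```python
# Sketch of the orchestrator's discover-or-provision step for subnets.
# `CloudStub` stands in for a real EC2 client; only the decision logic
# is modeled: reuse subnets that already exist, otherwise create them,
# then fold the resulting IDs into a BOSH-style network section.

class CloudStub:
    def __init__(self, existing):
        self._subnets = dict(existing)   # cidr -> subnet id
        self._next = 0

    def find_subnet(self, cidr):
        return self._subnets.get(cidr)

    def create_subnet(self, cidr):
        self._next += 1
        sid = f"subnet-{self._next:04d}"
        self._subnets[cidr] = sid
        return sid

def discover_or_provision(cloud, wanted_cidrs):
    """Reuse existing subnets where possible; provision the rest."""
    return {cidr: cloud.find_subnet(cidr) or cloud.create_subnet(cidr)
            for cidr in wanted_cidrs}

def manifest_networks(subnet_ids):
    """Generate the network section from the discovered IDs, replacing
    the manual copy-from-console step in the getting-started guide."""
    return [{"range": cidr, "cloud_properties": {"subnet": sid}}
            for cidr, sid in sorted(subnet_ids.items())]

cloud = CloudStub({"10.0.0.0/24": "subnet-aaaa"})
ids = discover_or_provision(cloud, ["10.0.0.0/24", "10.0.1.0/24"])
print(ids)   # the first subnet is reused, the second freshly provisioned
```

The design point is simply that the same system that learns or creates the subnet IDs also writes the manifest, so the IDs can never drift apart.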
When we looked at the BOSH getting-started guide, there was a big manual step for each cloud provider: you go through the Amazon console, provision the subnets, and make sure those exact subnet IDs end up in your manifest. We thought there was potential to do this all in one system, so we introduced an orchestrator that lives above BOSH, provides this capability, and uses BOSH as an element management system. Here's what the architecture looks like. The first step is an orchestrator with nothing there aside from Amazon credentials, and it works in two modes: given a VPC, it can either discover the subnets that are there, or it can actually provision them. After that, it spins up a client node and uses bosh-init and an automatically generated manifest to spin up a BOSH director; in this case, for AWS, we pick the appropriate CPI. We get to a state where the BOSH director is up, and then we use that as, effectively, our element-layer system. I mentioned that we really like the BOSH CLI, so, this being a prototype, we did a very simple hack on the BOSH client to have everything go to the BOSH director except for the deploy command. It looks exactly like you're using the BOSH CLI, but when you do a deploy, it talks to the orchestrator. For this project we used something my company just open sourced, an end-to-end orchestrator; a key part of it is a manifest-like language that is much more application- and service-focused than BOSH's, but fits nicely in that from our manifest we can generate a BOSH manifest. So the input to the whole system is a description of your application topology, how things link together. You kick off a deployment, and the orchestrator tells BOSH to spin up nodes.
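The CLI hack described above amounts to a thin dispatcher. Here is a hypothetical sketch; the handler functions stand in for real calls to the director and orchestrator endpoints, which are not shown in the talk.

```python
# Sketch of the prototype's CLI hack: every command passes through to
# the BOSH director, except `deploy`, which is redirected to the
# service-level orchestrator. The handlers are stand-ins for real calls.

def send_to_director(argv):
    return f"director<-{' '.join(argv)}"

def send_to_orchestrator(argv):
    return f"orchestrator<-{' '.join(argv)}"

def bosh_cli(argv):
    """Route `deploy` to the orchestrator; pass everything else through."""
    if argv and argv[0] == "deploy":
        return send_to_orchestrator(argv)
    return send_to_director(argv)

print(bosh_cli(["vms"]))               # director<-vms
print(bosh_cli(["deploy", "spark"]))   # orchestrator<-deploy spark
```

From the user's point of view nothing changes: the familiar CLI surface stays intact while the deploy path gains the service-level behavior.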
Now, what we did with BOSH is build a single package that spins up our agent, so the orchestrator can then talk directly to the nodes. Ideally, if BOSH made it easier for us to install packages, we'd do everything through the BOSH director, but we found it very time-consuming to build BOSH packages. I'll go over this very fast, but our manifest is XML, though that's not the important point; what matters is that it represents what's on the nodes and breaks things into components, so a node is composed of components. In this case, for example, on a master node you would have a Spark master and a Zeppelin server, and on the worker nodes you would have the workers. It also has, and we've been working with this for about four years now, links between services that express the dependencies. This lends itself very nicely; we think we could translate it directly into a BOSH manifest. Another important aspect, though, is actually having a workflow. I know that in the state-based world they say you either have a workflow or you have an end-state description, and there has definitely been a trend toward state-based systems, but if you drill into what AI planning did, you can really harmonize the two approaches. The way we look at it is that if you just give an end state, we'll automatically generate a workflow, but many times in this area you want a much more sophisticated workflow. For example, when you bring up a Spark cluster, you first want to bring up HDFS, run smoke tests to make sure it's up, then bring up Spark, et cetera. So you want arbitrarily different kinds of workflows, and there are ways to unify this by viewing the workflow as the plan that achieves the end state, so you get harmony between the two.
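The "generate a workflow from an end state" idea can be sketched as an ordering problem over the dependency links: any order that respects the links is a valid plan, with smoke tests inserted after each step. A minimal sketch, using the talk's HDFS/Spark/Zeppelin example (`graphlib` is in the Python standard library from 3.9; the step names are illustrative):

```python
from graphlib import TopologicalSorter

# Dependency links from the talk's example: Spark needs HDFS up first,
# and Zeppelin needs Spark. Mapping: service -> services it depends on.
links = {
    "hdfs": set(),
    "spark": {"hdfs"},
    "zeppelin": {"spark"},
}

def plan_from_end_state(links):
    """Derive a bring-up workflow from the declared end state:
    start each service after its dependencies, then smoke-test it."""
    steps = []
    for service in TopologicalSorter(links).static_order():
        steps.append(f"start {service}")
        steps.append(f"smoke-test {service}")
    return steps

print(plan_from_end_state(links))
# ['start hdfs', 'smoke-test hdfs', 'start spark', 'smoke-test spark',
#  'start zeppelin', 'smoke-test zeppelin']
```

This is the harmonization the talk points at: the end-state description stays declarative, while the derived ordering is the workflow, and hand-written steps such as smoke tests can be spliced into it.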
Now, because we had to reach out to an outside package manager, just as a lower-level detail, we had to spin up a NAT instance, and our system, the DTK system, dealt with all the service discovery and IP address management. So we opened up a NAT instance that let us connect out to a package manager, and for the only external service we had, we used it to port-map to Zeppelin. If you look at what the orchestrator is doing, it would initially spin up a BOSH AWS director and a NAT instance, and later, when it spun up the cluster, it would hook the cluster up to the NAT instance and give it the appropriate connectivity. At a lower level, the interaction between the orchestrator and BOSH looks like this: the orchestrator first talks to Amazon with the credentials and discovers or provisions the subnets; with that information it generates a manifest and spins up the BOSH director. Then, when you do a deploy, it talks to BOSH, whose manifest simply says that every node is a bare-bones OS with a DTK agent on it. BOSH spins things up; we then talk to BOSH to get the node state, and then talk directly to our agent and execute the workflow with our orchestrator. The next slide is just a more low-level description of the same thing, so I'll skip it. In conclusion, what did we really learn, and where do we think there's room for improvement? The plan to view BOSH as an element-layer system and integrate it with a service-level orchestrator worked out really nicely; the BOSH-level manifest we had to generate was very easy to produce from the DSL we had at the service level.
The thing we found challenging was building BOSH packages; the only one we built was for the DTK agent. This is an agent we've had running for a while on many different operating systems, with scripts that worked out of the box, and when we ported it over to BOSH, because of the non-standard OS layout and because things like /tmp had different permissions, we really had to hack up and build our own versions of those scripts. So here's where we think there could be real benefit to BOSH. To us, BOSH was one of the first systems that provided immutable infrastructure: how do you get the bits there, and rather than focusing on incremental changes, you just rip and replace. We think it did a really nice job of that. On the other hand, we believe the implementation, which forces you to translate everything into BOSH packages, is extremely cumbersome. We only translated one or two packages; if we had to translate all the Bigtop packages that work out of the box, we think that would be a big challenge. In the Bigtop project we have the benefit of all the engineers working on what is effectively one golden store: a set of packages and aligned configuration scripts. Even if we wrote a porting tool to translate that over to BOSH, the porting leaves room for mistakes; it would be much easier for users if BOSH supported standard Debian and RPM packages. And there are simple ways to get the same guarantees: if you want to control exactly which bits get onto a node, you use the standard enterprise topology where your package manager sits behind the firewall and you vet what gets into it. You have the package manager on your secure network, you know exactly what's on it, and you use the standard APT and RPM protocols. We also think there are some very interesting opportunities in the way people are starting to view containers as artifacts.
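The vetted-mirror idea above can be modeled simply: the secure repository admits only packages whose checksums have been approved, so predictability comes from gatekeeping rather than from a bespoke package format. A small Python sketch of that gatekeeping step; the package names and contents are illustrative, and a real mirror would get the same effect from signed apt or yum repository metadata.

```python
import hashlib

# Sketch of the vetted-mirror idea: a package enters the internal
# repository only if its digest matches the approved allowlist, so
# nodes can only ever install bits that passed the gate.

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class VettedMirror:
    def __init__(self, allowlist):
        self.allowlist = allowlist      # package name -> approved digest
        self.store = {}

    def ingest(self, name, data):
        """Admit a package through the 'firewall' only if it was vetted."""
        if self.allowlist.get(name) != sha256(data):
            return False                # unvetted bits never reach the nodes
        self.store[name] = data
        return True

good = b"spark-core deb contents"
mirror = VettedMirror({"spark-core": sha256(good)})
print(mirror.ingest("spark-core", good))        # True
print(mirror.ingest("spark-core", b"tampered")) # False
```

The point is that this recovers the "you know exactly which bits land on the node" property while the nodes keep speaking plain APT and RPM.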
I know BOSH could treat containers as just another infrastructure-as-a-service, but we think BOSH could also benefit from a new type of build, where what you're really building is container images, as opposed to building a BOSH package that then gets installed. So that concludes the talk. Roman, do you want to wrap up with a little more about Bigtop? Oh yeah, sure, thanks. So that was the extent of the prototype. It's very recent work that we completed maybe a couple of weeks ago, so it's still at the prototype stage, but what's interesting is that it has sparked interest in orchestration frameworks coming to Bigtop and integrating with the way Bigtop manages the Hadoop ecosystem; and by the way, Bigtop is the underlying build and management system of ODPi as well. So we now have Canonical's Juju also trying to integrate with Bigtop, and there's the prototype we've done. If you find yourself in an environment where you have to support application developers and a data science team, and say you have some kind of data lake architecture in place, now is a really good time to get involved, and you don't even have to formally join any of the foundations. You don't have to wait for your company to become a formal member of ODPi or the Cloud Foundry Foundation. Like I said, an interesting part of the model we're trying with ODPi is that governance of the individual projects, including Bigtop, is still done at the Apache Software Foundation. So all you need to do to start hacking on this is join the Bigtop project or send us an email on the mailing list. It's an Apache project, and as you probably know, Apache is extremely well optimized for individual developer collaboration. There's no barrier to entry; all it takes is an email to the mailing list or a JIRA you can open.
So I'd just like to leave you with that thought: if this is the kind of problem you find yourself struggling with today, or you see your infrastructure going in that direction, there are extremely easy ways to get involved. Just get a message to us on the Bigtop mailing list and we'd be happy to collaborate with you. With that, I guess we have about seven minutes for questions and answers, and Rich and I would be more than happy to answer anything. So, anybody? The question is: why are we trying to integrate with BOSH? Rich, you talked about some of the interesting capabilities in BOSH that we found useful even apart from the Cloud Foundry integration exercise; can you talk a little more about those? Yeah, well, part of the prototype was actually understanding whether BOSH was suitable for what we did. The initial thing that attracted us to BOSH was that it gives you this infrastructure-as-a-service abstraction, and we wanted to see where that could be leveraged. But there really were trade-offs. As I said, we really like their paradigm of doing the compilation, and I think it lends itself to what's happening with Docker workloads, where a lot of the time you describe your configuration state and try to compile things in; BOSH's overall workflow lends itself to that. And just to add a bit of a big data perspective: today there isn't really an easily obtainable path to multicloud if you're doing big data, right? There are different solutions from different vendors, but what we were looking for is something that could be robustly governed, and BOSH being part of the Cloud Foundry Foundation gives you that property.
And again, maybe a tool will come out of, I don't know, the Cloud Native Computing Foundation, and if at some point it does, that would be yet another choice. But for now, if you look at the landscape, there are vendor tools, and there are tools like the ones from HashiCorp, and that's basically it. There isn't really anything that can reliably get you across clouds plus the data center. Any other questions about ODPi, BOSH, anything? Right, well. Thank you so much. Thank you.