Live from the Julia Morgan Ballroom in San Francisco, extracting the signal from the noise, it's theCUBE, covering Structure 2015. Now your host, George Gilbert.

This is George Gilbert. We are at the Julia Morgan Ballroom in downtown San Francisco at the Structure conference. We have a special guest with us, Florian Leibert, who is the creator of Mesos and the company Mesosphere, and this is sort of taking the world by storm. Florian, why don't we start with...

Quick correction: my co-founder Ben Hindman is actually the co-creator of Apache Mesos. I worked on some of the systems that layer on top of Mesos.

Okay. All right. Good.

I don't want to take his credit.

We don't want to cause problems, you know, after the show. So one of the things we've been watching with increasing fascination is that we've got this situation where what people think of as mainstream today is Hadoop 2.0, you know, YARN and HDFS; that's our sort of distributed application platform. And we feel like we're going towards a big data 3.0 that is not so constraining. Maybe you can shed some light on that.

Absolutely. So one of our focus points for this year was launching a system called Infinity. It's one of our latest product offerings, and it works on top of what we call our distributed kernel, Apache Mesos, and the surrounding DCOS, which is our commercial offering around Mesos. Infinity gives you a number of components for IoT and big data, and all of these components are open source. So, for example, when you're thinking about IoT and processing data in real time or near real time, it often starts with a web application that collects sensor data. Once that sensor data is collected, you generally have to put it on some sort of a bus or a queue. So let's say you're using Kafka, which is another great Apache project that comes out of LinkedIn.
It's horizontally scalable, and Kafka might then be queried by something like Spark, or maybe you're getting the data into Storm. Then you might fork the raw data into something like HDFS for later post-processing. But once you have aggregate data, you might want to store it in an easily queried way, so you put it into a NoSQL data store like Riak or Cassandra or something else, maybe even a relational database if it's not huge data. So that's, in a way, an IoT pipeline, right? Everything from the sensor data all the way to where you store the data. Then you can write other applications that take this data and display it on a dashboard or trigger an event. It's actually pretty fascinating how quickly you can build these systems today. And it reuses a bunch of the components that you mentioned, right? Oftentimes Hadoop is still used for data cleansing or other parts of the ETL process that happen afterwards. But a lot of these components that I mentioned, like Spark and Storm, are not part of this traditional Hadoop 2.0 stack that you mentioned.

So let's dive into that in two directions. Hadoop has a fair amount of administrative overhead, and beyond leading-edge companies, you know, leading-edge Fortune 100 companies and web-scale companies, there's a skills gap. It's a high overhead to administer. But it's also, from a developer's point of view, not that easy either. How does this new pipeline you're talking about simplify things on both those tracks?

Yeah, so firstly, one of the things I mentioned is that a lot of these components are open source, and what we've done is create a way to deploy this entire pipeline in a couple of minutes, to your private cloud, to your public cloud, to your hybrid cloud, right?
Because you can run the underpinning layer, the Mesos layer, this distributed kernel. Because it abstracts hardware, again, you can span it across different clouds. So you can have a multi-cloud, a hybrid cloud, or just your private on-prem cloud, and with our system you can literally deploy this stuff in minutes. The reason we created that was that when I used to work at Twitter and Airbnb, it could take us literally weeks or even months to set up one of those systems, let's choose Kafka, in a reliable way. We oftentimes had to script around recovering lost partitions of brokers, because that's one of the things you have to deal with, right? It's failures, and you have to anticipate failures, because nowadays we are often running on commodity hardware. When you're running on Amazon especially, your machines might be rebooted at an arbitrary time, and you have no guarantees. In fact, the more you go towards commodity components, the more you will see that there are failures, and a lot of failures are happening. So all the software that you deploy has to be resilient to failures.

Would it be fair to say that YARN might be sort of a resource allocator, but you're closer to an operating system, in the sense that you can not just spin up resources, but you know how to manage and recover? And essentially, as you're saying, by abstracting the hardware you're providing the capabilities that otherwise would have taken administrative overhead. Is that a fair way of saying it?
Yeah, I think that's a big part of it. But YARN, for example, can actually run really well on DCOS and, of course, Mesos. We've created this project, Myriad, together with companies like eBay and MapR. Myriad is a way of running YARN, and in fact you can run multiple versions of YARN on the same shared infrastructure via Mesos. The one thing that Mesos gives you on top of any other system is that it allows you to programmatically access your entire data center's resources, so compute, storage, networking, and you can write new types of applications. A really interesting point is that Spark was actually written as a sample app to show how powerful Apache Mesos is. They were able to write the first version of Spark literally in a couple of weeks, and we all know it's now touted as the MapReduce replacement in the Hadoop ecosystem. That was only possible because of Apache Mesos.

So, okay, let's key in on this new analytic data pipeline. What overhead is there in terms of skills when you're just deploying this on YARN, whatever components you might choose to build this pipeline, versus YARN on Mesos? How much does that lift the burden?
So you can install YARN on top of Mesos with a single command. If you're using DCOS, we say "dcos package install myriad", which is, again, our wrapper around YARN to make it run really well on Mesos. Then you have YARN running. And YARN is really geared towards workloads in the data domain. No company would use YARN to run a long-lived application like your Ruby on Rails application, right? If you think about Twitter, they would never use YARN to run a thousand Ruby on Rails applications, which might be the router elements that route the incoming traffic to the respective back-end services. That's where you might use something like Marathon, which is proven at scale for container orchestration. With this Mesos abstraction, you can run both of these domains, both of these different use cases or workloads, on the same hardware. You can share that hardware, and you can define with a policy which applications have higher priority over other applications. That of course allows you to express, when there's a failure, which applications should be restarted and, if you're crunched on hardware, which applications shouldn't be restarted because they are just lower priority and could run later on.

So would it be fair to say that you're going to want to run YARN on Mesos, in the form of Myriad, when you have a data-intensive application like an analytic data pipeline, but you'll also want Mesos for the rest of the platform, for the other long-running applications that might make up the broader application? Is that one...
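The priority policy described in this exchange, where higher-priority applications are restarted first after a failure and lower-priority ones wait when hardware is tight, can be sketched roughly as follows. This is a minimal illustration only, not Mesos, Marathon, or DCOS code; the `App` class, the `plan_restarts` function, the application names, and the CPU figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """A hypothetical application record; not a Mesos or Marathon type."""
    name: str
    priority: int  # assumption: a larger number means more important
    cpus: float    # CPUs the app needs in order to be restarted


def plan_restarts(failed_apps, free_cpus):
    """Greedily restart failed apps in descending priority order.

    Apps that do not fit into the remaining capacity are deferred
    so they can run later, once resources free up.
    """
    restart, deferred = [], []
    for app in sorted(failed_apps, key=lambda a: a.priority, reverse=True):
        if app.cpus <= free_cpus:
            restart.append(app)
            free_cpus -= app.cpus
        else:
            deferred.append(app)
    return restart, deferred


# Illustrative workload mix: a long-running router, a streaming job, a batch job.
failed = [
    App("nightly-etl", priority=1, cpus=8.0),
    App("web-router", priority=10, cpus=4.0),
    App("stream-aggregator", priority=5, cpus=6.0),
]
restart, deferred = plan_restarts(failed, free_cpus=10.0)
print("restart now:", [a.name for a in restart])
print("run later:", [a.name for a in deferred])
```

With only 10 CPUs free, the sketch restarts the router and the streaming job and defers the batch job, which mirrors the idea that lower-priority work "could run later on". A real scheduler would also weigh memory, placement constraints, and preemption, which this sketch ignores.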
I think that's a really good use case. I think the data center is the new form factor. And when you have a new form factor, right, like with the cell phone we had a new form factor, what you need is an operating system. So Mesos is the kernel of the operating system, and DCOS, on top of it, is the operating system for this new form factor. That allows you to treat your entire data center like it's one big computer, or in fact your hybrid cloud, your multi-cloud, or a combination of all of them, like one large pool of resources.

Okay, sort of like the Windows kernel and the Windows operating system.

Exactly, exactly.

Okay. So how do you see applications evolving over the next couple of years, now that we have this richer abstraction above data center hardware itself?

Yes, I think we'll see a lot of new frameworks being developed. "Framework", I know, is an overloaded term, but when I refer to a framework, I'm usually biased, so I refer to a Mesos or DCOS framework. I think we'll see more of the Spark-like systems in the future that really target certain domains, for example machine learning. There will be more systems that are geared towards graph processing, towards certain algorithms that are still being developed, where the existing frameworks are just not sufficient. I don't even want to pretend like I know what's going to happen there, but I'm pretty sure we're not at the end of innovation when it comes to these frameworks. I think there will be many, many more. It's like programming languages, right, even today.
We're still seeing more programming languages than ever before. I mean, new programming languages are still being developed, and some abstractions are being built into these programming languages that didn't exist before. That's kind of how we innovate, and how we make developers and operators much more productive.

You bring up something. We started talking about abstraction of the hardware and simplification of administration, and we've sort of pivoted up to the developer focus. Would it be fair to say that the idea that we were going to have one ring to rule them all is not likely to happen any time soon? In other words, Spark looks like it handles near real-time, SQL, interactive, streaming, and graph processing, and the wonderful thing about it is that the APIs are getting more and more integrated. But it sounds like we really will go back to the traditional platform as a service, where you can pick whatever framework you want and you can wire them together as different services.

Yeah, I think that's certainly happening already. Again, for me, I have this biased view of a PaaS, right? A PaaS is really mostly around container orchestration, and a system like Spark is really more about data pipelining and near real-time querying of big data applications, or even batch processing of applications. But I think we're going to see many, many more of these.

So as far as container orchestration, how might that change Azure or Google Cloud Platform or Amazon Web Services? How might that change how developers interact with those services?
You know, if we had more widespread deployment of DCOS on those platforms?

I think the answer really is true portability, right? You start developing on your local machine, and you're getting a mirror of this environment, because your environment is essentially constrained to this container. You get a mirror of that, and very similar or, in an ideal world, the exact same behavior when you now run it on the cloud. Private cloud, public cloud, whatever, the only difference is that now, once it's on the cloud, with a single mouse click you say, hey, scale this up to 10,000 containers and replicate it. I think the real breakthrough of this is that it really makes the development process easier.

Can that same simplification work for bringing the efficient operational processes of the public cloud to private clouds?

Yeah, absolutely, and that again was one of the big reasons why we actually introduced... If you look at the history of how Mesos evolved, where it really got its initial push was at Twitter, where we decided, look, Twitter is sort of a communication infrastructure, right? So something like Amazon would have been cost prohibitive. By the way, a lot of people say Amazon really saves you a lot of money. It can, but once you reach sufficient scale, I think the fact that they have these high margins shows you that at scale you might be better off operating your own private cloud. I think that's really where this technology shines. It allows you to replicate a lot of these capabilities that you have on Amazon, without lock-in, on your private cloud, on your hybrid cloud. I think lock-in is actually a thing that's very often overlooked, right? Because as you start developing
against higher-level abstractions within, for example, the Amazon AWS stack, you're stuck. And then if you want to take advantage of some of the awesome new hardware that, for example, Microsoft provides in the Azure cloud, you couldn't take advantage of that if you're locked into the Amazon world. That's why we're trying to create this layer of abstraction that really frees you from lock-in. But moreover, I think one of the other key points is, let's say you're running a bank and you're running your private data center. Maybe you want to burst for certain workloads into one of the clouds.

And this simplifies that.

Yes, exactly. Now you can just add those cloud resources to your on-prem data center and you can burst, and once you're done with your workload, you can shut those two or three thousand servers that you just started on Azure back off and run in your on-prem cloud. I think this is going to be a use case that we'll see more and more in the future.

Sounds like you're delivering on some of the original promise of VMware, but we're going to have to leave it there. This is George Gilbert. We're at the Julia Morgan Ballroom in downtown San Francisco at Structure 2015. We've had Florian Leibert of Mesosphere on, and this was a wonderful interview with him. We'll be back in a couple of minutes.
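The bursting scenario from the closing exchange, where cloud machines are temporarily added to an on-prem pool and then shut off again, might be modeled along these lines. This is a toy sketch under stated assumptions: `ResourcePool` and its methods are hypothetical names invented for illustration, the node counts are made up apart from the "three thousand servers" figure mentioned in the interview, and nothing here calls a real Mesos, DCOS, or Azure API.

```python
class ResourcePool:
    """Hypothetical model of one pool spanning on-prem and burst nodes."""

    def __init__(self, on_prem_nodes):
        self.on_prem = set(on_prem_nodes)
        self.burst = set()

    def capacity(self):
        # The scheduler sees a single pool, regardless of where nodes live.
        return len(self.on_prem) + len(self.burst)

    def burst_out(self, cloud_nodes):
        # Temporarily join cloud machines to the pool for a large workload.
        self.burst |= set(cloud_nodes)

    def release_burst(self):
        # Shut the cloud machines back off once the workload is done.
        released = len(self.burst)
        self.burst.clear()
        return released


pool = ResourcePool(f"onprem-{i}" for i in range(100))
pool.burst_out(f"cloud-{i}" for i in range(3000))
peak = pool.capacity()           # on-prem plus burst nodes
released = pool.release_burst()  # back to on-prem only
print(peak, released, pool.capacity())
```

The point of the sketch is only that workloads address one pool of capacity, so growing and shrinking it does not require the applications themselves to change.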