Everybody, we're back. This is Dave Vellante at wikibon.org and I'm here with my co-host, David Floyer. This is theCUBE, SiliconANGLE's continuous production of EMC World 2013. This is our fourth year here. This is day two for us. We are unpacking ViPR. We're going deep with the technical folks within EMC's Advanced Software Division. And we're going to have the business outcome as well. But we're here with Manuvir Das, who is the Vice President of Engineering for the Advanced Software Division, with a focus on the object parts of the portfolio. Manuvir, welcome to theCUBE. Thank you, it's great to be here. Yeah, so object is an interesting topic, right? It's one that people get excited about. There was a lot of interest in the industry and then a lot of starts and stops, but nonetheless the object train keeps moving on. I mean, people believe it's the future of the business. Before we get into it, first of all I want to say congratulations on getting ViPR out. Thank you. A lot of your work is yet to be done based on what you heard today from David Goulden. Yes. But nonetheless the covers are off and now your feet are to the fire and it's good, right? I mean it's all goodness. So talk a little bit about your role and then I want to really get into the object piece. Right, so I'm basically responsible for the object products at EMC, which includes the platforms that we already have, which is Atmos and Centera, which are being used fairly widely by our customers today, as well as the data services in ViPR, which is object and HDFS, that David Goulden and Amitabh talked about. Yeah, so let's take it back. I mean, Centera's been around forever and of course post-Enron, it got a nice boost when people had to deal with compliance and things of that nature, and so I was going to say object's been around for a while, but can we do a little object 101 for people that might not be familiar with it?
This is a complicated situation for a lot of people who might be new to this business or the sort of Wall Street Journal CNBC crowd. So yeah, you got file, you got block, you got object. Give us the object basics. Right, so I think a good reference point for object is actually file. So folks are familiar with file. Object, at a high level, is very similar to file. The difference is basically that if you think about file systems, right, they were built with a certain envelope in mind as to directory structures, how many items you would have, what the sizes of these items would be. If you look at the semantics of files, it's very much about doing a sequence of operations where you go down the directory structure, locking directories as you go, and so the whole notion of files basically comes with certain limits to it as to how much you can do, how many things you can do. The whole idea of object came about when things like video, et cetera, started to proliferate, right? What you found was, firstly, you had variety, because you had really big things like gigabyte-size things, terabyte-size things, and then you had large numbers of things, like billions of objects in one pool. If you think about putting billions of files in one directory in a file system, it's just not what you do, right? So object was really designed for a world of limitless scale. That is, I've got a bag of things I want to store, there is no deep structure to them, it's just one massive pool of things. I want to spread it across the globe, I want them to be always available, accessible from anywhere, I want to access them over the web, which means it's not about RPC and NFS and those kinds of interfaces, it's about REST and marshaling things through XML, right? So it's a different design point, but at the end of the day, it's still your stuff, right? Can we talk about REST? I mean, that simplifies things, right?
So again, for the audience that might not be familiar with it, we'll be talking about this big bag of stuff, you put it and you get it. There's basically two things you do, put and get, and the simple way of thinking about REST is just think HTTP, right? Instead of FTP or TCP, yeah, exactly. And it's just as you said, basically at the heart of it, there's two things: you put objects and you get objects, right? And all the details are about how you do this at scale, how you do large numbers of small things and small numbers of big things all at the same time, right? That's sort of the trend. So a lot of people think, you know, object is the future of storage. Why do people think that? And is that a valid thing to think? Yeah, I think the reason people say that is because of the way object systems are designed. If you look at the semantics of object, they are quite relaxed compared to the semantics of file, okay? Firstly, with file, you have this assumption that many parties are going to write the data at the same time, you have contention, you random-access into a file and write it at different points, right? In the object world, you don't have that restriction. The other thing is, in the file world, typically you expect really good performance, really low latency. And that's where a lot of the secret sauce is. If you look at our Isilon product, for instance, which is a scale-out NAS, the trick really there is, at some level of scale, how do I give you really low-latency access to your data so your tier one applications can run? Object lives in a world of HTTP and the web, where inherently the latencies you live with are much larger. You're not talking about single-digit milliseconds, you're talking about 100 milliseconds.
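The put-and-get model Das describes can be captured in a few lines. The following is a toy in-memory sketch (the class and key names are invented for illustration); real stores such as S3 or Atmos expose these same two operations as HTTP PUT and GET over REST.

```python
# A toy sketch of object-store semantics: a flat pool of keys with just two
# operations, put and get -- no directories, no locking, no random access.

class ObjectStore:
    """One massive pool of things: key -> bytes, no directory hierarchy."""

    def __init__(self):
        self._pool = {}

    def put(self, key: str, data: bytes) -> None:
        # Whole-object write (HTTP PUT in REST terms)
        self._pool[key] = data

    def get(self, key: str) -> bytes:
        # Whole-object read (HTTP GET in REST terms)
        return self._pool[key]

store = ObjectStore()
store.put("videos/home-movie.mp4", b"<video bytes>")
print(store.get("videos/home-movie.mp4"))  # -> b'<video bytes>'
```

Note there is nothing like a seek or a partial overwrite here; that relaxation of file semantics is exactly what lets object systems scale out.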
200 milliseconds is quite acceptable, because if you think about a website, say you're an eBay or a Facebook and you're accessing an image which is stored as an object from one coast to another coast, you're already talking about 80 milliseconds or whatever round trip, right? So because you live in this world where the semantics are relaxed and the restrictions on latency, et cetera, are weaker, you are potentially able to build systems that are much cheaper, right? And so that's really the value of object: to provide you humongous amounts of storage at a much lower price point and a much simpler level of management complexity than you have with traditional file. So it's really the only way to deal with this massive amount of storage. It's really the only way to deal with, when you're talking about exabytes and zettabytes of data, yeah, that's the only way. But the disadvantage of that is that it restricts the types of application you can write. So it seems on the face of it that it's mainly archiving systems and backup systems and things like that. Is that correct, or is it moving more into the mainstream? I'm happy to say that's not correct, if you don't mind my saying so, right? So it certainly started that way, right? And that's how people saw it. And then over time, as web applications like your eBays and your Facebooks, et cetera, started using object, they're basically using it as a backing store for tier one applications, right? Now, what you have pointed out, though, is there's a significant issue here, right? And it's twofold. Number one, the performance of object in terms of latency, et cetera, does not match the performance of file. The second thing is, think about traditional applications that have been written to access file data, right? And we know that most of the applications out there in the world today were written before the advent of object, right?
So the whole trick here is, how do I build a system which can give you two views at the same time? If you're building a next-generation application and you want to see the object view, you get that. But if you've got a traditional application that wants to process file, you get that in the same place. So if I can give you the canonical scenario of that, right? Imagine I'm running a website where basically users upload their content, you know, home videos. And then I make them available to the community. But my job as the service provider is I have to curate these in some fashion. I have to normalize them to the same size. I have to put watermarks on. You know, I have to do filtering for unacceptable content, et cetera, right? And there are a lot of applications out there. The developers who wrote these applications are actually gone. And the way these applications work is you point them at a file share and they hit that set of files and do curation one after the other, right? Okay, so the kind of workflow you're talking about here is, I have a website where users upload their content. That's a page on the web, right? It's HTML. And what I want is an object store, as a backing store, to upload all those videos into, okay? The next step I want to do to all those videos is this curation. But in order to do that, I have to turn this set of objects into a file share with a set of files on an NFS device so that my existing applications can run and do the curation. And then after that's done, I'm going to convert this all back into objects and make it available on my website so that people can access it. And if you look at even our customers who use our existing object systems today who have this kind of workflow, they literally have two storage silos. They have a storage silo where they have all the object data. That is where they ingest all the user content. Then they run these jobs where they copy all the data into the file store. They run the curation.
When it's done, they copy it all back, right? Some people have actually written papers about languages they've invented to control this copying back and forth because it's so complex. And what we've done with ViPR is we've said, you know, you just make one API call to ViPR saying, I want to view this object data as a file share. And we give you an endpoint that you can mount as an NFS endpoint, and it's a file share, right? Now the beautiful thing about this, if I can take another minute, is there are certainly object systems out there that give you a file interface that is compatible with NFS. But the problem is all the data is actually going through the object head. There's just sort of a shim on top that makes it look like file, right? So it's really not performant. You can't actually sit there and curate 10,000 files overnight, right? Because of the approach we've taken in ViPR, where the object layer actually runs on top of a file array underneath, when we expose a set of objects as a file share, it's actually exposed as a native endpoint on the device below that you connect to directly, without coming through our object head at all. And so now, when you've got a file share and you're accessing those files, you've got blazing performance, right? Whether it's an Isilon or VNX or whatever underneath, you're accessing that array directly. So you're solving the object problem of having to go and find that object in different places. You're forcing it to be part of a file system. Yes, and so you get a lot of the benefits of the file system, right? With respect to indexing and all. Now, having said that, again, because objects are designed for this geo-scale world where you have trillions of objects and billions of directories and things, at the object level we have an index structure and a namespace that spreads across the globe, right? And that is beyond the scale limits of what a traditional file system would do.
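The "two views of the same data" idea can be illustrated with a toy model. The class and method names below are invented for illustration and are not the real ViPR API; in the real system the file view is a native NFS endpoint on the underlying array rather than a software translation, but the key point, same data visible through both heads with no copying, is the same.

```python
# Toy illustration of one backing store exposed through both an object head
# (flat put/get) and a file-style head (path listing). Invented names; the
# point is "same data, two views, no copy jobs between silos".

class DualHeadStore:
    def __init__(self):
        self._data = {}  # single source of truth: key/path -> bytes

    # Object head: flat namespace, whole-object operations
    def put_object(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get_object(self, key: str) -> bytes:
        return self._data[key]

    # File head: the very same keys, viewed as paths under a directory prefix
    def list_dir(self, prefix: str) -> list:
        return sorted(k for k in self._data if k.startswith(prefix))

store = DualHeadStore()
store.put_object("uploads/a.mp4", b"...")   # ingested via the object head
store.put_object("uploads/b.mp4", b"...")
# A curation job sees the same content through the file-style view:
print(store.list_dir("uploads/"))  # -> ['uploads/a.mp4', 'uploads/b.mp4']
```

Contrast this with the two-silo workflow described above, where every curation pass requires bulk-copying the data into a file store and back.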
But then once you get the file view, and you choose to get the file view, it looks like a traditional file system, and all the file-level indexing and all the things you do on files just kick in. Right, excellent. So that's why you combine the two together on the Isilon, and the Isilon is the file system, and then you put the object together. That's right. And Isilon actually fits very well because it is a scale-out file system. So it pairs very naturally with ViPR. Having said that, we run on actually any file array, right? So whether it's an Isilon or a VNX or even a NetApp device, it doesn't have to be an EMC array. If you've got an NFS endpoint, then ViPR object will run on top of it. Okay, and from an object standpoint, you're talking HDFS, S3, and Centera, right? Yes. So talk about those choices a little bit. Yes. So S3, I guess more and more people have been using that kind of model, that kind of API model. So we have an API that's compatible with S3, primarily to make sure that the application ecosystem that has been developed already, the investments people have made, they don't have to re-target that workload. It can apply directly. We support the Atmos API for the customers out there that are using our own Atmos product. We will support the Centera API. It's a slightly different model, because the Centera model is what we call content-addressable storage, where the path name of the object is actually an encoding of the data that's in the object itself, right? And that's done for compliance. So we support that model. So those are sort of the models where you think of put-get object. And then HDFS is a slightly different model, where it feels more like a file, right? And you do read and write the way you do with files, but it's append-only. So you don't do accesses in the middle of the file. It's not what you call the full-blown POSIX semantics that you have with NFS.
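The content-addressable model Das describes for Centera can be sketched as follows. SHA-256 is used here purely as a stand-in digest; Centera's actual content-address scheme differs, but the property is the same: the object's name is derived from its content.

```python
import hashlib

# Sketch of Centera-style content-addressable storage: the address is an
# encoding of the data itself, so identical content always yields the same
# address, and content cannot change without its address changing -- which
# is what makes the model attractive for compliance/immutability.

_cas = {}

def cas_put(data: bytes) -> str:
    addr = hashlib.sha256(data).hexdigest()  # stand-in content digest
    _cas[addr] = data
    return addr  # the "path name" encodes the content

def cas_get(addr: str) -> bytes:
    return _cas[addr]

addr = cas_put(b"compliance record")
assert cas_get(addr) == b"compliance record"
assert cas_put(b"compliance record") == addr  # same content, same address
```

A side effect worth noting: storing the same content twice costs nothing extra, since both writes land at the same address.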
So it's the same data underneath, but HDFS is more like a file system model on objects. Yeah. Okay, and then you'll roll these out later this year, right? Second half of this year. That's right, that's right. So, yes, as soon as I'm done with this interview, you know what I'm doing, back to the code. So what's the reaction been from customers that you've talked to here at the show or otherwise? I think so, we've talked to a number and they're really excited. And I'll tell you a couple reasons why. Number one, I think we have a really clear roadmap of how we take you forward. And that means two things. For folks who have file storage today, how we bring you to the cloud world, like the Isilon customer. You put this layer on top, and now you've got both worlds. You can be file and you can be object. As well as for existing object customers, folks who are on Centera, Atmos, folks who are using public clouds like S3 or Microsoft's Azure, ViPR object is compatible with all of that. And it's a layer that sits on top of that, right? So to give you an example, right? ViPR object says, you put as many arrays as you want underneath me, and I will spread your object set across those. And an array can be a physical array like an Isilon. It can be an object array like an Atmos. It can be your account in the public cloud, like your Amazon S3 account, right? So you can say, create me a bucket full of objects where part of it is on this on-prem array that I've got in my building and part of it is in my Amazon S3 account. And you can set policy, and our layer, our engine, will automatically spread the data across. So you can see how this opens up hybrid tiering kinds of scenarios. And to your question about the customer, what has resonated the most with people is, anybody who's got a production system, right? And we had this experience also in a past life when we moved production customers to storage, they want an A-B model, right? You don't put all your eggs in the new basket on day one.
So I want a model where, here's all my existing data. On day one, just take my friends and family, the 100 people who won't care if the data gets lost, and just redirect them to the new system. When that's done, I'm going to click the button to say move on. Then the next thing is, go to 1%, 5%, 10%, right? So that's sort of built in to how we've designed ViPR. And that's really resonated with the customers we've talked to because it gives them a pragmatic path forward. They care about how do I get from A to B? Yes, in a reasonable way, right? Anybody who's migrated data knows it's not a fun job. So one last question for me. What about geographically distributed data? There seem to be a number of models out there, erasure coding and things like that. Is that where you're going with this? Or is it too early to say? It's too early to be particularly specific, but here's what I would tell you, okay? Definitely built into our system is a geo-replication model that is asynchronous, which basically says, for every piece of data, your master copy is in a particular zone, and then our system takes care of making replicas in other zones, so that in case you have an outage of a data center or a zone, your data is available from other places, right? Now, that system is built in; under the covers, at the next level of detail, you have different trade-off points as to how efficient you want to be in storage versus the latency, right? So if I do something like geo-erasure coding, which we do in our Atmos product today, and which we could certainly do in ViPR, that model basically says no single data center has a complete copy of the data. I actually take the data and spread it across the sites, right? So, I mean, in computer science terms, it's the most efficient way of storing the data. On the other hand, whenever you want to retrieve the data, you are forced to make multiple hops across data centers to reassemble it, right?
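The storage-efficiency trade-off being described comes down to simple arithmetic: full geo-replicas cost a whole copy per zone, while a k+m erasure code costs only (k+m)/k. The sketch below illustrates this; the specific 10+3 split is an illustrative choice, not a stated product parameter.

```python
# Back-of-the-envelope storage overheads for the trade-off described above:
# full geo-replicas versus spreading k data + m coded fragments across sites.

def replica_overhead(copies: int) -> float:
    # Raw bytes stored per byte of user data with full replicas
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    # k data fragments plus m parity fragments; any k fragments recover
    # the data, so m sites/devices can fail without loss
    return (k + m) / k

print(replica_overhead(3))      # -> 3.0  (three full copies)
print(erasure_overhead(10, 3))  # -> 1.3  (a 10+3 code)
```

The catch, as noted, is on the read path: with replicas a single site can serve the whole object, while with geo-erasure coding a read must pull fragments from multiple data centers.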
So that's sort of the game you have to play, right? And I think the model we've followed with Atmos is we've allowed the customer to set a policy model which says, I want to start with full replicas and then, over time, do this erasure coding in the background and spread me out. Because really, data is hot data and cold data, right? And I think for hot data, you probably don't want to do the geo-erasure coding, because you want fast response when there's an access. For really cold data, that's a better model, right? So ViPR will support that full spectrum as we go. So a question I just got on Twitter is, how are you protecting the data today? Right, that's a great question. So one of the benefits we have is that we are piggybacking on the arrays. And if you look at the arrays that we're running on, they provide a lot of protection out of the box. So we run on an Isilon, and Isilon has an erasure coding model that goes across an Isilon cluster and basically does it at an overhead of 1.3x, which is very efficient. It gives you protection from disk failure, node failure, et cetera, right? So that's a level of protection we sort of get for free from the arrays. And then at the object level itself, we have a protection model based on replication, where we make copies of the objects as they come in. So we have a pretty good story for protection here. All right, Manuvir, thanks very much for coming on theCUBE. Again, congratulations on getting the platform out and unveiled, and good luck with the rollout, and we really appreciate your time. Thank you, I appreciate the opportunity to come out here and talk to you. All right, excellent. Again, more great data hanging with the key technical leads of the Advanced Software Division at EMC, unpacking ViPR. We've got more to come, keep it right there. This is theCUBE, SiliconANGLE's flagship coverage of EMC World 2013. We'll be right back. This is Dave Vellante with David Floyer. Keep it right there.