from Las Vegas, it's theCUBE, covering EMC World 2016, brought to you by EMC. Now, here are your hosts, John Furrier and Dave Vellante. Hey, welcome back everyone. We are here live at EMC World 2016. This is SiliconANGLE Media's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Jeroen van Rotterdam, CTO of EMC's Enterprise Content Division. We've heard all the greats today, the big announcements. Welcome to theCUBE. Thanks for having me. So, big announcements. You've got Horizon, now called LEAP, and InfoArchive's been announced. 4.0, yes. What's the story with this? Because we've been talking earlier about how everybody's on this digital transformation, but you have an old way and a new way going on, kind of traditional versus emerging, even on the keynotes. Everyone wants to digitize everything, but not everyone has everything in that format. They might have systems and software, collaborative enterprise software, but there's this shift to the new digital, asset-driven content market, non-linear consumption, and so on. What's the big underpinning? It's an exciting world. In the world of content, what we do with Documentum, and I think this is relatively unique, we are going for a very optimized hybrid cloud strategy. A lot of our customers are very conservative. They have a very large-scale Documentum installation on-premise. They've invested a lot of dollars in that, and it's optimizing their business processes. But they want the agility of the cloud. And so we announced LEAP, a bunch of productivity apps in the cloud, where you get that agility, complementary to our on-premise Documentum installed base. Yeah. So, they have pre-existing investments, so they don't want to rip and replace. No, no. But they also want the benefit of a clean sheet of paper, to be cloud-native. That's kind of what you're getting at. Yep, that's what we're saying. 
And so the two work better together, as we said, right? High-velocity agility in the cloud versus a stable base, your system of record on-prem. So, we hear this all the time at all the different shows we go to: systems of record, that's the database; systems of engagement, this new interaction, relationship with the customers; and now insight, or cognitive, or intelligent data content. It's slightly different, though, with Documentum, right? So, Documentum is not a database; it's the heart of the process flow around content. So, the business processes are optimized using Documentum, and that's the system of record, with a lot of process and integrations around that. Now we're adding agility with cloud, a system of engagement, basically, right? So, productivity apps, and our approach is different as well. Instead of going for a monolithic app in the cloud, we go for slivers of functionality that are highly optimized for a specific purpose. A good example is review-and-approve on a mobile device, and we've optimized that in a single-purpose app with, you know, a very rapid cadence of innovation. So, these are new work streams, basically, is what you're saying. So, the old way was, you had a database, and, let's just use data as an example, a schema defines the function, and content or software defines what you can do. What you're saying is, they're defining their workflows or streams around the content, or the content dictates. No, it's really the business process that's typically leading, and frankly, there are a lot of business processes that cross the enterprise boundary as well. So, what we do with LEAP, we have a set of productivity apps where you can invite external users to contribute content. Outside the company. Outside of the company, right? To submit documents in a very controlled way, and then integrate back into your base business process. Are we going to talk about InfoArchive today as well? No. Are we? Of course. 
Well, so, I want to ask you about the business process. How do your customers deal with the hardening of that process, and how does InfoArchive and other tooling evolve with that? So, InfoArchive is a Better Together story as well, right? InfoArchive is an optimized platform for static content and data, structured and unstructured. And you have a set of LEAP productivity apps in the cloud. So, for instance, Snap, for distributed capture of data. You're capturing static content, like images, or you have Courier, where you capture static documents, like contracts. That content is not changing. So, flowing that into InfoArchive, a highly optimized platform at scale, petabyte scale, for that static data, that's a Better Together story. And sometimes you've got Documentum customers who have a lot of business processes around documents. They modify a lot of documents, but the documents reach a static state. Offloading that static content into InfoArchive is a Better Together story as well. So, okay, so you've got scale. How heavy is it? What's the footprint like? So, InfoArchive is a very lightweight architecture. We actually redesigned the architecture, first of all, to scale out. Every aspect of that architecture is horizontally scalable. It's a cluster architecture in every layer of the stack. So, it's extremely lightweight. We demonstrated it today in the keynote on a Raspberry Pi, a $35 computer, with the full InfoArchive solution running on top of it, all three servers. So, very lightweight, but it scales out. And the reason why lightweight is so important is that there are new privacy regulations popping up all over the place, in Europe and across the world, that force big global enterprises to have a local archive in the country, geo-fenced. So, you cannot have this global archive in one, two, or three sites anymore. Suddenly, you have to have 50 instances of your archive. 
And rolling out the traditional archiving solutions in 50 locations is super expensive. So, you've got to make it smart, too. It's got to be intelligent, to know where the geo-fences are, right? It is, it is. So, we reroute the data to the right archive endpoint, but doing that in a very cost-effective way, at global scale, with so many instances, that's what InfoArchive does. And you said it scales to petabytes? That's good, that's beautiful. Oh, yeah. You know, we've got customers now, a good example is a customer with 40 billion objects of structured data in a single archive, plus four billion documents, and that runs on a three-core machine with 60,000 queries per day against the data, because that data is now actively used. And, you know, it's not uncommon that a big financial institution has, like, 50 petabytes of email alone. And in a Dodd-Frank scenario, now you have to collect all the transactions, all the communications, like email, voice recordings, call records, IM, social media data, all together in a single archive. Yeah, and that's a mix of unstructured data and structured data too, right? That's a strength of our platform. We optimize for both. And your pricing model, how do you price the capacity? So, that's the beauty, right? InfoArchive is based on consumption-based pricing, so it's terabyte-based. When you get the benefit of the archive, we charge you. And so, you know, for a lot of customers, it's a land-and-expand model, right? A lot of customers start small, they get an immediate return on that small instance, and then they see the light, right? So they add more and more data streams to the archive. How unique is this in the marketplace? Can you compare and contrast with other products? Yeah, I think we are unique in terms of the scale, the light weight of the architecture, and the fact that we do in-place compliance. So, we have a very strong compliance engine. 
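The geo-fenced rerouting he describes, sending each record to an in-country archive instance, can be sketched roughly as follows. This is an editorial illustration, not InfoArchive's actual API; the endpoint URLs and record fields are made up.

```python
# Hypothetical sketch of geo-fenced archive routing: a record's country
# of origin determines which in-country archive instance ingests it.
# Endpoint names and the Record fields are illustrative, not a real API.

from dataclasses import dataclass

# One lightweight archive instance per jurisdiction (geo-fencing).
ARCHIVE_ENDPOINTS = {
    "DE": "https://archive.de.example.com/ingest",
    "FR": "https://archive.fr.example.com/ingest",
    "US": "https://archive.us.example.com/ingest",
}

@dataclass
class Record:
    doc_id: str
    country_of_origin: str  # drives the data-residency decision
    payload: bytes

def route(record: Record) -> str:
    """Return the in-country ingest endpoint for a record."""
    try:
        return ARCHIVE_ENDPOINTS[record.country_of_origin]
    except KeyError:
        raise ValueError(f"no geo-fenced archive for {record.country_of_origin}")

rec = Record("inv-001", "DE", b"...")
print(route(rec))  # -> https://archive.de.example.com/ingest
```

The point of the lightweight architecture is that standing up 50 of these per-country instances stays affordable; the routing itself is the easy part.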
A lot of our competing products, when compliance controls or rules change, have to refeed the data. Refeeding the data means you may have to export the data out of the archive and bring it back in. And re-index. Oh, yeah, it's horrible, right? So your retention policies change, and that's the last thing you want to do. Yeah. So, what we do is in-place compliance. What that means is that our legal holds, retention policies, and security policies around the data have an interaction model against the repository. So, when retention policies change, and they will change, you don't have to touch the data. And that's a virtual repository, as you described before, basically, right? So, what do you mean by a virtual repository? Well, you said that you have to have it in-country. Yeah, so it's a distributed repository. Not a single physical shove-it-in-a-box repository, right? No, it's not. Absolutely not. That's what I mean by virtual. Okay, so back to the things for customers. What are you seeing in the conversations with customers, the top three features that they like the most? So, there's always a compliance aspect to this. The fact that we are so flexible around compliance controls is a big one. Second is scale. All of our customers have silos of archives; it's very expensive, and it's actually growing and growing. Third is that we have specific solutions on top of the core platform, for instance for SAP archiving or clinical archiving, so EMR data, that really reduce the cost of implementing a large-scale archive in a compliant way. Okay, so I've got to ask you the question, because we're seeing a lot of conversations, in the interviews that we do and in our own experiences, around data. Better Together is definitely a great message, and that's something that we're hearing from everybody. 
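The in-place compliance idea, policies evaluated against the repository rather than baked into the stored data, can be illustrated with a minimal sketch. The record fields and policy shape here are hypothetical; the point is only that changing a policy never requires exporting and re-ingesting data.

```python
# Hedged sketch of "in-place compliance": retention and legal-hold rules
# are evaluated against record metadata at disposition time, so changing
# a policy touches only the policy table, never the archived data.

from datetime import date, timedelta

records = [
    {"id": "r1", "record_class": "email", "archived": date(2008, 5, 1), "on_hold": False},
    {"id": "r2", "record_class": "email", "archived": date(2015, 5, 1), "on_hold": True},
]

policies = {"email": timedelta(days=365 * 7)}  # 7-year retention

def disposable(record, policies, today):
    """A record may be disposed only if retention lapsed and no legal hold applies."""
    retention = policies[record["record_class"]]
    return (not record["on_hold"]) and today - record["archived"] >= retention

today = date(2016, 5, 2)
print([r["id"] for r in records if disposable(r, policies, today)])  # ['r1']

# Regulation changes: extend retention to 10 years. Only the rule changes;
# the data stays in place -- no export, no refeed, no re-index.
policies["email"] = timedelta(days=365 * 10)
print([r["id"] for r in records if disposable(r, policies, today)])  # []
```

Contrast this with the "refeed" approach he criticizes, where the retention outcome is stamped onto the data at ingest and a rule change forces a round trip through export and re-ingest.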
So, taking it to the next level of data silos, because right now people have these packaged apps, because either it's a one-off point solution and they have their own database, but the data talking to each other now becomes interesting, because you have omni-channel-like interactions, engagement data coming in, whether it's lightweight, cloud, or purpose-built for geo-fencing, like you mentioned. So, all this stuff's going on out there in the database, but it might be really critical to have real-time information talk to another system or app within the framework. How do you guys look at that, and what's your vision around this data interaction, data sharing? So, in terms of bringing data together, we definitely have a strategy to be the central archive for many different applications, and these applications can do multi-channel publishing on their own, right? And specifically in FinServ, right, you have to cut across all the data from all applications to get insight when you have an audit, for instance. The second thing we see now is that we went from being an archive at scale with a compliance engine, where we added very flexible discovery interfaces that you can configure using drag and drop in the browser, to a need to actually do real-time analytics over the data. So we're investing now, it's the next round, we're investing in real-time analytics over data streams coming into the archive, before you collect the evidence in that central archive. So, like machine learning or some sort of algorithmic insight, to surface insights, and then... Yeah, we're actually doing that, and it's not just machine learning; it's a combination of NLP as well as statistical analysis and machine learning, and we're doing that over structured and unstructured data. And there are new techniques to do machine learning over unstructured data that are really interesting. And how does this relate? So, we talk about scale a lot. 
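The combination he mentions, NLP plus statistical analysis over streams on their way into the archive, might look something like this toy sketch. The lexicon, message shape, and thresholds are invented for illustration; a real system would use trained models, not a keyword list.

```python
# Illustrative sketch (not the product's implementation) of analyzing a
# message stream before it lands in a central archive: a crude NLP
# keyword signal combined with a running statistical check flags items.

import math

SUSPICIOUS_TERMS = {"guarantee", "off the books", "delete this"}  # toy lexicon

def keyword_score(text: str) -> int:
    t = text.lower()
    return sum(term in t for term in SUSPICIOUS_TERMS)

class RunningStats:
    """Welford's online mean/variance -- suited to unbounded streams."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def update(self, x: float):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
    def zscore(self, x: float) -> float:
        if self.n < 2 or self.m2 == 0:
            return 0.0
        return (x - self.mean) / math.sqrt(self.m2 / (self.n - 1))

stats = RunningStats()
def analyze(message: dict) -> bool:
    """Return True if the message should be flagged before archiving."""
    stats.update(message["amount"])
    return keyword_score(message["text"]) > 0 or stats.zscore(message["amount"]) > 3.0

stream = [
    {"text": "monthly statement", "amount": 100.0},
    {"text": "monthly statement", "amount": 102.0},
    {"text": "keep this off the books", "amount": 99.0},
]
print([analyze(m) for m in stream])  # [False, False, True]
```

The online statistics matter here because, as he says, the analysis happens over the incoming stream, before the evidence is collected in the archive, so nothing can assume the full dataset is available.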
One of the complaints that customers would frequently have is that the technology might scale, but the model didn't, because I couldn't classify my data at the point of creation or use. So, what is the status of that sort of capability? I think manual classification is a thing of the past. The volumes are too high. The data models are changing on the fly. We see sort of a move toward control by the end user. So end users start, through configuration, to define dynamic data models. That means your data model is changing on the fly, and therefore your classification will have to change on the fly as well. And so a discrete taxonomy, defined by hand, is just not doable anymore, right? And the machine learning algorithms are so good now. Entity recognition, core NLP, those types of algorithms are so good that you can really automate that. With a very high degree of, sort of... The problem is essentially solved. It's being solved. It's still early days, but it's being solved. Okay, but the technique to solve it now is NLP and machine learning. Yep. There's automation too. You see bots out there. Because there were some attempts to use math, support vector machines, latent semantic indexing. Was there another technique? I'm very excited about the opportunity with vectorization of unstructured data. Which is very well understood. It's been around for decades, right? No, it's actually not. I mean, the concept has. The math has been around. But now the data sets are different, in real time, so you've got unstructured data. But the implementation hasn't been around, right? No, it hasn't. So, you know, if you look at technologies like word2vec, or GloVe from Stanford, right? Yeah, yeah, okay. These technologies have only been available, you know, less than two years now. Yeah, okay. 2013 or 2014 was the first paper on vectorization of text. 
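Automated classification replacing a hand-built taxonomy can be sketched minimally as a bag-of-words nearest-centroid classifier. The classes and training snippets below are invented for the demo; production systems would use the NLP and ML pipelines discussed above, trained on real corpora.

```python
# A minimal sketch of automated text classification replacing manual
# taxonomy assignment: bag-of-words nearest-centroid over toy training
# data. Illustrative only -- real systems use trained NLP/ML models.

from collections import Counter
import math

TRAIN = {
    "invoice":  ["invoice due amount payable", "invoice total amount billed"],
    "contract": ["agreement between parties signed", "contract terms parties agree"],
}

def vec(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One centroid (summed term counts) per class.
CENTROIDS = {label: sum((vec(d) for d in docs), Counter())
             for label, docs in TRAIN.items()}

def classify(text: str) -> str:
    """Assign the class whose centroid is most similar to the document."""
    return max(CENTROIDS, key=lambda label: cosine(vec(text), CENTROIDS[label]))

print(classify("amount due on this invoice"))    # invoice
print(classify("the parties signed the terms"))  # contract
```

Because the model is derived from data rather than hand-authored rules, retraining on new examples is how classification "changes on the fly" when the data model does, which is exactly the property he says manual taxonomies lack.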
And the interesting thing about the technology is that it doesn't just do statistical analysis on how words in a context are related. It can actually do automatic reasoning over the data as well, which is still poorly understood. But meta-reasoning is a great concept, because if you can automate that, you're simplifying some of the complexity inside the discovery piece. Right, and so, tied back to InfoArchive, we're getting really, really popular in the financial services space. They all have the compliance requirements to archive all that data and correlate it. They all have the requirements to do discovery over this massive amount of data. And then there's fraud detection by auditors, but they really want to do fraud prevention. Yeah, right. So, the name of the game is "we want to prevent fraud." Well, then you just throw security on top of that too. It's even more complicated. We have a good handle on the security controls over that data. But this is a game changer, because for a long time the industry was using search as a blunt instrument. Yeah, but it's not good enough. It's not good enough. Right. There's still, you know, lots and lots of manual auditing, you know, surveillance, going on within the FinServ industry. True. And, first of all, thank you for coming on theCUBE, sharing your insights here. My pleasure. And the machine learning machine here inside theCUBE. But I want you to take a minute and share with the folks watching the insights coming out of this show for you guys, because you have a lot of announcements. Break it down for the potential customers watching. What's the big message? What are you proposing here? And how do they get more information? You know, I think our big message is that we're setting the pace for the next generation of enterprise content management. 
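The "reasoning" property of word vectors he alludes to is usually demonstrated with vector arithmetic. The 4-dimensional vectors below are hand-crafted for the demo, not learned embeddings as word2vec or GloVe would produce, but they show the same mechanic: king − man + woman lands nearest to queen.

```python
# Hedged illustration of word-vector "reasoning" (word2vec/GloVe-style
# analogies). These toy 4-dim vectors are made up for the demo; real
# embeddings are learned from large corpora.

import math

VECTORS = {
    # dims (invented): [royalty, male, female, common]
    "king":  [0.9, 0.8, 0.1, 0.1],
    "queen": [0.9, 0.1, 0.8, 0.1],
    "man":   [0.1, 0.9, 0.1, 0.8],
    "woman": [0.1, 0.1, 0.9, 0.8],
    "apple": [0.0, 0.1, 0.1, 0.9],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(target, exclude):
    """Most similar vocabulary word to the target vector."""
    return max((w for w in VECTORS if w not in exclude),
               key=lambda w: cosine(VECTORS[w], target))

# Analogy as arithmetic: king - man + woman ≈ queen
target = add(sub(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

This is the statistical-relatedness-plus-arithmetic behavior the word2vec (2013) and GloVe (2014) papers reported; whether it amounts to "automatic reasoning" is, as he concedes, still poorly understood.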
We think we have a very solid hybrid cloud strategy, with a combination of on-premise and SaaS technologies that gives customers the agility they need. And so I think we're unique in sort of defining the next strategy for content management, enterprise content management. And with that, on the archive side, we're really redefining how data comes together, how you get insight into data, whether it's through analytics tools like MapReduce over a secure environment, or moving into real-time analytics. And it's more than just marketing cloud software; it's really infrastructure-related software. It's both, right? It's not just infrastructure. This is business process and content-centric and data-centric. What value does it bring? Content is king, and we bring you a lot of content here on theCUBE, making the content smart, addressable, non-linear consumption. This is the cloud; it's agile, it's DevOps. All this is great stuff in digital transformation, the effort to digitize everything. Thank you so much, Jeroen, for sharing your insight and content on theCUBE, which is a content machine. We'll be right back with more action, live here in Las Vegas at EMC World 2016. I'm John Furrier with Dave Vellante. You're watching theCUBE. It's always fun to come back to theCUBE because, you know, the...