Live from Midtown Manhattan, theCUBE's live coverage of Big Data NYC, a SiliconANGLE Wikibon production. Made possible by Hortonworks, we do Hadoop, and WANdisco, Hadoop made invincible. And now your co-hosts, John Furrier and Dave Vellante.
Okay, we're back here inside theCUBE. This is live in New York City for Big Data NYC. This is where all the action is happening: Big Data in New York City. We're watching Hadoop World, Strata Conference, a lot of action. I'm John Furrier, the founder of SiliconANGLE, host of theCUBE, with my co-host Dave Vellante of Wikibon.org. Sharmila Mulligan is here, the co-founder, or founder, CEO, founder, co-founder?
Founder and CEO.
Founder and CEO of ClearStory. They launched here with their product. You were here last year, and this year you actually announced the product. Welcome back to theCUBE. Great to have you. So last time you were in theCUBE, you were kind of stealthy. You were good, you were smiling, like, I can't really tell you what we're gonna do, but it's gonna be big. But we talked generically about big data. So now that you're launched, you had your keynote today. Recap ClearStory: the product, the traction. A lot of interest, certainly here in New York City, a lot of attention. What's the secret?
All right, so if you listened to what we had to say last year at this very Strata event, we talked about how it has to get easier to consume data from many disparate data sources, both internal and external. Some people might say that all data is gonna land in one data hub. I think we all know that's probably not going to happen, right? There is going to be a lot of data sitting in the existing repositories that are not going away. There is going to be a lot of data circulating around the corporation in the form of flat files that are changing all the time. There's going to be data obviously cropping up inside of your Hadoop clusters if you're deploying it.
But add to that, almost every single large company is also consuming data from external sources. You're buying data from premium data providers. You're consuming data from public sources. And the number of external sources of data just keeps increasing. As I mentioned last year, there are over 8,000 open data APIs out there, right? Seven years ago, there were fewer than a hundred. There are premium data providers that are providing data at a finer granularity and at a higher frequency than ever before. These are things like media and market intelligence data. Data around your point-of-sale information. Data related to actual weather patterns from satellite to satellite. Pulling all this disparate data together from your internal sources and external sources so you can get to a broader insight is absolutely critical. Otherwise, you're gonna have all this data just trapped inside of repositories, trapped inside of these external data providers, with no easy and fast way to actually converge it all and get to an answer. Data intelligence, which we introduced here at Strata, is all about removing the barrier to accessing internal and external data and making it possible to converge data on the fly as you pick data from a variety of different sources and arrive at a holistic view of what's going on in the business. Almost every single situation or analysis that one is trying to do today is about accessing disparate data from a variety of different sources, and getting from source data to an answer has to get a lot faster. It cannot take weeks, it cannot take months, it cannot rely on data scientists every time you want to do this, and you can't expect that data experts are going to continue to iterate on an insight the way you as a line-of-business person need to see it.
So literally, data intelligence is about speeding the whole process of going from disparate data to a holistic view. And then one other thing we think is very important, and I talked about this last year, is allowing more people to access and see the insight than ever before. Across the business, wherever you sit, you shouldn't need to have a technical skill set.
We just talked to Amr Awadallah, a co-founder of Cloudera. We saw the future being built with Cloudera; obviously now it's years later, and it's matured. One of the things we talked about was that in this modern era, and he used the digital SLR versus the iPhone as an example, the SLR still takes pictures with high-end functionality, but now the iPhone takes great pictures. So we talked about user experience. And in the data science world, we were talking about roughly 200,000 data scientists out in the world today, a guesstimation on that, but over two million analysts and X millions, zillions of end users. So the discussion was, how do you get the data in the hands of the users? How does the user become the analyst? To quote Moneyball, who's that Billy Beane in the organization? And it doesn't have to be a geek writing Python or an analyst doing spreadsheets. It could be a user. What's your take on that, and how does ClearStory get into that section of the market?
Yeah, so first let me actually complete that last number, where you said X million users need to get to all this data. There are actually 650 million information workers out there across corporations. By next year, we're looking at that reaching about 800 million people. 800 million people across business units, across companies, that don't have...
These are connected workers.
These are connected workers that rely on data every day to make a decision. Whether you're in the finance group, the marketing group, the digital marketing group, the customer service group, the customer solutions group, you're looking for an answer that relies on data.
There are, like you said, relative to that, a very small number of data scientists and data architects. So we need to find a way to put data into the hands of 800-plus million people, where it becomes very easy and intuitive to understand what the insight means.
And that's your target market.
And that is what we're focused on. So our target market is the people who need to be able to glean insights faster and can understand the insight by looking at an application that is highly intuitive in terms of how answers are delivered. You can go from a question to an answer within minutes, right? You can go from an answer to iterating on an answer within minutes. The other key thing we've done to facilitate this and help all those millions of people out there who need to get to answers: nobody's working in a silo, as you know, John. I mean, everybody is working with the other folks in their business group, their peers up and down the organization, people in different locations. So allowing them to actively collaborate on the insight and reach a conclusion, and having lots of people, in real time or coming in after the insight has already been published, actually have an opinion on what it means and contribute additional thinking so you can iterate on it, is very key. We've built into ClearStory not only the ability to speed access to internal and external sources and converge data on the fly to reach a holistic view. We've also built the whole notion of data-aware collaboration into the solution. And data-aware collaboration is truly about letting lots and lots of people look at an insight as it unfolds, and letting people actually conclude and make decisions, and then move on to look at the next insight and make the next decision around that.
So that's an active way of collaborating, where you maintain the state of the data and the view of the data, and everything has a notion of who gets to see what, and what do they get to do? What kind of decision do they get to make? All built into the system.
So I love the story on ClearStory. It is crystal clear. Your messaging is really crisp and spot on. So you've essentially automated the pipeline going from source data to an answer.
That's exactly right.
It sounds like magic. How do you do it? Talk about what's behind it.
Okay, so behind ClearStory: ClearStory is an integrated data processing engine and an application. We sit above data sources. So we are sitting on top of your corporate repositories and in front of external sources of data. We bring data together from many different sources inside of our platform. And the first thing that happens is that the system determines, or infers, what's in the data. So we understand the source. We understand the semantics. There's an ontology that is built inside of the system, and we use that information to determine how to converge multiple disparate sources together. That whole process of converging multiple disparate sources is what we call intelligent data harmonization. And that's one of the other key technology pieces we announced when we unveiled ClearStory this week in New York. Intelligent data harmonization is what the system is doing to determine how to bring disparate sources together and what can actually be converged into a single view. Inside our platform we use a scale-out, in-memory distributed processing framework built on top of a technology called Spark, which Cloudera also announced support for this week. We have been innovating around Spark for several years now. And that's basically where the converged data lives.
So this makes it possible to have these converged data units that have been harmonized basically living in memory, and they give you the ability to iterate on them very quickly and navigate through insights quickly. In front of this ClearStory platform that does data inference and data harmonization and processes all this in memory is an application layer that we have invested a lot in, to make it very simple, easy and intuitive for a user to build what we call a data story. A data story is a living analysis, and you can interact with that data story through visual insights, so very snappy visualizations. Once you're in a data story, you can collaborate with lots of people. You can bring 10 people into the data story, 50 people into the data story. They can all see what you're seeing, they can change triggers on the insight itself, and by doing this you're allowing people to reach conclusions together and make a decision.
So you've got this low-latency environment with Spark and this intelligent data harmonization. So are you using a combination of sort of proven algorithms, public algorithms and other technologies?
So we have actually built the data inference engine ourselves, to infer what's in the source, how fast that source is updating, and what the semantics of the data are. That's something ClearStory has been building for the last couple of years. The data harmonization is again a technology that we have built. In fact, the whole ClearStory stack has something like seven patents pending, all the way from the platform component to the application component, and the data harmonization again is something that we built. Where we're using Spark is that when we harmonize data, it results in an in-memory data unit, and that data unit is blending data from multiple sources. That data unit is a living data unit that's constantly being updated with fresh data from the source.
That is the data unit that gets distributed across the Spark layer, and on top of Spark we built our own IP that allows the user signal from the application to get pushed down into our backend platform. So every time a user or an information worker is clicking on an insight, trying to see something else, navigating, trying to augment more data in, it sends a signal to the backend, and the backend is then pulling more data from the source, converging more data using the harmonization engine, and generating a visualization. That is what creates the constant cycle of user signal back to source data. We call this a rapid round trip.
So are customers building an application on top of this platform, or is this the app?
This is the app, right? This is the end-user application, with all the capabilities to iterate and drive through analysis.
So it's interesting, we were talking to another, I mean, there's a lack of really good big data apps out there off the shelf, and we're waiting for this market to explode, and this is one of the catalysts. One of the other application providers said to us, you know, I've got my own data, you know, space, I've got my own security, I've got my own algorithms, which are free, and so I'm building that, and essentially that's what's needed. It's a combination of technology, domain experts and data science. Is that how you look at it?
Yeah, the way we look at it, you know, there's a combination of bringing together data experts and data science with the people who are the line-of-business users that need to see insights. The problem we've had in the last 15, 20 years of analytics and analysis solutions is there's an impedance mismatch from the source data that's changing all the time to the view that the user is looking at, right? The impedance mismatch is caused by the fact that we have data platforms and external sources of data, and then a front-end tool that is nothing but a visualization tool, right?
Which means that every single time a user needs to get more data or converge more data, you have to go back through this data pipeline and rely on people and modeling and ETL and all of these steps to go from the freshest source data to the insight. By having an integrated stack, a platform with an integrated application that's not replacing your data sources or data repositories but is the processing tier with an integrated application, we are basically removing that entire impedance mismatch from source data to the insight that you need to get to.
And what about the viz? It's your visualization, right?
It's our visualizations. On ClearStoryData.com, which we unveiled, you'll see a lot of what the front-end user experience looks like. What we've done in the area of visualizations is what we call smart visualizations. Because we understand the source of the data, the shape of the data, the size of the data, and we infer through our inference engine the semantics in the data, we present to the end user the visualization that's the best fit for the nature of the data you're looking at, right? So again, we use everything from the inference and the harmonization that we do under the hood to drive the right visualization for the person viewing it at that point in time. If you layer more data in, the visualization changes, because now your blended data unit has other data in it. As you drop data, the visualization changes, because now there's another view that's more appropriate to it. While you're doing all this, you can collaborate with everyone across the company who is part of this whole experience.
I need one of these, John.
You do.
Try it, it's very easy.
Kindergarten-proof.
It is, it is.
The trend of having a platform with an integrated app is really what people are doing right now, because people can build platforms.
I mean, we're doing it with CrowdSpots, as people in Silicon Valley know and as we've talked about. I just want to wrap up, because we're pressed on time. First of all, the keynote was very much a product demo, so you got a chance to go into some of the concepts in depth here on theCUBE. Thank you for that. It's awesome. Love the vision. Love the storytelling concept. Data stories, great name. Just excellent, excellent branding overall. Congratulations. So let's talk quickly, to end, about what the traction is like. You've had a year, and knowing you and your team, you've been looking at the landscape. Now you're public with the product and you own a market. Talk about the traction, talk about what's happening here in New York City, the customers you talk to, and just overall the ClearStory traction.
Yeah, so we work with a lot of companies in the CPG space that are bringing in data from internal sources, and they all rely on many external sources of data as well. They get point-of-sale data every day. They get campaign data every day. They're looking at all the information store by store to see what factors are affecting pull-through and sell-through of the product. So we are heavy in CPG. We're heavy in the food and beverage market as well, for the same reasons: lots of different locations, lots of different brands, lots of different product sell-through depending on where you are, the demographics in that area, the weather conditions in that area, all kinds of factors that contribute to it. We also work heavily in financial services, in areas like research and wealth management, areas where you're pulling in lots of data from many different external and internal sources.
We are also working with national retail, and the reason we're focused on national retail is that, again, based on the location of every store and their online e-commerce site, there are a lot of different trends going on based on the local market conditions. So we get pulled into a lot of cases where there are very different trends by geography and from location to location. So typical cases include things like local market analysis, right? Whether you're a retailer, whether you're a grocery store, you are basically looking at all kinds.
Anyone who in a modern era is connected to the network as a user with a user interface, whether it's a phone or a device, a tablet or whatnot, tablets obviously.
Media and entertainment, right? Media and entertainment is a big one for us as well, because people are now being influenced by what other people are saying.
Onscreen graphics, social data.
Social data. So consumer-generated reviews, being an external source of data, are having more influence on what movies people go to, what shows they watch, and how much advertisers should spend on their TV spots versus their online advertising, based on what consumers are saying and doing on a day-to-day basis, right? All of these data sources we tap into, and we let you bring them in with your own data.
Okay, so real quick, just talk about how you're different from the competition. What's your differentiator?
So, big differentiators: we've truly delivered a product that a line-of-business user can pick up, start using, and start seeing insights. It is an incredibly simple guided user experience, going from many sources of data to a collaborative way of arriving at insights. The other aspect of it is external data readily available in an area of the product called data.
Very relevant. Obviously, storytelling's big in the whole social web now.
Obviously, we were just talking about data artistry earlier on, about how data can be an art and a science. This puts it in the hands of the user. Great stuff. I love the positioning. Congratulations. ClearStory, very clear about their product and now the concepts and methodology behind it. So for the folks on Twitter who wanted to learn more, we have that interview for you; it'll be on YouTube, on demand. Sharmila Mulligan, the CEO of ClearStory. We'll be back after the short break with the president of Hortonworks and more great content here on theCUBE. Big Data NYC, Hadoop World, Strata Conference, all here, wall-to-wall coverage. That's theCUBE, we'll be right back.