Okay, we're back here live at theCUBE. This is SiliconANGLE.com. I'm John Furrier, the founder of SiliconANGLE.com, and we're inside theCUBE. My co-host is Dave Vellante, the founder of Wikibon.org, our research team, which has put out an amazing study on big data market size and revenues by vendor. You can go to Wikibon.org slash big data and you'll see that report. And our next guest is actually in the report, because they just filed to go public: Splunk. Sanjay Mehta, VP of Product Marketing. Is that your official title? That's correct. Okay, so you can talk about product, you can talk about go-to-market, and about all the customer activities. We know the stuff you can't comment on; we have friends with public companies. So welcome to theCUBE. Thank you. First time in theCUBE. You guys are very successful, projected to have a massive valuation going public. We reported on that already, but let's talk about how your customers fell in love with your product. You came in and focused on a problem. So first, share with the folks out there about Splunk. You've filed to go public, you've been around for years, you're venture-backed through multiple rounds, highly successful, producing great revenue, but you really nailed big data from a very specific problem that you solved, which grew into a much bigger opportunity. Can you share that with the folks who don't know Splunk?

Absolutely, thanks for that. So Splunk's first product came out in 2006, and one of the things we really focused on was this problem of data. There was data being generated from all these different servers, network devices, and applications. It was kind of an exhaust, and it stored traces of everything that was happening, from customer behavior, to user behavior, to user transactions, to application behavior, et cetera.
And so what we did is we found a way to collect it in a way that's universal, so it doesn't discriminate among any of the different types or formats of data: collect it in real time, index it, and once you index the data you're able to make it searchable and browsable, and then we have the ability to visualize the data and interact with it. And so with the first couple of releases of the product, we took some of the simplicity and speed from the experiences people were having with the internet search phenomenon, and we took it to the enterprise. We said, look, there's all of this data; wouldn't it be great if you could access and interact with it at the same speed, in the same simple kind of way? And so that really gained traction, and the first kinds of use cases we had were around IT operations, because it was really there that people had the problem. They had a lot of silos, islands of data and people and processes, and they needed ways to very quickly navigate that and get an understanding of what's happening and why it's happening, to meet compliance mandates, to address security issues and fraud issues, to keep service levels up. And so there's a tremendous number of different kinds of use cases.

Yeah, Avi Meadow, who was just on earlier, said the killer app is text search in real time on huge data sets on HDFS. Yeah, we've been doing that for some time. Yeah, you guys have the killer app, hence why you're doing so well, and the feedback we hear from our research is that people really love your product. But log files have always been around, and that's a big problem you solved. It's a nice little box you check off, but we've been talking with folks like Mike Olson and others about how machine-to-machine data is huge.
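The collect, index, search loop described here can be sketched in a few lines of Python. This is an illustrative toy, not Splunk's actual implementation; every class, field, and log format below is invented for the example. The key idea it shows is universal ingest: raw lines go in with no declared schema, a timestamp is pulled out if one happens to be present, and an inverted index makes each event searchable the moment it arrives.

```python
import re
from collections import defaultdict

class MiniIndexer:
    """Toy sketch of schema-less ingest: accept any raw log line,
    extract a timestamp if present, and build an inverted index of
    terms so events are searchable immediately after arrival."""

    TS = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")

    def __init__(self):
        self.events = []               # raw events, in arrival order
        self.index = defaultdict(set)  # term -> set of event ids

    def ingest(self, raw_line):
        ts = self.TS.search(raw_line)
        event_id = len(self.events)
        self.events.append({"ts": ts.group() if ts else None, "raw": raw_line})
        for term in re.findall(r"\w+", raw_line.lower()):
            self.index[term].add(event_id)
        return event_id

    def search(self, *terms):
        """Return events containing all terms (AND semantics)."""
        ids = set.intersection(*(self.index.get(t.lower(), set()) for t in terms))
        return [self.events[i] for i in sorted(ids)]

idx = MiniIndexer()
idx.ingest("2012-03-08 10:15:02 app=checkout status=500 user=alice")
idx.ingest("2012-03-08 10:15:03 app=search status=200 user=bob")
hits = idx.search("status", "500")  # only the checkout error matches both terms
```

Because nothing about the line's format has to be declared up front, any device or application output can go through the same `ingest` path, which is the "doesn't discriminate" property the interview describes.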
We had Nokia on earlier talking about how they have a Hadoop cluster to manage the Hadoop cluster, because they want to make sure they're using big data to make sure the big data job is running properly. Analyzing their analysis. So yeah, that's interesting. With the internet of things you have more connected devices, hence more data, or exhaust, as you put it. Is that really where the bulk of your solutions comes from, or do you see other markets emerging?

Yeah, absolutely. One of the areas where we have seen a lot of success is industries and organizations that rely on technology, because they have the issue: they need to get value out of their infrastructure. The infrastructure contains a lot of useful information, and we really want to give them the ability to harness and access it. Certainly an additional area of growth, and I wouldn't restrict it to IT infrastructures, is machine to machine: SIM cards in telematics, as an example, manufacturing, areas like that. At the end of the day, there's a problem we're solving, which is how do you collect and harness all of this information in a simple way and then make sense of it? And I think that's where the opportunity is. And the other thing I'd just finish off with there is the focus for us, and the reason we did manage to get traction in the early days and continue to, is around ease of use. I think the challenge for a lot of these technologies is, how do you make the experience easy? How do you make it so that you can access the technology and actually use it in a way that's easy, simple, intuitive, and really just enables you to focus on the business problem you're trying to solve?

So you can plug in and connect to any data source. That's correct. How do you do that and make it such that you can make sense out of that data? Explain that.
So some of it was mentioned this morning, but there are a couple of core principles in a technology that deals with data that's variable in format, high velocity, and extremely diverse. One of them is that you really must not impose any kind of schema at the front end. You've got to allow data to be captured at the source. You find ways to do certain kinds of analysis up front; you need to be able, for example, to extract or understand a timestamp, and to understand where an event boundary is. But other than that, you really want to be able to pull the data in quickly. And then once it's there, you want to provide the ability to actually start connecting the dots, to start correlating extremely diverse data sources, because one of the things we've found is that sometimes you have an intention of what you're interested in, but it's through exploration that you actually get to the real question you want to ask. And so we don't want to go down a path where you have to design something, then build it, and it takes six months and you've invested millions and millions of dollars in resources to try to make it happen. To get a specific answer. Yeah, you want to iterate quickly. You want to connect to the data, provide the ability to iterate and understand the data extremely quickly, and then really get to the answers you're looking for.

Okay, so talk a little bit about what's in the middle so we can understand that, and then I want to understand what the outcomes are. So what's in the middle is this: at the front end, you've got to be able to collect everything, and you've got to be able to do it in real time. That's the standard to aim for, because although not all data arrives in real time, you want to capture as much of it in real time as you can. Once you capture it, as we capture it, we're indexing it. What the indexing does is enable you to very quickly search and navigate that information.
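The two pieces of up-front analysis mentioned above, timestamps and event boundaries, matter because raw machine data often spans multiple physical lines (a Java stack trace is the classic case). One common heuristic, sketched here in illustrative Python rather than as Splunk's actual logic, is to treat any line that starts with a timestamp as the beginning of a new event and fold everything else into the previous one:

```python
import re

# A new event starts at a line that begins with a date; the pattern
# is an illustrative choice, not a universal rule.
TS_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}")

def break_into_events(stream):
    """Group raw lines into events: any line that does not start with
    a timestamp (e.g. a stack-trace continuation) is appended to the
    previous event instead of starting a new one."""
    events, current = [], []
    for line in stream:
        if TS_LINE.match(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events

lines = [
    "2012-03-08 10:15:02 ERROR NullPointerException",
    "  at com.example.Foo.bar(Foo.java:42)",
    "2012-03-08 10:15:03 INFO request served in 12ms",
]
events = break_into_events(lines)  # two events: the error plus its trace, then the INFO line
```

Note that no other schema is imposed: the indented stack-trace line rides along inside its event, and any field extraction is deferred to search time, which is exactly the schema-on-read principle being described.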
And when you do that, you're able to make sense of the information and answer new questions, questions you perhaps thought of that morning or that second; you can start asking those kinds of questions. The next thing you want to be able to do is start automating. If you do see something that's interesting, you want to start automating the detection of patterns in the data. You want to be able to look across a vast scope of data, because you don't necessarily want, at the front end, to decide that only these data sources are interesting. Perhaps there are other data sources that are interesting that you just haven't thought about yet. So you want to provide a very easy way to collect it and have it there so that you can ask questions later on. And then you want to provide the ability to visualize it very, very quickly.

Okay, so you're indexing at the point of capture. Then talk about that last piece; you said you want to visualize it. So what do the outcomes look like? I mean, what does a user actually see? So what we find is, an example of use: someone in the IT group, and the data center is under fire. There are some issues happening in their virtualized environment, their hybrid cloud environment. They want to very quickly understand what's happening. And as that person is looking at the data, they realize there are other artifacts in there which are interesting and useful to the business. For example, there are transactions, there's information about products. They're able to correlate what kinds of devices are interacting with their infrastructure: is it an iPad, an iPhone, or some other smartphone? And so they're able to start informing the business with information they never had before, because it's information that's occurring all the time, in real time.
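"Automating the detection of patterns" typically means turning an ad hoc search into a standing watch on the stream. As a toy illustration, with the window size, threshold, and event format all invented for the example, an alert on a rising error rate might look like:

```python
from collections import deque

def error_rate_alert(events, window=5, threshold=0.5):
    """Yield an alert whenever the fraction of 'error' events within
    the last `window` events reaches `threshold` -- a toy version of
    saving a search and letting it watch the stream for you."""
    recent = deque(maxlen=window)
    for i, ev in enumerate(events):
        recent.append("error" in ev.lower())
        if len(recent) == window and sum(recent) / window >= threshold:
            yield (i, sum(recent) / window)

stream = ["ok", "ok", "ERROR timeout", "ERROR timeout", "ERROR db down", "ok"]
alerts = list(error_rate_alert(stream, window=4, threshold=0.5))
# fires at index 3 (rate 0.5), then at 4 and 5 (rate 0.75)
```

Because the check runs over whatever events are flowing in, rather than over a pre-selected source, it matches the point made above: data sources you had not yet thought about are still collected, and a pattern watch can be laid over them later.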
And so they want to be able to start analyzing what's happening at the time it's happening, as opposed to some time later. So on top of the indexer we've got a very powerful search language. On the one hand it's extremely simple, so we really try to make it accessible to the novice. You don't have to be a data scientist to use Splunk; you can be someone that's just interested in getting answers, and you can express questions in a very simple way. But in addition to that, there are extremely advanced and sophisticated capabilities for people that are looking to do very sophisticated things: extremely complex correlations, extremely sophisticated statistical analyses, those kinds of things as well. And then you provide the ability to visualize it. The visualization really makes the data accessible to technical audiences but also non-technical audiences.

So it's interesting, you started in IT operations, you said, right? So the state of the art there before Splunk, or things like Splunk, was I could maybe put a bunch of inventory in a spreadsheet, you know, or maybe I'd write code to parse the log files. Or yeah, some homebrew tool. Yeah, there was a collection of different products, homegrown scripts, a bunch of different things. There were a bunch of VC-backed companies that attempted all those kinds of things; they'd try to get into it. But the data mix problem was very difficult back then. You had these schema issues, right? So, yeah. Yeah, I mean it was a new class of data which didn't really suit the traditional data management, data warehousing world very well. The characteristics of the data were fundamentally different, as were the kinds of things people wanted to get from the data, and the time element as well.
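One reason a search language over machine data can stay simple for novices is that fields are extracted at search time rather than declared up front. The sketch below is illustrative Python, not Splunk's search language; the key=value format and the "count by field" aggregation are invented stand-ins for that style of query:

```python
import re
from collections import Counter

def extract_fields(event):
    """Pull key=value pairs out of a raw event at search time
    (schema-on-read: nothing was declared at ingest)."""
    return dict(re.findall(r"(\w+)=(\S+)", event))

def stats_count_by(events, field):
    """Rough analogue of a 'count by <field>' aggregation."""
    return Counter(extract_fields(e)[field] for e in events
                   if field in extract_fields(e))

events = [
    "2012-03-08 10:15:02 status=500 host=web01",
    "2012-03-08 10:15:03 status=200 host=web02",
    "2012-03-08 10:15:04 status=500 host=web01",
]
by_host = stats_count_by(events, "host")  # web01 appears twice, web02 once
```

A novice only ever names the field they care about; the same extraction machinery underneath can feed arbitrarily complex correlations and statistical passes, which is the simple-to-sophisticated range described above.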
There was the need to analyze historically and look at trends, which we certainly provide the capabilities for as well, but also to look at the real-time aspects: exactly what's happening in my enterprise right now. So what we're finding with technologies like this is that it's really changing the relationship between the CIO and the business, or the person in IT or the data center and the business, because I have conversations with customers, and some of the things they're saying is that the business person will now come up and ask them questions, because they can turn an answer around in less than a day; it doesn't take three or four months. And so there's a shift in dynamic which I think is really interesting. Before, they would run away or do anything to avoid IT. So what kind of questions, for example?

So, for example, if you look at a telecom company: every time you make a phone call, a call detail record is generated from the switch, and a call detail record contains information about your number, the destination number, and all the networks in between that you've gone through. Well, you can capture that; as far as Splunk's concerned, it's time-stamped machine data. So you capture that in Splunk, and Splunk has the ability to connect with data sources in other storage technologies, like Hadoop or a relational database. So we can pull in information like tariff information, and in real time show charges as phones are being used. Particularly interesting: as you make a phone call, you go through other carriers' networks, and you want to understand what those inter-carrier charges look like. Well, something like Splunk can give you that information really quickly. And in fact, we have customers in the mobile space that use us for exactly that, because they could, and they could do it very quickly.
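The telecom example boils down to enriching one data stream (call detail records from the switch) with a lookup from another store (tariff rates in a relational database). As a toy sketch, with every field name, rate, and number invented rather than taken from any real billing schema:

```python
# Stand-in for a tariff lookup table living in a relational database:
# dollars charged per second of transit through each carrier's network.
tariffs = {
    "carrierA": 0.02,
    "carrierB": 0.05,
}

# One call detail record per call, listing the carriers transited.
cdrs = [
    {"src": "555-0101", "dst": "555-0199", "secs": 60,  "via": ["carrierA"]},
    {"src": "555-0102", "dst": "555-0200", "secs": 120, "via": ["carrierA", "carrierB"]},
]

def intercarrier_charge(cdr, rates):
    """Charge = call duration times the summed per-second rates of
    every carrier the call transited."""
    return cdr["secs"] * sum(rates[c] for c in cdr["via"])

# Running total of inter-carrier charges as records stream in.
total = sum(intercarrier_charge(c, tariffs) for c in cdrs)
```

The join itself is trivial; the point of the anecdote is that doing it continuously, as records stream off the switch, turns a months-long billing reconciliation project into a near-real-time view.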
It's not that they can't do it with other products, it's just that it takes them six months and several million dollars, and they can do it in Splunk in a couple of days, if not a day. So you can build the anatomy of the call, and essentially the value chain of that call, and then determine in real time basically how to reprice, or how to turn a knob and increase demand. Yeah, and the other interesting thing about machine data is that if you look at websites, for example, there's certainly valuable information in understanding where people are clicking, where they're going on the website. But that's actually quite shallow when you consider that the services being delivered come from media servers and content servers, crossing other technologies and eventually reaching a consumer who consumes them. Wouldn't it be great to get a view across that entire value chain and understand exactly how that value chain is operating, how it's performing, what the level of quality is like, and also slice and dice how the service is being consumed, by geography, by type of user? You can cross-correlate all of this information with the data coming out of the machines and the applications, and with the data that exists in relational databases and warehouses as well.

Okay, Sanjay, thanks for coming inside theCUBE, but before we break, I want to ask you one final question. This is not going to really intrude on the whole quiet period. I guess you're technically in a quiet period, right, with the filing of the S-1? Can't really talk about business. But you're in product marketing, so you've got to look at the product, you've got to understand the go-to-market. As big data grows up, and it's still growing up here, you guys are pioneering a lot of that work. Hugely successful on the revenue side. I agree, it's changing the relationship with the customer, creating more business value.
What does the horizon look like for you guys in terms of possibilities, without going into specific details? I mean, the big picture. We were talking earlier, before you came on: we think this is a trillion dollar market, not just 50 billion, as the web becomes more connected with the big data opportunity. It's changing all verticals.

The opportunity is connecting data to value. So the data is the asset; it's really harnessing it and connecting it to value. And I think one of the things we're doing, as an example, is we have the ability to create apps on top of the Splunk engine. What that really does is create an environment for innovation. You know, we don't want to pretend that all of the innovation exists in our organization. It actually exists out in the community, in our customer base. And so what we want to do is really provide an environment where people can very quickly find uses for that data. It never fails to amaze me how people use the data, and in our case, how people use Splunk. I mean, I was talking with Expedia earlier on, and the gentleman from Expedia has very hardcore use cases in their environment, very cutting-edge uses of their data, terabytes a day. But at the same time, he has Splunk running at home for his own home project. He's created an app that runs on it that looks at a whole range of different social media sources, just some exciting stuff of his own. So I think that's also interesting. So for me- He's a developer. He's actually using data, creating value for himself. That's right, and I think for me, it's really about enabling the community- Bringing your work home, that's a great example. It's true, that's what it's about. Sanjay, thanks for your knowledge and for sharing that with the crowd. And thanks for sharing what's going on with Splunk, as much as you can. You guys are successful, congratulations.
You're going to go public, clearly at over a billion dollar valuation. Great stuff, congratulations. Another success story, and just the beginning. I think once the dominoes fall and people start making money in big data, the stock market's up, good things are happening: complete revitalization and innovation. So thank you very much. This is theCUBE, and we would not be able to bring this great content to you.