Okay, we're back in New York City for Big Data Week. This is SiliconANGLE.com's coverage of Strata + Hadoop World. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise, and all the action is happening here in New York City for Big Data Week. Earlier in the week, we were in Las Vegas for IBM's Information On Demand, and we've got the big data world covered like a blanket on SiliconANGLE.com and wikibon.org. I'm John Furrier, the founder of SiliconANGLE. I'm joined today by my co-host. I'm Dave Vellante. Thank you. Great to be here. Yeah, here at Strata, great. We just came off of two days at IBM IOD and dropped in here. Awesome. Innovation. We're geeking out on databases. We just talked to a CTO at Sybase, geeking out on concurrency and all kinds of stuff. So it's hot. The technology space for databases and data is hot. So I want to get to my first question. What's your take on all this? Like, as a tech guy who's been in the business, what do you see out there? What's the big enabling technology? Well, I think the enabling technologies that we've seen as we started our company are really all around using the cloud, first for the economy of scale and the ability to scale out as you grow a startup. You're not sure how fast it's going to grow and when it's going to grow. So I think that ability to react quickly as your market grows is very important to us, and so is using your money wisely. And then on top of that, putting in the data analytics of something like a Hadoop stack to be able to process billions of records in a very short amount of time. There are gigantic quantities of data, much more than we've seen historically as data warehouse and database types of people. So the ability to take in billions of records, get your hands around that data and understand it, and then turn it into something meaningful and useful for our business customers is what's exciting to us. Let's talk about your company, PlaceIQ.
So give the folks out there a taste of the business and the company, and some of the under-the-hood configurations that are driving your business in terms of load and architecture. Sure. So PlaceIQ is mobile audience targeting and insight. So basically we take information that is streaming off all kinds of sensors: people with mobile phones, smartphones, GPS devices, business data that has sensors on it. So it's massive quantities of information. It's very specific to location and time, billions of records, terabytes, not petabytes of data. And we put this together into a database, merge it, cleanse it, put it into 100-meter tiles, and then use that information to be able to identify, in a given location, who is the person who's there and what's their most likely behavior? What are they probably doing right now, or what are they most interested in doing right now? So with that detailed map of the world, like no one's ever seen, we're able to help brands connect with their customers in ways they've never been able to do before. And you have a lot of investors behind you, some big names. How much did you guys raise? So we're still kind of close-lipped about that. We had a Series A recently with US Venture Partners, but we choose to keep the amount quiet. There's a lot of names in there. Must've been a big round. You know, it's a decent size, but, I mean, we're very efficient. We can take a very small round and use it very efficiently. So obviously, mobile. We had one guy on theCUBE say that the iPhone's the NSA's wet dream in terms of what they used to pay for sensor data. Now you have mobility, obviously it's a big thing, and all kinds of fraud detection applications. So you're in that kind of business. What are the big things that you're seeing enabled in that marketplace? What are the top things that you're seeing for use cases? And also, what's the data volume like? What does the data flow look like? Yeah, so we're in ad tech, right?
And just to give you an idea of the scale there, people serving ads to mobile phones and kind of reaching out to their customers, our partners that we work with there, are doing billion-records-a-day type volumes, right? So this is huge volumes. And we have to ingest that information quickly, be able to bring it online and make sense of it, and then turn it around in a way that it can be consumed and used to provide a better experience for customers on their phones. Okay, so you're bringing in a lot of diverse data sets, I presume. Performance, reliability, stability, these are all proof that the network can't go down, right? Right. So how do you architect, as a CTO, how do you architect for that problem? Give us a little insight. Right. So we use two pieces in our stack. First is Hadoop, where we have horizontal scaling and we can build multiple clusters. So we have the data copied into multiple locations. That way we are redundant and fault tolerant. And that's in itself a big issue. If you have billions of records and petabytes of data, duplicating it to try to be more consistent and reliable was a costly activity 10 years ago. The costs have gone way down. So we really use very simple techniques, but they allow us to have this information online and ready and accessible in real time. We also use in-memory analytics. So the second piece of our equation is loading this data into memory only when we need it. So the idea is we stream the data down, look at a subset of it in memory, and then build our models against that. So that's where we're bringing in sort of a real-time spin-up of an in-memory analytics solution from Kognitio and bringing our data into the system in real time. So in-memory, it's got a lot of attention lately. Some of the big guys like SAP are obviously marketing there, and people invented that. So that's all good. So we could actually go on. So talk about Kognitio, the tech behind it, and how it's different from some of those. Right, technology.
So what's interesting and what attracted us to Kognitio was that they were working with Amazon. So we could actually deploy a high-performance compute cluster in the cloud, which again, to a startup, was very important. We didn't have to buy the hardware, scale it up ourselves, and manage that infrastructure. So we can turn on an HPC cluster from Amazon, get somewhere around 24 to 32 nodes or CPUs working for us in a high-bandwidth network environment, and then have Kognitio stream the data into memory and do the processing we need to do, and then shut it down and go away. So that allows us to be very cost efficient, because we don't need to do these jobs 24/7. Generally these jobs are something where we prepare the data, get it ready, load it into the system, do our learnings, and then go away and publish our data and make it useful to our customers. Okay, so can you talk about the diversity of the data that you're dealing with? How are you architecting for that piece? Maybe talk a little bit about the data sources. Sure, and that's another area where it's interesting to use Hadoop for a NoSQL type of solution, because we're getting new and different data sources every day. So we do everything from basic cell phone trace data, just looking at where a phone is going through space and time. We get search results, hyperlocal search results, and application usage, so what types of apps are people using and how are they using them? Event data, so we also bring in listings that are maybe not real time, but you've got sort of the community crowdsourced event listing sites and things like that. So people out there on their mobile phones are starting to annotate everything going on in the world. They're kind of our reporters out there, right? So we grab all this data. It's very diverse. We have to bring it onto the system and then kind of merge it together into our models for analytics.
So using a NoSQL solution, we don't have to struggle with building the schema and morphing the schema every time a new data set comes in. We basically use a document-based approach. Data has attributes, and as new data comes in we just figure out how to blend that into the system. So it's very fast to bring a new data source on and include it in our pipelines. Okay, and then, so you take all this data in, you do the analysis, and then you're providing context to that data, and then what happens from there? Take us through the sort of value flow and how it gets into your customers' hands. So we'll take these streams of data, and we cleanse them and make sure they're accurate. That's one thing. So some of this data may not be location accurate. We have analytics that help us understand that and make sure that the information we're looking at is really about the location we're thinking about. So there's a geospatial issue there. We then also have this data coming in from multiple sources. We have to deduplicate it, right? If multiple people are blogging or talking about the same event, I want to make sure it only becomes one event in my system, so we kind of merge these data sets together and understand that everybody's talking about the same thing from these multiple feeds. And then we put it through a taxonomy. We've got a 4,000-element taxonomy where we classify the data into attributes of a location that we think are valuable and interesting. Once you have that data processed down into that taxonomy, that's where our data scientists come in. So they start to look at that data and run machine learning algorithms against it to really start to train against the audiences that we think exist at those locations. So we're turning this information into basically the understanding of basic demographics of who might be there, the type of person, not personal data. It's not John Smith. It's just that this person is a business person, or it's a mom out shopping and running errands, right?
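[Editor's sketch, not PlaceIQ's code.] The merge step described above, schema-less records from several feeds bucketed into roughly 100-meter tiles and collapsed so that one real-world event appears once, might look something like the following in Python. Everything named here is an assumption made for illustration: the `tile_id` and `dedupe_events` helpers, the `title`, `day`, and `source` fields, the flat meters-per-degree approximation, and the first-value-wins merge rule. The interview does not say how the tiling or the matching is actually done.

```python
import math

TILE_METERS = 100.0             # tile size mentioned in the interview
METERS_PER_DEG_LAT = 111_320.0  # rough approximation, assumed for this sketch


def tile_id(lat, lon, tile_m=TILE_METERS):
    """Map a coordinate to an integer (row, col) key for a ~tile_m square."""
    row = math.floor(lat * METERS_PER_DEG_LAT / tile_m)
    # A degree of longitude shrinks with latitude. Use the row's own
    # reference latitude so every point in a row shares one column width.
    ref_lat = row * tile_m / METERS_PER_DEG_LAT
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    col = math.floor(lon * meters_per_deg_lon / tile_m)
    return (row, col)


def normalize(title):
    """Lower-case and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())


def dedupe_events(records):
    """Merge records that share a tile, a normalized title, and a day.

    Records are plain dicts with no fixed schema: any extra attribute a
    feed supplies is carried along, and the first value seen wins.
    """
    merged = {}
    for rec in records:
        key = (tile_id(rec["lat"], rec["lon"]), normalize(rec["title"]), rec["day"])
        entry = merged.setdefault(key, {"sources": []})
        entry["sources"].append(rec["source"])
        for field, value in rec.items():
            if field != "source":
                entry.setdefault(field, value)
    return list(merged.values())


feeds = [
    {"source": "feed_a", "title": "Jazz Night", "day": "2012-10-25",
     "lat": 40.7580, "lon": -73.9855, "category": "music"},
    {"source": "feed_b", "title": "jazz  night", "day": "2012-10-25",
     "lat": 40.7580, "lon": -73.9855, "price": "free"},
    {"source": "feed_a", "title": "Harbor Tour", "day": "2012-10-25",
     "lat": 40.6892, "lon": -74.0445},
]
events = dedupe_events(feeds)
```

With the three sample records, the two "Jazz Night" listings collapse into one event that keeps both the `category` attribute from one feed and the `price` attribute from the other, which is the practical appeal of the document-based approach: a new feed with new attributes needs no schema change. A production matcher would need fuzzier title and location matching than this exact-key version.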
The general nature of who that person is, and then their behavior. Are they currently working? Are they currently recreating? Are they with friends? Are they alone? All right, so we see these basic behaviors, but these are the things that allow a brand to connect with their customer effectively on the mobile phone. So how does that connection occur? So basically we'll take that final predictive data set and load it, through partners, onto the ad-serving networks. And in real time, as you fire up your phone, an application will pass the location, and we will provide the prediction based on that location and time as a data set. And then the application can basically make a decision against that, right? In our most specific case, it's ad tech, and we're deciding what ad to serve to a given customer. Make that ad relevant, right? If you're going to take my time, and if you're going to consume some of the space on my mobile phone, I'd like that ad to be something that I'm interested in. So geospatial. We were talking to Jeff Jonas of IBM earlier this week. I was saying that we were at IBM IOD, and he made the comment that geospatial is superfood. And so that got our attention. Now, Steve, where are we in terms of, your last statement was right on, it needs to be relevant. Where are we in terms of the relevancy spectrum? Because we all get ads, and frankly most of them aren't relevant. So how do we do better? Where? Because not everybody's using PlaceIQ? Well, okay, so talk about that experience of the PlaceIQ user. Yeah, I think it's evolving, right? So this ecosystem is developing rapidly, but right now I'd say most of the advertisers are focused on just serving the ads up and kind of taking an online model and transitioning it to mobile. Yeah, right. So they've taken that analogy. Just put it out there and get some impressions. Exactly, part of that is I think education of the customer as well.
The customer is still trying to understand what the mobile phone is and why it's different and how it's different. How that means I have a different relationship with my customer. So in some sense, we have to be patient because we're an enabling technology, right? We can't make it happen. But we do see that the most forward-thinking agencies out there, and certainly the leading Fortune 100 brands out there, are starting to understand this. The light bulb's coming on. I'd say in the last year what we've seen is their experimentation with the mobile phone and how to reach out to customers. But we feel that in 2013, we're going to see significant growth, that people are going to reach out and really do this. So all you Words With Friends users, you're going to be seeing ads that are much more relevant. You're going to spend more time clicking ads than you are playing with your friends. There you go. Putting the ad stuff aside, because that's obviously a good use case to show the tech out there, talk about the Kognitio solution that you have. Because that's really the interesting thing to me. I met the folks out at IBM IOD, and the management team's got a lot of experience in this area. What about their tech was interesting to you? Yeah, so it really was their focus on an in-memory solution that was key to us, because we needed this massive amount of information to provide insights very quickly, because we're doing all kinds of hypotheses and what-if analysis, right? So the more I can do, the better my analysts are, the smarter they get, right? Traditional BI, but, you know, at a larger scale. But their partnership with Amazon was really what sold us or differentiated them to us. You know, the fact that they would work with Amazon and do this by the drink, where we don't license Kognitio annually. We basically pay as we use it. So that was very attractive to us because again, in this data world- You're an agile startup.
You don't have the big pile of money to pay for the big systems. Even the bigger guys, you know, I wonder if their infrastructure is really being used 24/7, right? I think a lot of people are going to look at this and understand it's not just websites that have peak loads and variability. You know, your data analytics load is another area where it tends to be variable, right? You know, analysts will come in and work on a problem for a week, but then they go away and probably do some other activities, right? They're not just hammering the system all day long. And what has your success with the system been from a reliability standpoint? Give me a rating: rock solid, 10 being great, one being poor. Yeah, no, it's- I mean, where are they in the spectrum? It's great. You know, I'd say it's a new system, and certainly they're, you know, kind of learning how to work with Amazon and spin those clusters up seamlessly, but it's an eight, you know? I mean, you know, I'm a startup guy, so I'm okay with a few rough edges, but, you know, the leading, bleeding technology, the kind of technologies. Next time I see those guys, I'm going to ask if they're standing in line behind Adrian Cockcroft from Netflix. Netflix gets to cut the line. Especially on the solid state, I heard that through the grapevine, but Netflix is running all their stuff on Amazon. You know that, right? Yeah, so they've been highly successful in debunking the myth that you can't run your business on Amazon. Although I called Amazon the junkyard dog of cloud because you could assemble your own stuff, which actually is attractive to startups. Yeah. But they've had some high profile- EMC and others would say, no, that's not a solution. But they've had, we have to talk about it. They've had some high-profile outages. I mean- Yeah. You know, probably that was a DC power issue. But that's why we do- But Amazon has had some other ones. So how, as a CTO, how do you architect around that?
Right. So you have to be in multiple availability zones, as they recommend. And that's really the key issue there, right? Yeah. And that wouldn't have solved the Netflix part. Well, that's why we're talking about Nirvanix. For example, Nirvanix has an awesome cloud solution because they have that failover. So a lot of the big media companies are doing separate cloud storage, but Amazon has been a great success story. So, but I'm wondering about Hadoop. Are you using any Hadoop in the cloud? Yes, absolutely. On Amazon? Yes, absolutely. What's that look like? So basically, we will use- Elastic MapReduce, for the folks out there who don't know EMR yet. Right. So, you know, once our analysts have kind of studied and understood the data and built their algorithm, now we need to scale it, right? So we probably trained it on a couple of billion records. Now we need to turn it on and have it, you know, basically run algorithmically against billions and billions of records every day, right? So that's where EMR comes in. I'm smiling because that's really hard to do. Yes, it is. It's not trivial. As an engineer and a technologist, you know, the ability, you know, we got EMR up and running in two days, right? And, you know, on Amazon, I'd say the hardware isn't as efficient as you would get in your own colo or on your own hardware, but the ease of really spinning up that cluster and running these large jobs is still- So that's a MapR distro, is that right? So can you talk about that a little bit? What's your experience been with MapR? It's, you know, it's worked for us. I mean, we really haven't had any issues, right? So, I mean, from day one- You don't think about it. Yeah. There's the only- MapR would like to hear that. In the data-cosm era we're living in. That's the new buzzword. Right. You know, so we've had, you know, maybe occasional hardware outages in the middle of a large job.
That's the biggest thing we've run into, but I think those were more sort of Amazon infrastructure issues versus the software stack. Yeah, okay. All right, and then, so the primary requirement from the standpoint of in-memory analytics was really the business model of pay by the drink. That was the thing that was most attractive. Yes. But the tech's got to be there too in this demanding world of real-time and, you know, offline analytics. Talk about that a little bit. So, you know, we tend not to be real-time. We're not a real-time system, but, you know, for our world it's near-time, exactly. And, you know, the ability to get this into memory and quickly understand the data is key to us. So, you know, I think the biggest benefit we got out of Kognitio, though, was that, as a small company running a large HPC cluster or a cluster of clusters on Amazon, you know, we're getting a supercomputer that, you know, I would have drooled over 10 years ago as I had that hardware sitting in my, you know, in my own building. Okay, Steve, final question, because we're getting the hook here. The planes are starting to back up on the runways, as Mark Hopkins likes to say. So, talk about what you see for the next 12 to 24 months in the industry. You're out there on the front edge. Elastic MapReduce is really strategic to Amazon, Netflix is on there, you're doing some amazing work. Hadoop's emerging, and we've got a bunch of systems out there. What do you see for the next 24 months in terms of the evolution of the technology? Obviously in-memory and flash have been fantastic, but what are the things that you're watching, the trends that you're paying close attention to? So, you know, one of the things is, Hadoop is great at running a single job and running it well, you know, just a large job, single threaded. Data warehouses are about kind of exploring the data and, you know, really running lots of different algorithms and queries against it in a more ad hoc way.
I think the two worlds are going to come closer together again. So you had the early phase where Hadoop went out and kind of solved the big data problem for a very specific use case of how you would process data, but you did lose some of the things that a schema and SQL and a relational database and a data warehouse give you in terms of being able to understand and slice and dice your data. So, I'm seeing a lot of vendors out there start to figure out how to put that together, right? I don't want to name names yet, because I think everybody's really early, but I think that's the problem that we're working on. Well, we're going to talk to Ben Werther, the CEO of Platfora, who just launched his company and who claims he can slice and dice in real time, managing with SQL, and so we got Hadoop, we got the showcase award. Kognitio has great technology, and it was great to meet those guys on this trip at the IBM event. So, hey, congratulations. Thanks for coming on theCUBE. Congratulations on your funding. Thanks for having me. I'm going to put a quarter in the press. It's 4.2 million on top of a million-dollar seed round. And congratulations. I had to go and buy that out. This is theCUBE. We're very friendly. We'll be right back with our next guest, Ben Werther, after this short break. We'll be right back. Thank you. Thank you, Steve. All right, great. Thank you.