at Big Data SV 2014, brought to you by headline sponsors WANdisco ("we make Hadoop invincible") and Actian ("accelerating Big Data 2.0"). Hey, welcome back. We're here live in Silicon Valley for Big Data SV. This is SiliconANGLE and Wikibon's theCUBE, our program where we go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host Jeff Kelly from Wikibon.org. We have Lenny Blucher, VP of Engineering at JiWire. Welcome to theCUBE. Thank you. So, VP of Engineering, you must be geeking out on all the Big Data stuff going on here. You know, it's half geeks, half business infiltrating the show, and Dave Vellante and I, and Jeff, always talk about the evolution of the tech, the technology, the algorithms, the platforms, into the business side. You can see Big Data has attracted all kinds of folks here. So I gotta get your perspective. What's your take on the show, around the tech and the engineering, for someone who's building products, whether it's at a startup or for a business? What are they paying attention to? What are some of the conversations you're hearing? Well, one of the things that I think is different this year is the fact that almost everybody is a data scientist or data wrangler. It's a little different from last time, where Strata was focused a lot on scale: how do you handle larger volumes? How do you handle big data? How do you make this data available for enabling business decisions? This year is pretty much all about generating insights, generating business intelligence, which is exactly what I'm here for, because ultimately JiWire is in this business. What is the JiWire business? Tell me a little bit about the business. So JiWire is a 10-year-old company. We're focused on providing business intelligence to brands and agencies, basically giving them ways to reach the right audience and measure the results.
We started out as sort of an ad network over Wi-Fi, which gave us a very interesting ability to understand that location plays a really big role in targeting. That became particularly relevant when the industry started shifting from laptops to mobile devices, smartphones, and tablets. Once we started seeing that shift, we really invested in figuring out how location can be used to create the right audience. About two years ago, we created a product called Location Graph. It's similar to how Facebook has a social graph, where they figure out who you are by the people you're connected to and the things you like. In our world, we figure out who people are, or create custom audience segments, by the history of their location. It's very different from geo-fencing, which was an early location-targeting technology. Location Graph allows you to really look into big data, which is the history of locations, billions and billions of location tags, and figure out what the patterns are. Once you cross-reference it with other data points and third-party data, it really explodes as far as what kind of profiles you can build and what kind of audiences you can create for the brands. So Lenny, I'm sure when people talk about the Internet of Things, while the early folks are just learning about that trend, it's not new to you. You've had connected devices, and the information you get through that machine data is really valuable. So when you talk to people about the Internet of Things and the big data opportunity around the analytics, what have you learned from your business about how to handle these connected devices? The interesting thing, the biggest learning, is actually very trivial: data doesn't equal information.
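To make the Location Graph idea concrete, here's a minimal, purely illustrative sketch of the difference he describes: inferring audience segments from a device's location *history* rather than from a single geo-fence hit. All place coordinates, category names, and thresholds below are hypothetical, not JiWire's patented algorithm.

```python
from collections import Counter

# Toy lookup of known places: (lat, lon) -> business category (hypothetical data)
PLACES = {
    (37.7749, -122.4194): "gym",
    (37.7750, -122.4183): "coffee_shop",
    (37.8044, -122.2712): "airport",
}

def nearest_category(lat, lon, max_dist=0.002):
    """Return the category of the closest known place, if any (crude L1 distance)."""
    best = None
    for (plat, plon), cat in PLACES.items():
        d = abs(plat - lat) + abs(plon - lon)
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, cat)
    return best[1] if best else None

def infer_segments(pings, min_visits=2):
    """Assign segments based on repeated visit patterns in a location history,
    the key contrast with geo-fencing, which reacts to one point in time."""
    visits = Counter()
    for lat, lon, hour in pings:
        cat = nearest_category(lat, lon)
        if cat:
            visits[cat] += 1
    segments = set()
    if visits["gym"] >= min_visits:
        segments.add("fitness_enthusiast")
    if visits["airport"] >= min_visits:
        segments.add("frequent_traveler")
    return segments

history = [
    (37.7749, -122.4194, 7), (37.7749, -122.4194, 19),  # two gym visits
    (37.8044, -122.2712, 9),                             # one airport visit
]
print(infer_segments(history))  # {'fitness_enthusiast'}
```

One airport visit isn't enough to tag the device a frequent traveler, while two gym visits are: the pattern over time, not the single location, drives the segment.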
In the world of big data, which is a trivialized term these days, you do have your three Vs, you know: velocity, variety, and volume. The challenge is really how you infer information, actionable information, monetizable information. And in our case, it also becomes a question of the business intelligence we convey to our clients, helping them understand who their audience is, how to reach it, and how to measure the effectiveness of the campaign. So talk a little bit about what's under the hood. You've been around for 10 years, so you've seen this evolution of big data, as it's now called, from the start, really, being in this business. How has your approach to technology evolved to support the type of analytics and the types of insights you're trying to deliver to your customers? How have you evolved your own internal operations to really take advantage of this shift, as you mentioned, from laptops to mobile, the explosion of data, and now unstructured data and all sorts of new data sources to take into consideration? Walk us through some of the evolution of your internal workings and some of the challenges you guys faced as the data volume and variety grew. Sure. So the first challenge we had was the actual ability to handle the volume coming at us. As we integrate with various supply sources such as ad exchanges or RTB platforms, you gotta be able to respond to ad requests within, you know, 30 to 50 milliseconds. Otherwise, you'll basically lose business. So the first thing we had to do was invest in our platform just to be able to handle the traffic we wanna support. The second thing for us was to understand how to deal with all the data we collect. So you get this fire hose of ad requests and signals that we wanna track.
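A 30 to 50 millisecond response budget generally rules out touching disk or a remote analytics store on the request path. A common pattern, sketched below with hypothetical names (this is an assumption about the general technique, not JiWire's actual platform), is to precompute segment membership off the hot path and serve bid decisions from an in-memory map:

```python
import time

# device_id -> set of segment names, refreshed by background jobs (assumed design)
SEGMENTS = {}

def update_segments(device_id, segments):
    """Near-real-time path: background workers push fresh segment sets here."""
    SEGMENTS[device_id] = set(segments)

def handle_ad_request(device_id, campaign_targets):
    """Real-time path: decide whether to bid using only in-memory state,
    so the decision is a hash lookup, well inside a 50 ms budget."""
    start = time.perf_counter()
    segs = SEGMENTS.get(device_id, set())
    bid = bool(segs & campaign_targets)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return bid, elapsed_ms

update_segments("device-123", ["frequent_traveler", "coffee_drinker"])
bid, elapsed = handle_ad_request("device-123", {"frequent_traveler"})
print(bid)  # True
```

The design choice is the split itself: the expensive analytics run asynchronously, and the request handler only ever reads precomputed state.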
And for us, the challenge was really being able to figure out insights from this data quickly. So, you know, we started with Hadoop as your typical ETL process to cleanse the data and reshape it into the shape you want. Early on, we realized that technologies like Hive were pretty slow, at the time, a couple of years ago at least, pretty slow and inefficient when it comes to enabling your business users to draw conclusions from the data, or for targeting. So we started experimenting with a lot of technologies. We got into a columnar database, which really allowed us to handle the data on the back end. However, the first vendor we selected had a lot of problems as far as operational support: database uptime, crashing, multi-tenancy, things like that. Eventually, we went with Actian ParAccel, which solved a lot of issues for us as far as stable operations and the ability to handle volume. And we were actually able to develop quite a few big data products that we monetized and turned into products that resonated with the market. Yeah, can you share a couple of examples of some of those products and how they're delivering value to your clients? Sure. So the first thing was Location Graph, like I mentioned. We patented that technology. In essence, what it is is analyzing a huge number of ad requests, billions and billions of ad requests and data points, and correlating them with quite a few different parameters, dozens and dozens of parameters, like locations of businesses, proximity to businesses, times of day, and quite a few other things, maybe census data and a few other data partners that we integrated with. So we're able to create dozens and dozens of custom audience segments that we can then take to brands and agencies to help them reach that audience. But it doesn't stop there. We realized early on that in addition to targeting, we also had to invest in measurement.
And measurement really means, pre-campaign, the ability to understand what audience, at what scale, you can reach, and to help your client structure the campaign in the right way. Then, during campaign execution, the ability to look at the dynamics of how your campaign is performing, whether you're reaching the right people, and the ability to turn different options on and off. And then, when the campaign is finished, to really show the client how it did. In this area, we actually finished last year with another revolutionary product called Location Conversion Index, which is really the first product in mobile advertising that allows agencies and brands to measure the effectiveness of their campaign, the ROI, basically. In simple terms, the idea is: they spend a certain amount of money running this mobile campaign; how much traffic did it really generate into their store? We're able to measure this using our new data analytics platform and really defend the results, with one-to-one matching to specific device IDs at scale. So that's how we were able to evolve over the last couple of years. Well, one technology you mentioned earlier was Hadoop, and of course here at Strata that's gotten a lot of attention over the years. But we're increasingly moving to real-time, and in the Hadoop market we're seeing different players trying to add some of those real-time capabilities, but then there are, of course, other technologies that you can layer on top of and around Hadoop. Talk a little bit about how you look at the evolution of these big data technologies and how you approach integrating both: the kind of batch, historical analytics that you might do in something like Hadoop, and the real-time, where you've got to deliver an offer or an ad to somebody, as Lenny said, in milliseconds or less. So how do you look at those two?
It sounds like they're certainly complementary, but I imagine you need an integrated architecture that actually allows you to connect those dots. So we've done a few things to address this problem, but I feel like we're still at the tip of the iceberg. And certainly, based on what I've seen at the Strata Conference this year, this is the new topic of the day: how do you basically eliminate ETL and deal with your data where it is, which is Hadoop? And that includes being able to tap into real-time results and draw insights right there, so you don't spend time moving terabytes of data across all your different systems. So we're gonna start evolving into this new way of doing things, and that's basically a big part of our roadmap. But in our case, we also have to deal with real-time targeting. Like I mentioned, we have to make a lot of targeting decisions within 50 milliseconds, and that means the ability to learn, to basically draw a lot of conclusions out of the real data that we see coming in with every ad request and the different signals that we get. So we've already built a lot of in-memory solutions and a lot of interesting platform optimizations where we can basically deploy a lot of custom audience segments in near real-time. Real-time is always tough, but near real-time is where you wanna focus. Talk about the area of high-performance analytics. What are some of the things that you guys focus in on to deliver that, and what should some of your peers, and other folks looking at generating high-performance analytics, be thinking about in terms of deploying and engineering those solutions? So an interesting trend has emerged over the last couple of years, and this trend is basically as old as the term "data scientist," which is about two, three years old: when it comes to working with big data, you basically have two competing, though not necessarily conflicting, aspects. First of all, you wanna explore the data.
Second of all, you wanna report on it. These two competing aspects have different requirements and different SLAs. With the proliferation of data scientists and data wranglers, they wanna be able to do a lot of ad hoc data analysis, sometimes on a very large data set, which can literally take down your system. So you gotta figure out how you can expose your big data to these two different aspects or use cases. Fortunately, that's something we were able to accomplish with the ParAccel platform, which, with the site license that we get, allowed us to deploy a virtually unlimited number of servers for both a data exploration cluster and a tools and analytics cluster. Talk about that technology a little bit. ParAccel is the technology behind the Redshift service that AWS offers, which is getting a lot of attention, and ParAccel is now part of Actian. Talk a little bit about some of the technology and characteristics of ParAccel that give it the performance capabilities that, by all accounts, put it at the top of the industry. It's interesting you mention Redshift, because that's actually how we started with ParAccel: it was easier for us to get our hands dirty and understand the value proposition of this particular platform. We were literally up and running within weeks, with business reports in front of the business users. Then we did a proof of concept with ParAccel, and a couple of things became apparent to us. First of all, the ability to ingest data into it, to load data, was orders of magnitude easier and simpler than what we had to do before. And that solved a lot of problems for us right there. Then ParAccel, and this is what's different from Redshift, allows you to create UDFs, user-defined functions. And in our case, where we have to do a lot of distance calculations and proximity calculations, this is a highly expensive compute operation.
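The distance and proximity calculations he mentions are a classic candidate for pushing into a database UDF so the work runs next to the data. As an illustration only, in standalone Python rather than an actual ParAccel UDF registration, the core computation is typically a haversine great-circle distance plus a proximity predicate:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points,
    via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_proximity(lat, lon, store_lat, store_lon, radius_km=0.5):
    """Proximity predicate, e.g. 'was this device within 500 m of the store?'"""
    return haversine_km(lat, lon, store_lat, store_lon) <= radius_km

# Two points roughly 3 km apart in San Francisco (illustrative coordinates)
d = haversine_km(37.7955, -122.3937, 37.7749, -122.4194)
print(round(d, 1))  # 3.2
```

Evaluating a function like this per row, across billions of ad requests, is exactly the expensive compute he describes; running it as a UDF avoids shipping those rows out of the database first.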
With ParAccel, we're able to basically make it work at our size of data. And then it's really robust. The system just stays up, and we can really enable multi-user, multi-tenant types of use cases. Lenny, I wanna ask you, we've got a time check here, we're on to our next guest, but I wanna ask you about Actian. You guys use them, you're a customer of theirs, I should say. We had them on theCUBE. They have this really nice platform they've put together. What should folks out there know about Actian, given your experience with them and the success that you've had with them? A couple of things. With the acquisition of ParAccel, we were a little curious how that was gonna work out, whether Actian was gonna continue evolving and investing in this platform. And based on what I've seen, they've turned out to be a really good partner for us. From support, addressing issues as they come up, to product trials, to working with us, sharing their roadmap, aligning it with our needs, and helping us plan ahead for how we wanna evolve our data systems, it's working. So they're a good partner. We've been impressed with the folks we've talked to, and we've talked to the board members and some executives there. I'll tell you, they've put all the right pieces together, Jeff, and in this market it's all about the business value, and that's the theme here. I agree. Scale, I mean, Hadoop's here to stay. Now it's all about the data science, the data wrangling, as Jeff Hammerbacher was saying, the gym rats for data. It's the people who like to work out with data, that's the data geeks, and that's where the innovation is, and I think ultimately simplicity will be the end game here. So a lot of great stuff happening here at Big Data SV. This is theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. The Strata Conference is ending today, and theCUBE will be going 24/7 from this point forward.
It's our last day here on site in Silicon Valley; our offices in Palo Alto and in Massachusetts will continue the coverage. We'll be right back with our next guest. We have an entrepreneur coming up and more content for the rest of the day. We're going wall to wall. Thanks so much for watching. Stay with us.