Live from New York, extracting the signal from the noise. It's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Jeff Frick and George Gilbert. Here with theCUBE, along with George Gilbert from Wikibon, we are live in midtown Manhattan at the Spark Summit East event. Excited to be here. We did a quick drive-by last year in San Francisco, but we've got the whole CUBE set here. We've been going wall-to-wall for two days, really going out, getting the smartest people we can find, getting you the insight and the information that you can use, and we're really excited about our next guest, Justin Langseth, the founder and CEO of Zoomdata. Welcome, Justin. Thank you, thank you. I think the last time we saw you in San Francisco was under slightly different circumstances, without our fancy sets, and now you get the whole CUBE experience. I think it was like an iPhone, a flashlight, some kind of tin-can microphone. Exactly, we can go guerrilla when we have to, whatever it takes. But first off, congratulations on the funding round you announced the other day, $25 million. Yeah, we announced it yesterday. Yeah, two days ago. Interesting that Goldman led it. You have a really awesome group of VCs behind you, but Goldman led it, so that's a little bit different. Yeah, Goldman's strategic investment group is different from their merchant bank. The merchant bank is looking for financial return, but their strategic group, which is led by their CIO, Marty Chavez, here in New York City, is really looking for strategic things to invest in that Goldman actually wants to use, or is using, in their technical environment. So it's kind of a different take on it. Yeah, absolutely. So what drew them to you? Are they using you guys now? They obviously see a great opportunity. Yeah, yeah. Dig into it a little bit deeper. How are they using it? How do they want to use it? Where do they really see the business benefit?
Yeah, so they have a lot of, obviously, modern stuff going on over there, and they're looking for tools that are developer-friendly, because what's interesting about Goldman is 10,000 of their 30,000 employees are engineers. 10,000 of their 30,000 people. One third of the company, yep. Wow. So they're not just in IT. I didn't know they had CS degrees. Yeah, I don't know if they all have CS degrees, but they're all engineers, so they can commit code and things like that. And they're really looking at data, data visualization, business intelligence, and I think this is a general pattern as well: it's shifting away from the platform style of tooling, the Cognoses and Business Objects of old, which looked like enterprise platforms. Like a lot of IT, it's moving more towards apps, building apps and deploying apps for specific purposes. That could be internal, for groups of employees, maybe some traders need to see an app about something in real time, or it could be for external customers of a company. So both the developer friendliness and a lot of the stuff we're trying to do around that is really appealing, as well as our ability to leverage security in an end-to-end way. The user's security can be passed all the way through the Zoomdata platform down into the underlying data stores, and what the user sees on the screen reflects their security credentials as of right now. If those credentials change, literally the data on the screen could change or disappear. And this is a feature that our government customers also like very much. Yeah, and that's really interesting. We talk a lot about perimeter-less security, right? The whole security paradigm has completely changed and really needs to be down at the data level as well as the application layer.
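The credential pass-through Justin describes can be sketched in a few lines. This is a hypothetical illustration, not Zoomdata's actual implementation: the `User`, `DATA`, and `query` names are all assumptions. The point it shows is that when the user's current credentials are evaluated on every query against a single copy of the data, a revoked clearance changes the result immediately.

```python
# Hypothetical sketch: per-query security filtering against one copy of the
# data. Row visibility is derived from the user's *current* credentials, so
# revoking a clearance changes what the next query returns.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    clearances: set = field(default_factory=set)

# One copy of the data, each row tagged with the clearance it requires.
DATA = [
    {"region": "EMEA", "revenue": 120, "requires": "public"},
    {"region": "APAC", "revenue": 340, "requires": "public"},
    {"region": "GOV",  "revenue": 990, "requires": "secret"},
]

def query(user: User, table: list) -> list:
    """Filter rows by the user's clearances at query time."""
    return [row for row in table if row["requires"] in user.clearances]

analyst = User("alice", {"public", "secret"})
print(len(query(analyst, DATA)))      # prints 3: sees every row
analyst.clearances.discard("secret")  # clearance revoked
print(len(query(analyst, DATA)))      # prints 2: the secret row disappears
```

Because the filter runs on the one shared copy rather than on exported extracts, there is no stale copy left behind that still contains the revoked rows, which is exactly the "screen changes when your clearance changes" behavior described above.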
And you're saying you can manage that even on a real-time basis. Suddenly your clearances get pulled, whoop, there goes your screen. Or your clearance gets broadened. Or it gets broadened, and suddenly you have a bigger piece of the pie on your chart. So not making copies of the data is the biggest thing. Having one copy of the data, securing it one way, having it in one place, and bringing all the questions to that data, as opposed to taking copies of the data and sending them out to the questions, is a huge differentiator. And I think that's why our technology, which can run on-premises as well as in the cloud and also in a hybrid manner, is critically important, because you don't have to take all your data and copy it to some other system. Because once you copy it, it could be stolen, it could be disjoint, different security could apply, all kinds of problems. You could lose some of it. Huge problems occur, right? Let me rewind you for a minute, because it sounded like you distilled something that went by a little bit fast. Tell us the ethos of each generation, starting with the Business Objects and Cognos era, then the Tableaus and the Qliks, and now the Zoomdatas. What's the core design philosophy and the user problem you're trying to attack? Yeah, so initially in the mid-90s, I don't know how far back you want to go, it was a desktop query tool contacting a single database, and that was business intelligence. I worked at MicroStrategy in the mid-90s and we built a thing that did that, right? And then the client-server wave hit, so we introduced a server level.
So business intelligence, like all other enterprise software, now had a desktop client and a server side, and more and more of the work shifted to that BI server layer. That's where the prior generation of tools, the Business Objects and the Cognos, really built out a late-90s, early-2000s traditional client-server layer, in some cases on a Windows-based architecture. But regardless, it's not their fault. They started 10 or 15 years ago, and things like big data didn't exist, things like cloud didn't exist, things like microservices certainly didn't exist. So any technology that hasn't been built from scratch in the last three to five years cannot have benefited from the concepts around cloud and Mesos deployment and Hadoop and microservices architectures. Fortunately, timing is a lot of things in life, right? We started three and a half years ago, after Hadoop and NoSQL and microservices and cloud, all this stuff was out there, and so we benefited from that and built a very modern architecture. Can you tell us more about this modern architecture? You alluded a few minutes ago to not bringing all the data over the wire; one of the ethos of Hadoop is put the compute on the data. So elaborate on the environmental constraints you're living in and how they affected your design. Yeah, so we knew Hadoop was going to be big when we started the company, but enough of the Hadoop wave had happened that we realized all data is not going to go into Hadoop. Some data is going to stay in traditional data stores, some data is going to go into other new things like Elasticsearch or MongoDB or Redshift. It's not all going to go into one Hadoop place. And I think some of our newer-generation competition frankly started a year or two before us, when it was all about everything going into Hadoop.
So as long as you can access the data in Hadoop and run MapReduce against it, MapReduce, note, not Spark, right, everything's great. And so we came in after it was clear that Hadoop's important, but legacy is important, cloud is important, and Elasticsearch and MongoDB and all this NoSQL stuff is important. And so we built an engine that can push queries and push jobs all the way down into any of those data stores, including Hadoop, including Spark, including Elasticsearch and MongoDB, and that can also bring it all back into Spark, which is why we're here at this conference: we depend on Spark in our middle tier. That's where we do any work that can't be pushed down into the database or into the Hadoop or into the Mongo. You want to push as much as possible, but maybe the system is a legacy Postgres or SQL Server and you can't push work there because it's not performant enough to be responsive to users. Or you want to join data together between a SQL Server and a Mongo and an Elasticsearch, and you can't push all that work into any one of them because you'd have to copy data, and, as we said just a minute ago, copying data is bad. So that's why we use Spark to glue it all together. So Spark is your sort of data-analysis application server. Spark is our middle tier, where we do anything that can't be pushed down or shouldn't be pushed down to a lower level. We prefer to push work down because generally there's a big cluster there, with thousands of machines running Hortonworks or Cloudera, ready to receive it. However, if you're trying to do work that can't be pushed down, or you want to join stuff together across things, that's where we use Spark. And we use Spark also, and back to your question about BI middle tiers: traditional BI middle tiers will generally hold sets of data in them, and those could be result sets or they could be cubes of data, depending on which technology you're using.
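The push-down decision Justin walks through can be sketched as a small routing function. This is an assumed illustration of the general technique, not Zoomdata's engine: the store names, the `FAST_PUSHDOWN`/`SLOW_LEGACY` sets, and `route_query` are all hypothetical.

```python
# Hypothetical sketch of the push-down decision: execute in the underlying
# store when a single performant store can answer the query; fall back to the
# Spark middle tier for cross-store joins or slow legacy sources.

FAST_PUSHDOWN = {"hadoop", "elasticsearch", "mongodb"}  # big clusters: push down
SLOW_LEGACY = {"postgres", "sqlserver"}                 # not performant enough

def route_query(sources: set) -> str:
    """Return where a query touching the given sources should execute."""
    if len(sources) > 1:
        # A cross-store join would force copying data into one of the stores,
        # so do the join in the Spark middle tier instead.
        return "spark-middle-tier"
    (store,) = sources
    if store in FAST_PUSHDOWN:
        return f"push-down:{store}"
    # A single slow store: pull the work up into Spark rather than
    # overloading the legacy system with interactive queries.
    return "spark-middle-tier"

print(route_query({"elasticsearch"}))         # prints push-down:elasticsearch
print(route_query({"sqlserver", "mongodb"}))  # prints spark-middle-tier
```

The design choice mirrored here is the one from the interview: push work down whenever a capable cluster is sitting under the data, and reserve the shared Spark tier for the cases where push-down would either copy data or punish a legacy system.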
Some of those technologies really require you to cube all the data into a middle tier, but most of those are proprietary middle tiers that have been built over the last 10 years, often using less scalable technologies. So we use Spark as that middle tier. We run a little embedded Spark inside Zoomdata if you just turn on a Zoomdata Docker container or something, but we can also repoint that to an external Spark cluster. So if you have 2,000 nodes of Spark, instead of having a limited middle-tier BI server, suddenly that 2,000-node Spark cluster becomes your uber-scalable BI middle tier for anything that can't be pushed down to the end stores. So okay, that was my next question. What's the difference between your middle tier today, made of Spark, and essentially the middle tier of the client-server era? It sounds like that was a scale-up tier, and now you're on a scale-out, general-purpose data-processing tier. So our middle tier effectively creates Spark jobs and jockeys Spark jobs. In the Zoomdata server itself we're serving up the GUI and handling that interaction. We're trying to do almost no actual data crunching, because we want to do all that in Spark, or even better in the underlying data store, to the extent it can do it performantly. So we look at anything a user wants to do, figure out how we can address that question very quickly, and get them to see either the full result or an estimated result within three seconds, because that's a human attention span. If we stopped talking for three seconds, your audience is gone, right? Let's try it. No, okay, let's not. All right. So you've got to show the user something within three seconds. And that's what we're all about: figuring out how to do that, and we do that through Spark jobs.
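The "estimated result within three seconds" idea can be sketched as scanning under a time budget and extrapolating from whatever sample has been seen when the budget runs out. This is a minimal assumed illustration of that approximation pattern, not Zoomdata's actual algorithm; `estimated_sum`, the chunk size, and the budget are all hypothetical.

```python
# Hypothetical sketch: scan data in chunks under a time budget. If the budget
# expires before the scan finishes, return an estimate extrapolated from the
# fraction seen so far instead of making the user wait for the full answer.
import time

def estimated_sum(values, budget_s=3.0, chunk=1000):
    """Return (result, exact_flag): exact sum if finished in time, else an estimate."""
    deadline = time.monotonic() + budget_s
    total, seen = 0, 0
    for i in range(0, len(values), chunk):
        part = values[i:i + chunk]
        total += sum(part)
        seen += len(part)
        if time.monotonic() >= deadline and seen < len(values):
            # Out of time: extrapolate from the fraction scanned so far.
            return total * len(values) / seen, False
    return total, True

values = list(range(10_000))
print(estimated_sum(values))                 # small data: finishes exactly
print(estimated_sum(values, budget_s=0.0))   # zero budget: rough estimate
```

The point is the contract, not the sampling scheme: the user always gets a number back inside the attention-span window, tagged with whether it is exact or an estimate that will refine as more data is scanned.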
And if you have one user one minute and a thousand users show up the next minute, you can scale very elastically because you're using Spark. Or if you point it at a small data set and then at a data set a thousand times bigger, Spark can scale in a way a traditional BI middle tier cannot. The Business Objects or the Cognos would not. Even if they can cluster, and some of them can't cluster across multiple machines, you'd still have to add another physical machine or a virtual machine, reboot the whole thing, and repoint it. Whereas we can run on Mesos, and Mesos automatically scales up and scales down, right? Just automatically, based on load, at both ends of it. Okay. So how about this: I'll probably leave his name unsaid, but a very insightful comment from an old friend of theCUBE. He said, you know, we've gone from this era where we have a huge amount of data in storage, like in a file system. We sample that, put it in the database to massage it and clean it, and then we sample that to do the modeling and get rid of the outliers. And then we sample that and it goes into the visualization tool. Each tier is a different tool. Now what we have is an entire stack that gets embedded into an application workflow, where it is part of a context. How do you play in that world, or how do you see that world evolving? I mean, around applications. Yeah, where you don't have to be an end-to-end ERP system, but you can feed, you know, a decision. Yeah, I think that's the application of everything that's going on. Like on your phone, you don't have one app. You have different apps that do different things. Right. And in the past, you'd buy SAP and that would do everything, you know? And I think today, in our consumer world, we're using micro-apps to do lots of things. You want a car delivered?
Uber is trying to be the uber app that will bring you both a car and food, two things. But in most cases, you're using different apps for different things. I think the world of IT is going the same way. However, especially when dealing with data that's, again, like we said, stored in one place and secured one way, you do need a common platform at some point, so that you're not copying data into all these different apps separately. And that's, I think, why some of the SaaS, software-as-a-service BI stuff, where you do have to copy data into their cloud, doesn't work so well. So for apps, it's important to allow the developers the freedom to build the app, to make it simple and beautiful and expressive, with the workflow embedded and the right data at the right time, but to have it leverage a data platform. And that data platform needs to sit on top of the enterprise systems. It needs to be blessed by IT. It needs to be properly set up so that the Kerberos tickets or SAML credentials or LDAP flow all the way from the user's mobile phone or browser down through the underlying data store and back up. This is really, actually, okay, we need you to unpack that carefully. Because you said something, if we look at it from a slightly different angle, that is very, very profound. Someone else was talking about this earlier today: well, we might not have big end-to-end apps, but each app could be a SaaS vendor, and we'll stitch them together, because it's easier to stitch them together when there's essentially only one version, whereas when it was on-prem, everyone had a different version. There's no way to stitch them all together. It's sort of like the... But then your corporate data all goes into all these different clouds. You can't control it. Well, yeah, except you might have some control because there's perhaps a standard way for each one. It's not easy. It's not like SAP.
But you were saying, walk us through the challenges of that approach. So again, one copy of the data, stored in one place, one way, with one security model. One copy of the data stored in one place. Secured one way, right? So that way, if you need to update that data, or you need to ask a question of that data, or you need to back it up... There's one place to go. As soon as you start having the data in multiple places, there's an opportunity for parts of it to get lost, for somebody to change something. Oh, the old normalization problem. For it to be stolen. Just copies, yeah. It's like, how many different copies of this number? How many versions of the truth? You want a single version of the truth, and the best way to do that is to have a single copy of the data. Hadoop's a very reasonable place to have that copy of the data, but there are lots of other engines too. And that's where SaaS, software-as-a-service BI starts to... Where it's only software as a service, that means you've got to schlep your data out to Domo, say. We don't really compete with them, just an example. You send your data to them, and now they have a copy in their cloud. You have to worry about keeping it synced and keeping it accurate, and then you have to secure it there as well as secure it on premises. And for highly valuable data, that becomes a bit of a challenge. So we're running low on time, Justin, so I want to ask you, where does visualization fit? We were joking a little bit before about the beautiful pictures that you often see in demos, these multi-colored, multi-line, kind of crazy things that make a pretty picture, but I don't really see anything in them. What role does visualization play in exposing insight to people? Because at the end of the day, that's what we're all trying to get to.
And I think there's pretty common agreement that just having ivory-tower, you know, super smart people running traditional BI tools didn't necessarily get the right data down to the people making decisions. So how does visualization fit in? What do you guys do differently to really make that click? Well, the best visualization is the simplest. It's like Occam's razor for visualization, right? You could show all the data points in this big thing, but maybe a user just wants to know what the growth has been from last year to this year. And hey, a bar chart or a line chart or just a single number, a percentage. The best visualization might just be a number, one number with a percentage sign and a color. If that tells you what you need to know, that's the best visualization, right? So, you know, we keep it simple. We say if a bar chart can do it, awesome. If a line chart can do it, awesome. We don't assume you need some super crazy thing. And I think the problem with the crazier visuals that you see examples of is that they're trying to be generalized, crazy views of lots of data. What you need in that situation is a custom view of data, but one that really shows you the process. So think of a supply chain. Wouldn't it be great to have a visualization that shows the farms growing the corn here, the trucks taking the corn from the farm to the oil factory here, so you can see that whole supply chain in real time and see its performance at a high level? And then, the name Zoomdata originally came from the fact that you could zoom in deeper and deeper. I want to zoom into that oil factory. I want to see what the operational issues are there. I want to zoom into the farm. I want to zoom into the row of plants. With IoT and stuff, you can actually do this. So I think the fancier visualizations are really good when a human can look at it and go, I get it, that's my supply chain.
Or I get it, that's my company. Not when you look at it and don't know what it is. I think that's less useful. And so I think the fancier the visualization, the more specific it has to be, at least to a vertical or an industry or even to a specific company. And we at Zoomdata allow custom visuals to be plugged in. However, we find the best visualization is the simplest visualization, as long as it's rendered really fast, because, again, the three-second rule: if we don't talk for three seconds, people stop looking. I like it, I like it. Fast and simple. And pertinent to a question, right? Yeah, pertinent to a question. I just need it for the question. I don't need the whole thing. Fast and simple is the way to go. And that's what we're really espousing. Well, Justin, thanks for stopping by. We appreciate you coming by. Glad you got to see all the lights and the cameras. Yeah, now I know you guys are legit. You're not just a guy with an iPhone anymore. This is what I'm talking about. It was a little more than that, but not much. And again, congrats on the funding. Yeah, thanks a lot. Not an easy thing to do in today's market, I think. Yeah, we're very excited, very happy. Very happy to have both Goldman and Comcast as investors. So, Jeff Frick with George Gilbert here in Midtown Manhattan at the Hilton. We're at Spark Summit East 2016. We'll be back with our next guest after this short break. Thanks for watching.