Magoulas, Roger Magoulas. Come on in, Roger. John's going to join us, I believe. And hi, I'm Dave Vellante. Good to meet you. Thanks for coming on. No problem. Congratulations on this great event. Thanks for having us here. Look at this, do a reset. Yeah, so as I say, we're live here from Santa Clara, siliconangle.com, wikibon.org. You know, check out those websites. There's a lot of great information on big data. We've been covering it like crazy. We've had guests going all day today. We'll be going tomorrow and Thursday live. And so anyway, let's get into it. So, Roger, you do all the research, or some of the research for O'Reilly. You're a quant jock, as they say, our quant guy. But you do some qualitative research. So for the folks out there, the siliconangle, wikibon audience, tell us. Tell them what you do and some of the things that you've been finding. And tell us about your talk earlier today. Sure. So my group consists of myself and another person who is a strong quant, Ph.D. in partial differential equations, taught stats, has a good background. So I'm more on the data side, and so we have a mix of knowing data, knowing quantitative stuff. And our main function is to try to figure out technology trends so that we know what to publish, what kind of conferences to have. The germ of this conference came out of research we were doing, and plus it was our own space. Easy to play. We should do a data conference. That was the fastest growing trend in the world, most important, too. Now, also, we were encouraging this about three or four years ago. So a little, hopefully, ahead of the curve there. So it's not just Tim O'Reilly, the only trendspotter. He's got a little help, right? He's got some elves on the back row. You're one of the elves. He's good. We give him props. And Tim is amazingly quantitative for a classics major. He's a really good quantitative thinker. And we've done a lot of work together where it's like we toss things back and forth to figure out what's going on. 
So we actually use a lot of the things that are going to be talked about here during the thing. I do machine learning. We do a lot of stats and stuff like that to figure stuff out. We also do a lot of talking to people. And we call this kind of our alpha geek thing. This is something Tim has done a lot of, is that we try to find interesting people. He has this AlphaTech VC, which is, you know, alpha. And he's always been targeting those alpha geeks. Yeah. And so a lot of times we'll hear them and then we'll say, oh, this is something new that we hadn't heard about. And then we'll start doing some quantitative stuff to see is there something there. And I'll just use an example. It's not in the data space, but Node.js, the asynchronous... We covered the Node Summit live. Okay. That's January. Yeah, we were there. Covered it like a blanket. JS, DevOps. It's all happening. That's right. So we started hearing, probably a year and a half ago, that a few people we really respected were jumping into that. So we're saying, okay, we got to start checking it out. Take the time. Yeah. And we did find that it was getting some traction and that it wasn't just alpha geeks. It was really something going around and there was a good reason for it. So we could also explain it in that, yes, having an asynchronous server means you can have lots of simultaneous connections. And there was something else going on that we cared about at O'Reilly, which is JavaScript was becoming a server-side language. And that meant it was becoming an important language. As anyone who codes knows, a lot of times when you're faced with a problem, you use the programming language that you're fastest with. And more and more people are fastest with JavaScript. So we think we're going to see more server-side JavaScript just because people need to get stuff done and want to do it. I'm a Python guy. I don't know that I'd pick JavaScript, but there's going to be plenty of people who will be in there. 
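To make the point about an asynchronous server and lots of simultaneous connections concrete, here is a minimal sketch in Python rather than Node.js, using the standard asyncio library. The names handle_request, serve_all and run_demo are made up for illustration, and the sleep is only a stand-in for real network or disk waits. It is a toy model of the idea, not how Node itself is built.

```python
import asyncio
import time


async def handle_request(request_id, io_delay):
    # Stand-in for waiting on a socket, disk, or database call.
    # While this request waits, the event loop is free to run others.
    await asyncio.sleep(io_delay)
    return f"response {request_id}"


async def serve_all(n_requests, io_delay):
    # Start every request at once on a single thread and wait for all of them.
    tasks = [handle_request(i, io_delay) for i in range(n_requests)]
    return await asyncio.gather(*tasks)


def run_demo(n_requests=1000, io_delay=0.05):
    # Returns how many responses came back and how long the batch took.
    start = time.perf_counter()
    responses = asyncio.run(serve_all(n_requests, io_delay))
    elapsed = time.perf_counter() - start
    return len(responses), elapsed


if __name__ == "__main__":
    count, elapsed = run_demo()
    print(f"{count} simulated connections served in {elapsed:.2f}s")
```

Because the waits overlap instead of queuing up, the whole batch should take roughly as long as one wait plus some overhead, rather than the number of requests times the delay.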
It opens up the developer market from, I don't want to say kind of the script kiddie, but it's a fast language and it's not necessarily the PhDs and the guys doing stuff with HBase. It gives the guys some headroom. And we were really impressed with the Node community because it's a very solid community that's kind of just core developing fast. So good call there, really, really good call. The other thing that's with that community too, and this is I think where the alpha geek stuff comes in, is they had a philosophy around how they were going to build Node. And I think that that was good. And they always wanted to keep it simple. They weren't going to let more in than they should. They weren't going to turn it into an MVC. They'll let someone else do that. And they knew what problem they were solving. They knew IO performance is a huge issue. Tie that in with Flash, the Fusion IOs of the world. You have a great, great environment for some cool stuff. The LinkedIn app is all Node. That's right. And that's how, in a way, we work. So we heard this was coming up and then we were, we know enough people to say, to come to those kind of conclusions around the technology itself. And then we start doing our analysis to look at what's going on. Roger, in previous lives you've done some enterprise data warehousing type of projects, more traditional types of things than what we're talking about here. I've been asking a number of the data warehousing practitioners. Is enterprise data warehousing a do-over? I'm going to ask you the same question. No, so I don't think it is. And I think here's the big difference. So when I started doing data warehousing, I really liked it because I have a computer science and business background. It was a great combination of the two things. But we were taking operational systems and trying to figure out what was going on. 
And what's changed and what makes this not a do-over is that I spend more time pulling Twitter streams and going through, I've got this big job post database and blogs and unstructured data. So I'm like going out into the world. I'm not just pulling numbers from an operational thing. I do as much natural language stuff and classifiers as statistics. Because I'm trying to make sense of something that's a lot more amorphous. And so that's why I don't think it is a do-over. I think that that set the stage for things like, there's a star schema, that helps, and stuff like that. And ETL is a skill. You've got to bring data together. And how do you correlate things? But I have, starting with this big job database, I'm trying to see if R is getting more popular. How do you pull a single letter out of a bunch of text? Yeah. It's not an easy task to do. So that's not the kind of task I was doing in the enterprise data space there. In a way, it was easier. Because there's a transaction. There's an invoice. There's a number. There's a salesperson. It all is kind of there. But this stuff is so much more amorphous. A lot more creativity has to go into it. A lot more art, in a way, has to go into it. So this is one of the reasons that we had a big data report in 2008. And while we talked about big data volumes, we also were talking about this new set of skills that now people are using data science to talk about. And you didn't hear people talking about that in the enterprise data warehouse space. In fact, if anything, I think what can happen sometimes in the enterprise is that they're so focused on their operational stuff that they're not paying attention to this bigger world and what else they maybe should be pulling in from outside. And hopefully that'll be one of the lessons that comes out of the conference. Yeah, so you hear a lot in IT about doing more with less. You're talking about doing more with more. Yeah. 
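The problem of pulling a single letter like R out of a pile of job posts can be sketched with a simple context rule. This is only an illustration under stated assumptions, not the method actually used at O'Reilly: the CONTEXT word list and the function name mentions_r_language are hypothetical, and the rule is that a standalone capital R only counts when a statistics or programming term appears within a few words of it.

```python
import re

# Hypothetical context words; a real list would be tuned against real job posts.
CONTEXT = {
    "python", "sas", "spss", "matlab", "stata", "sql", "hadoop",
    "statistics", "statistical", "programming", "regression",
}


def mentions_r_language(text, window=5):
    # Keep '&', '+', '#' and apostrophes inside tokens so that strings like
    # "R&D" or "C++" stay whole and are not mistaken for a bare "R".
    tokens = re.findall(r"[A-Za-z&+#']+", text)
    for i, token in enumerate(tokens):
        if token != "R":
            continue
        # Look a few tokens to the left and right for supporting context.
        neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n.lower() in CONTEXT for n in neighbors):
            return True
    return False


if __name__ == "__main__":
    posts = [
        "Experience with R, Python, and SAS required",
        "Manager for Toys R Us retail store",
        "R&D engineer with Python skills",
    ]
    for post in posts:
        print(mentions_r_language(post), "-", post)
```

A rule this crude will still be fooled by things like a middle initial next to the word statistics, which is part of why the task is harder than pulling an invoice number out of an operational system.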
And one of the things we think about is this whole smaller team thing, is that keeping the data and the analytics close together so that you can have fast response and that people are doing more things. Now, my groups were always very integrated at the companies I was at when I was doing consulting. But sometimes you see big companies where it's very siloed. And I think there's a lot of hold up. So we like to think that by bringing things together you stop that. And the more you iterate, the more you learn and I think the better you end up getting at the art part of the analysis. So my question is on the data problem and turning that around a little bit. Obviously with cloud computing, we've had Moore's Law for years. We all know that. We have unlimited compute essentially. And with converged networking, that's not a problem. You're talking about quantum computing is out there being discussed. It's always 30 years out. It's always 3D modeling. It's not binary anymore. Okay, I love that. I could smoke the peace pipe all day long and talk about that. But the reality is, do you think data is the bottleneck right now because of the closed open data debate? Because in order for data to flourish, you need open data. Yet data warehousing is more of a mainframe type approach, cordoned-off data. Is there enough data to satisfy all the compute that's out there and as data becomes more developer centric, fundamental in the developer equation, is it a bottleneck? Where is it bottlenecking? What's the challenges? Can you help us? So I think the bottleneck is on the skills, the resources side. The human part of the equation. So the human resource. I think there's plenty of compute power. There's tools that make things a lot easier to do. Sure, there's problems that are hard to scale, like doing real-time Facebook analytics. It's just going to be tough, right? They've got a lot of... Trillions of transactions. That's right. 
Stuff like that is going to continue to be a tough thing to do and they're going to have to take shortcuts to do that. But generally, I think the problem is that you have even one data set, you could spend maybe years trying to get to the bottom of what it's telling you, particularly around human behavior, which is, the human brain is the most complex structure we know of in the universe. And there's a lot to get to it. So I think you can... If there's a problem, it's that we stop too soon because of expediency, that there's a lot you can drill into. And we have an expression, the best analysis asks more questions than it answers. So we want to keep drilling in. Is there something there? Because rarely do you get all the answers right for pulling in. So what's nice is that the kind of democratizing of the technology, that more people can do more work, things like Hadoop or even Mongo where you can ramp up a quick sandbox to do stuff, means that you can start drilling into these things. But the people who can drill in, who can make some sense of things, who can defend their work, who can be objective and so forth, I think that might be the... That's a fair answer. I think there's enough data sets out there. I think there'll be more data sets. That's kind of my opinion. But I think I do agree with you. There is a human element that needs to be explored. The second question I want to ask you is, what are you finding right now? Honestly, you're here to show... I know how O'Reilly works. First of all, we love O'Reilly. It's a great, great, great organization. Thank you. High quality content. But you guys are worker bees. You know the big event's coming. You've got to have your stuff. You've got to have your reports. You've got answers to those questions. What are you finding? Share with us your knowledge around what you're sharing at the show today. One thing is just applied machine learning is everywhere now. Not that everyone's doing it, but a lot of people are doing it. 
When we compare this Strata to the last Strata, it's almost like... I don't know if this metaphor works. It's like atoms and molecules. Last year, there was just a lot of talk about how to do one thing, how to do another thing. And now it's like how to do the whole thing. A lot more... More comprehensive. And it seems to be a lot around machine learning. And I know when you have unstructured data and you have a lot of it, there's no other way to make sense of it. You can't manually look through it. So I would say that would be thing one. Thing two is how visualization is being done. I think with things like D3 and Protovis, JavaScript Canvas is now a worthwhile platform for delivering charts. Now the problem sometimes with that is a chart without narrative can be a pretty dead thing. And so being able to tell a story, and so this isn't a finding, this is something that we like to preach, is that storytelling is as important to this as the data is. Because... We agree, it's why we have theCUBE and our blog. Yeah. And I think that without narrative, we're designed as humans to obviously deal with... We've got to teach the machines as well. There's a meta-reasoning that has to go on. Are you seeing any progress in that front-end metadata management, the meta-reasoning around pre-machine execution? I mean, how some machines can do anything you tell them to do, right? What are you learning on the front-end? You go back to AI and all these approaches. You got to teach the machine, so if you fail to teach the machine, you have a failed approach. Right. So can you share with us any findings on that? Yeah, so this isn't so much a finding as this is the alpha geek part. So what we're hearing is that people are starting to make the exploration, the discovery part, a little easier and a little more... What do you mean by that? So if I get a hunk of data, like my favorite question, and I'm being sarcastic, is, how soon can you get me something? It's like, I haven't even looked at it. 
Well, what do I start with? Well, I start with frequency distributions. I'm trying to get a sense of what the data is. And then I'm going to go from there and try to figure out what I'm going to do. But what if there was something that kind of did a lot of those early steps for you? And what if, as they did it, there was some actual machine learning, some rule base that said, you have this kind of data. Why don't you think about this approach and then go further? So it's kind of like almost like a data analytics platform so that the quick and repeatable parts of discovery become more productive. You can become more productive doing those things. And actually, more people know about it because I know the kind of things I do and what I've learned from experience. I don't know that those are universal things. We have our own little social media labs. We're into the whole algorithm geek stuff too. And two years ago, algorithms were the rage. Algorithms, but we're finding rules. Rules-based approach is different. It's actually more applicable to this, is it an apple or a pear? What do you got? Yeah, so just, when we solved the R problem, it was rule base to machine learning. And so rule bases help make the domain small enough that machine learning works more effectively. And with that, you just tend to have more accurate results. So I'll get a little geeky. I tend to use Naive Bayes because we've got a parallel data structure, data architecture. And Naive Bayes has independence assumptions so you can parallelize pretty quickly. And we do stuff, we get like 96, 98% accuracy when we measure what we're doing. And we think it's because we rule-based things first. We're creating these much smaller domains with a lot less noise. The statistics in Naive Bayes work better and you get it out. So we also think that more data is increasingly, it's not true in every case, but increasingly more data trumps algorithms. 
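The rule base first, Naive Bayes second approach described here can be sketched in plain Python. Everything below is an illustrative assumption rather than the actual O'Reilly pipeline: the function names, the keyword rule, and the tiny made-up training set are all hypothetical. The sketch uses word counts with add-one smoothing, and it shows why the independence assumption makes the method easy to parallelize: each shard of data is counted on its own and the counts are simply added together.

```python
import math
from collections import Counter, defaultdict


def rule_filter(doc, keywords):
    # Hypothetical rule base: keep only documents that mention a domain keyword,
    # which shrinks the domain before any statistics are applied.
    return bool(set(doc.lower().split()) & keywords)


def count_shard(labeled_docs):
    # Count words per label for one shard; shards are independent,
    # so this step could run on many machines at once.
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, doc in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(doc.lower().split())
    return word_counts, label_counts


def merge_counts(shard_results):
    # Combining shards is just adding their counts together.
    total_words = defaultdict(Counter)
    total_labels = Counter()
    for word_counts, label_counts in shard_results:
        total_labels.update(label_counts)
        for label, counts in word_counts.items():
            total_words[label].update(counts)
    return total_words, total_labels


def classify(doc, word_counts, label_counts):
    # Naive Bayes: log prior plus a sum of per-word log likelihoods,
    # with add-one (Laplace) smoothing for unseen words.
    vocab = {w for counts in word_counts.values() for w in counts}
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_docs)
        for word in doc.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, split into two shards to mimic a parallel setup.
SHARD_A = [("stats", "r programming statistics regression"),
           ("other", "retail store manager sales")]
SHARD_B = [("stats", "statistics modeling r python"),
           ("other", "store cashier retail shifts")]
WORDS, LABELS = merge_counts([count_shard(SHARD_A), count_shard(SHARD_B)])

if __name__ == "__main__":
    print(classify("statistics regression python", WORDS, LABELS))
    print(classify("retail store shifts", WORDS, LABELS))
```

In practice the rule filter would run first to throw out documents that are clearly outside the domain, and only the survivors would be counted and classified. The toy data here is far too small to say anything about accuracy.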
So you can have relatively simple algorithms and then throw more data at it. And then you get... If you validate the front end, this is where the rules are interesting because you have humans, this is the linguistic problem with ontologies. If you know any out on that. Yeah, I'll make two comments on it. One is we use an expression all the time at O'Reilly, where people are the last mile. And I think that's still true here. It's that with data, you're just not going to have a black box with this stuff. And ontologies, they're really important. They're the base of the rule base and they are a pain in the ass. It's hard to actually get the seeds going because that's the cool piece of it. I mean, I could throw machines at anything. Well, for the longest time, people were the first thousand miles. You think, okay, well, Mechanical Turk is going to solve this problem? No, I don't think so. He doesn't know this, but some of my history is ontology. I was doing a ton of ontology work in my CSI after operating systems. And it failed miserably. I was so frustrated with it because it was an academic AI-like thing. It wasn't AI. The AI guys hated it. Ontologies were failing. There were corner cases of success, but it never really got anywhere. Yeah, no, they're hard. Like book publishing, we want to organize the content around stuff. Well, it's really dynamic. And then something new happens where it breaks all the rules that you set up. And then you got to decide, how do I re-branch everything? Well, I was talking with David Floyer. You met him briefly before you came onto theCUBE. He's one of the chief analysts at Wikibon. We've been following Fusion IO and we love those guys. The whole Flash movement there. You can have four databases, master-slave, with the IO card. The performance is unbelievable. So I think with things like that, that's good for us. That's good for you guys, right? So are you seeing that performance, what we're seeing every year at Strata? 
Last year was, you know, Bio was the big thing, you know? So what new kind of verticals are going, aha, with the data? Is it all verticals? Is there one in particular? Yeah, performance levels. Let me just first suggest something about the Flash stuff. And that's that a lot of the alpha geeks we're seeing are putting their analytic things in memory. In either Flash or RAM. Yeah, yeah. And sometimes they're using, and this is Redis, it's becoming increasingly something that people are using because it's a... Because the best IO is no IO. That's right. That's what my colleague David Floyer taught me years ago. That's right. David, nice work. By the way, David Floyer has a great report out called IO Centric Infrastructure, wikibon.org. Go read it. It's awesome. It's the best work in the industry in that regard. Go ahead. So we are seeing a lot of that. And as far as verticals, you know, it's not that there's... I mean, the obvious one's advertising. Insurance, increasingly. Obviously finance, which has always been really into it. But more what we see are these companies in a space saying I'm going to use this for my advantage. And so it's more like the whack-a-mole kind of thing where people are popping up and like, yeah, I'm going to try to do it. I think that I can get it. And what we see going on with that is there can be some naivete about trying to buy that kind of capability. It's that without cultural change, without the culture of understanding things like probability, problems with innumeracy, problems with paying attention, it's hard to just buy it. You probably have to make some... Fundamental change. I was just talking to someone from a journalist about the journals and like, listen, if they don't fundamentally understand technology has to be embedded from the ground up, then they're dead. But I will say one of the areas and one of the verticals that we're seeing, and it might be a natural reaction of the forces, is newspapers. 
So, you know, The Guardian, The New York Times, SiliconANGLE, Forbes, you guys. There's a lot of... We have a huge HBase back-end on all the data we're collecting. So we love it. I mean, it gives us great visibility. Yeah. I think that starting to become data is becoming like... A lot of advantage. Yeah. Absolutely. I won't say data-driven enterprise. I hate that term. So I have one question because this is more of a cultural question because I've followed O'Reilly and Tim's career for years and I've always been impressed with the... Not only the ability to see around the corner with the trends and the team of people and the high-quality personnel you have, but you guys do some stuff that I would call... I won't say pro-bono like stuff, but you guys do stuff that's for the betterment of society, right? Gov 2.0. Tim took a leadership role there. I'm sure you've got some stuff going on in research. So I want to ask a question and this is more of the philosophical question. How is big data changing society? And what do you see there? And what do you see as a futurist? I mean, this is more of a gray area, but shoot the arrow forward. As big data becomes... You know, years out now, what are some of the things we might see as benefits to society in the world around what big data could do? You know, to answer one about this, O'Reilly is a private company and Tim as a person, you're absolutely right. We will do things to better things at our cost. I mean, it's really... I mean, the people who stay there end up liking that part of it. I know I like that part of it. And so how is big data changing? I think what's going to happen is Tim uses the expression of the database of expectations. So as we go through things, we start learning to expect things. There's a really funny video of a magazine with a little girl touching it, expecting it to work like an iPad. Like that's an example of the database of expectations. 
Once you've had an iPad or a phone, you're waiting for that stuff to work. You go to the airport. Your phone knows what flight you're on. It knows what gate you're supposed to go to. It should have a map that brings you to that gate. That's going to be the new set of expectations. Your car is going to drive itself. It's going to take you to the place you go to. So what's going to happen is things that are in your life where data can help in some way get you, you're still the last mile. You're still telling the car where you're going to go. By the way, the word real time comes into handy. Because if you're not real time, you can't get on your flight. And part of real time, though, is what is real time? So I think the kind of classic is that, well, it's instant, but it's not. There's a lot of things where 10 minutes is real time enough. And before you lose the customer, before you miss the flight, that's real time. That's right. Everything is contextual to what you're doing. And hopefully we'll see some better understanding of things. Because as people make sense of the messier parts of the world, I think that there can be some explanations that can come out. We already see things like the 538 blog on The New York Times talking about the elections and the meta-analysis of polls. Now, has that changed elections? Probably not. But I think for people who are reading that, they're like a little more up-to-date. It's provocative. It's like, oh, wow, there's no data. I actually can get society instrumented. That's right. And there might be a little more trust, because in a way it's becoming more transparent what's going on. Because he explains his methods. Now, for some people that might be too, it's kind of geeky stuff, and it might be a little deep. But, you know, chances are we'll understand more. Now, there's both a good and a bad to that. You know, I mean, things like evolution or global warming. I don't want to get into it. 
Those are clear things with some science behind them. Well, we asked Bill, we are asking all of our guests this question. So if you're watching and you're a guest, and you're going to get asked this question, so have a good answer. Bill Schmarzo from EMC, who I've known for years, just recently joined EMC. He said an interesting thing, and he said a lot of great things, but he brought up this notion of there might be evil at first, but there could be good. So talk about the possibility that with transparency, this arbitrage is evil. There's always an underbelly in every, as Houston bosses, we can say combat zone of every trend where there's uncontrolled and lack of policy oversight. I mean, these are just social issues. Absolutely. You know, we're humans. Yeah. And so we tend to act in sometimes irrational ways and stuff like that. I'll use an example. The DOD had this contest with red balloons, and you were supposed to track where the red balloons were going, and the prize was pretty substantial. And most of the traffic of the teams trying to win the contest was trying to fake out the other teams. So here's a data or anything that the traffic was misleading people. Reports of red balloons in Cleveland, you know, kind of stuff. And that's, you know, so you're going to see some of that going on. You know, health. The big year. Yeah. So I think health is a good place to kind of talk about that. There's a lot going on, right? HIPAA was done before the explosion in data. And that's a big problem now. It's a big problem now. And there's, you know, there's things around some disease, things you might not mind sharing and other things you might really mind sharing. And how do we negotiate around that? So I think that we've got to work it out. But on the other hand, I mean, from everything I know, data can help with health. Again, this, again, exclamation point to your earlier comment about people. Not just the alpha geeks on the human side. 
Society geeks understanding sociology of policy around personal data. That's right. Brokering the data. Using the data. Like a credit card. Take my data. I want to help myself or genome or whatever. That's right. And privacy is, you know, how privacy has changed as culture has changed, right? If you live in a little village, right? Everyone knew what you were doing. Yeah, no privacy, right? Cities made things, you know, maybe a little more, you could be private and then there were assumptions built in and now they're kind of going away because of all this. Okay, so my final question is more of a personal question. Looking at the landscape of the business with all your history and background and the industry and society, here at Strata, looking out at the landscape of tech, the tech innovations, what gets you excited? What do you look at and say, I love that? These are a couple of things that I'm actually intoxicated by. These concepts. There's so many. The machine learning, the machine learning is, I'm really excited. I really like learning more about it. I don't have as math a background as I wish I had and as I'm seeing what it can do and how it can handle that. Two is this kind of the automating of data exploration. Means that I can get in and get started on the real work quicker into it. And three, what I'm excited about is just that it's exploding. That data is now becoming something on the front page of major newspapers. Investors are talking about it. It's like, finally there's this recognition of it's important. I totally agree with all those points. But you know what, I do agree. I'm excited that people are realizing that data is at the forefront of innovation and society benefits. I wrote a post two years ago now where I went out on a limb and said data is the next development kit. Where data is the core development asset. Not code. Code's free. So code's great. So if we can get data out there with humans from JavaScript down to HBase. That's great. 
More than a year. Okay, Roger, thank you very much for coming on. Great insight. Again, that's theCUBE. That's what we do, Dave. Roger Magoulas, thank you very much. Follow him. Roger M. Roger M. Roger M. I don't tweet enough. But I will say Twitter, what a great id. What a cultural id to be able to follow. This kind of when we... I had Marc Smith on from Twitter. He tweeted this morning on the Strata Conference hashtag. This guy had that NodeXL stuff I'm like geeked out on. I was playing with the data this morning before my kids got up, partying with the data, pinging him to say, hey, you got to get on theCUBE. Had him on theCUBE. He's on the front page of SiliconANGLE.com. He's got a nonprofit in Belmont. This guy's hot stuff. I love him. I'm going to help him out. We're going to work together. The collaboration, the connections are fantastic. And those that tweet the most often aren't the most influential, believe it or not. So it's okay if they don't tweet that much. Thank you very much. Okay.