Live from Midtown Manhattan, the Cube's live coverage of Big Data NYC, a SiliconANGLE Wikibon production. Made possible by Hortonworks, we do Hadoop, and WANdisco, Hadoop made invincible. And now your co-hosts, John Furrier and Dave Vellante.

Okay, we're back live in New York City for Big Data NYC. We're covering all the action in the Big Apple, big data in the Big Apple. Hadoop World, Strata Conference, it's all happening here this week. I'm John Furrier with Dave Vellante. This is the Cube. Our next guest is Hilary Mason, Data Scientist in Residence for Accel Partners, the venture capital firm, and former data scientist at Bitly. Welcome back to the Cube again. Cube alumni, four years ago; it's been four years. You graduated from a startup to working as a data scientist for a VC firm.

To startup maker. We've all come a long way, I think.

We sure have, yeah, we have too. Millions of interviews later for us, and you're doing some great work. So we had one of our best interviews early on at Strata with you. You were talking about it back when Bitly was still, like, just a URL shortener in people's minds, but we were kind of unpacking the data greatness and the awesomeness underneath it, and we were riffing on that. So I want to ask you, what's happened since then? In the data business you've seen, obviously, Facebook, you've seen Twitter going public, LinkedIn's completely turned around how they do things with big data over the past two years, with their mobile app and other things. There's just amazing data greatness out there. What's changed in the past four years?

So for one thing, it is no longer a surprising or aggressive statement to say that data is important. It was not so long ago, four years ago, that people thought Bitly was trivial, even though it was seeing billions of human interactions across the social web, right?
It allows us for the first time to study human behavior at the scale of human communication, and yet that was a completely new idea in the industry four years ago. Now we take it for granted. So things have changed a lot. On the technical side, we've seen a lot of tools, and tons of announcements this week, that make this even more accessible than before: lots of things for working on large amounts of data, real-time data, streaming data, and lots of great visualization tools. So I think it's growing up. We've come a long way in the last four years, and it's really exciting to see what's happening now.

It's pretty intoxicating to see the environment right now, the way it is data-driven. We've got a book that we're giving away by Bill Schmarzo, a Cube alumnus, who at that Strata where we interviewed you was called the Dean of Big Data, because O'Reilly was doing the data MBA back then, teaching people what it was all about. Now it's kind of mainstream. But now it's a tsunami of investments. So you're essentially an entrepreneur in residence, or data scientist in residence, which is a pivot off that concept and relatively new. We know DJ Patil was at Greylock; he's now at a startup called RelateIQ. Drew Conway is at IA Ventures. This is real. I mean, now everyone has data science; now VCs have it. What do you do? What do you do every day?

It's a great question. What do you do every day? It's still evolving. We're still inventing it. But generally it is working with the portfolio companies and hearing what the common challenges are. And it turns out every company I talk to, whether it's in the portfolio or not, has the same set of challenges: they know they should be using data in their decision-making process, and they know they have this data, but there's still friction there, right? So on the organization side, it's things like: I know I have to add a data capability. How do I hire these people? What do they actually do? Where do they sit?
Who do they report to? How do we evaluate their work? Where do we even find them? And how do we train them, given that we don't have anyone in the company who already does this stuff? And on the technical side, it's a lot of: here's what our infrastructure is like. We know we should have the capability to query across this data that we know we have. How do we get there from where we are today? It's also really interesting to look at companies that were built before the last couple of years, which are generally built on what we would consider traditional relational data stores. They have to actually build something new in order to make that data really useful to them.

So one of the things I want to ask you about: one of the benefits of stepping out of a job like Bitly and going to work for a VC, and an EIR has the same kind of freedom, is that in the data scientist role you get to, one, get your hands on things and be an advisor and a steward to the startups and give them some advice, but you also get to take a step back, watch the industry trees blow, think about things, and maybe think about new ideas. What are you thinking about? What is Hilary Mason pondering right now when you're not active with the portfolio? What are you looking at and what are you watching? What's going through your data science mind?

I am really excited by the opportunity we have now to make data useful for people in the context in which we live our lives. And one of the things we see if you look at all this social media data: we always joke that when you join Bitly and look at the data set, you go through this emotional cycle, right? So at first you're really excited because the data is incredible.
And then you get really depressed because you realize people are actually looking at pictures of kittens and celebrity gossip all day on the internet, and that's the sum total of human culture right there. And then you come out the other side of this.

You want traffic? Put kittens on your site. We've got to get the Cube logo.

No, we did prove that there were actually more photos of dogs than cats shared, but kittens get all the attention. But eventually you come out the other side of this and you realize it is the greatest theater ever constructed, right? It's incredibly fascinating to be able to study this data. And so I'm particularly excited about things that make large, complex sets of data easy for us to use to make better decisions. One example is this little wristband I have, which is telling me how many steps I take and how well I sleep. It is actually giving me positive reinforcement for engaging in behaviors. The raw data is, every second, a couple of steps, right? That's really nothing interesting to look at. But what I can learn from it is fascinating, and I think we're just at the edge of products that give us those abilities.

So human society and human behavior are now measurable at an individual level, in context to the world.

Yes.

That's what you're essentially looking at. And then it's not just about cats, it's about, okay, what new things could be introduced?

Right, and can I understand something about the world as a whole from this data that helps me as an individual?

It's an infinite intellectual playground, data, isn't it? It's kind of like Shakespeare in a way. But I remember four years ago I asked you, what makes a great data scientist? And you said, well, there's a lot of things. It's sort of a mashup. You said data hacking. Well, actually that was the last one. But stats, math, programming, and data hacking, right?
And you're talking about some of the challenges that your portfolio companies have, and they're not unique to your portfolio companies, obviously, right? Every company's having these challenges, in particular finding these people. And so you mentioned training. I don't know how you train people in data hacking. Maybe there's a way to do that and a structure to do that. So what are you seeing, between then and now, in terms of the propensity to train new data scientists? What's happening in universities? I know we had Mike Rappa on from the University of North Carolina and they've got a great program, but they still seem a little bit too few and far between. What are you seeing as the trends there? Is there going to be a glut of data scientists at some point in 10 years? Or is it still not enough? What's your take on all this?

I don't think we'll ever have a glut of people who can make reasonable decisions off of data in a fairly scientific way.

Good, because I'm always telling my kids, just get into data, whatever you do. If you like math, you like stats, get into data.

So my definition of data science from four years ago: I actually have added something to that definition over the last four years, which is communication ability and domain knowledge. So now it's really about math and the ability to build these models. It's about code and engineering, so that if someone hands you a messy data set, a CSV, a MySQL database, you can get something meaningful out of it and you can put your answers back into it. And the last piece is really the ability to understand a business problem, go away and do an analysis with some data, and then come back and explain the solution to whoever has that problem in a way that they can understand and make a good decision without having been involved in that whole analysis process.
So I think at first it was easy to underestimate the importance of that communication and storytelling ability, but now I find it's one of the hardest things to find in potential candidates.

That's a great point. I mean, you've been an inspiration to a lot of people, and part of it is that you communicate really well and you're enthusiastic about what you do.

And I slip in pictures of kittens.

Why cats and dogs? Have you ever answered that question? Does the data tell us that?

Well, we've been big fans since our original conversation, but I've got to ask you about this. I know you're a math geek as well, but there's an art and a science. One of the things that came up here at the conference was data artistry. There's an art and a science, and the beautiful thing about data science, with the new advanced analytics that are coming out, is that anyone can be that power data scientist. So there's more ease of use; ClearStory's got some solutions, Cermilla has a nice approach. We have other solutions out there. What's the art side? There's kind of an art side of data as well.

Oh, absolutely.

I mean, there's math, art, and science. Talk more about the creative side.

Well, the art side is intrinsic to the problem area. So one thing I encounter a lot is I'll walk into an organization and somebody will be terrified that data science is going to replace their job, right? And so the example I like to use in those situations is to say, you know, A/B testing or similar mechanisms can tell you whether you should deploy A or B. It does not tell you what A or B should have been in the first place. The real skill of somebody who is good at working with data is being able to ask the right question, to know what is possible, what is worth your time and energy, and what isn't. And so that creative process is actually one of the hardest things to find and to teach, but again, it is absolutely necessary for someone to be successful.
Now, we were just at the Splunk conference. They went public, a big August Capital-backed company, and they went huge. They started out parsing log files, right? So now they're a full-on platform. One of their customers said they liberated us, our ability to do more, by essentially dumping data into their solution and surfacing at least some nuggets to start connecting the dots.

Well, Splunk is a great tool, and when you think about what it does, right, it enables people to take something that would have been a lot of work and make it almost no work to make that data useful. And you can expand that metaphor out and say, you know, Microsoft Excel is the greatest data science tool ever created. It's so ubiquitous that no business professional can credibly claim not to be familiar with Excel. Most data science in the world is done in Excel. And so we need more tools like that, more tools that make it possible for people who don't necessarily have a stats or computer science background to access and actually start to play with their data.

What startups do you like out there right now? I mean, I know you can be independent because you're not a partner at Accel. But what firms are you looking at out there and saying, hey, I like what they're doing, I like how they're approaching the data in the app or the platform? Could be data center, could be consumer. What startups are you looking at that stand out?

There are a ton, but I'll start with my favorite data product of all time, which I have no affiliation or association with: the team that makes the iPhone app Dark Sky, or forecast.io. What they do, and this is really important in New York City, is they take public weather data, they take your GPS location, and then they give you a micro-forecast for where you are standing, and it says things like, it will start raining in 10 minutes. It's incredibly actionable, useful data in your personal context. That's one of my favorites.
Run for shelter, get the umbrella.

Best part is, when it's clear skies, it's "boring, tap here," and they show you this nasty storm somewhere in India.

Right, you can be a storm spectator somewhere else in the world. We had Spotify on earlier, and they're using all open source Hadoop. They're big data driven.

They're doing a ton of cool stuff in a very hard area, which is music. Music preferences are really difficult to model.

All right, well, Hilary, thank you for coming on theCUBE. Really appreciate it. Let's stay in touch. Obviously, we love Accel, great firm, and it's great to see that the role of a data scientist in residence basically means you get some freedom, get to play a little bit, but also be actively involved with some startups and companies that are growing. So, a really exciting role. I'm sure you'll pop out of that soon and have a startup somewhere; that's usually the path. Give it a couple of years, maybe a year or more, but I know some folks have stayed as an entrepreneur in residence for multiple years. It's a fun job, congratulations. This is theCUBE, we're live at Big Data NYC. John Furrier with Dave Vellante, with data scientist Hilary Mason. We'll be right back with our next guest after this short break.