Live from Boston, Massachusetts, it's theCUBE at the HP Vertica Big Data Conference 2014. Brought to you by HP with your hosts, John Furrier and Dave Vellante.

Okay, welcome back everyone. We are here live in Boston, Massachusetts for the HP Vertica Big Data Conference. This is theCUBE, our flagship program. We go out to the events, extract the signal from the noise. I'm John Furrier with Dave Vellante, co-founder of Wikibon.org, and we're here with Peter Fishman, director of analytics at Yammer. He was on theCUBE last year, a CUBE alumni back for his second straight year. Welcome back. Round two.

We had a great interview last year. I do remember some of the comments. I was kind of like, hey, what's going on with the integration with Yahoo? I mean, not Yahoo, Microsoft.

Microsoft. It's hard to compare the two. One's worse than the other.

But Microsoft obviously ingested you guys and integrated you into the organization. So before we go into some of the cool data stuff, let's talk about the integration with Microsoft, because David Sacks is no longer with the company. That broke as news. And also, you guys weren't mentioned on Microsoft's recent earnings report. Some were saying, what does that mean? Does it mean Yammer's just been integrated into other products, or has it been taken out as a separate division?

Yeah. Well, I'm a very proud Yammer employee. I have been at Yammer for about four years, and we're still on our mission, which is essentially to define social within the enterprise. And Yammer is now definitely a part of Microsoft. Geographically, we're not based in Redmond. We're not based in Greater Seattle. The vast majority of the Yammer team is still in San Francisco. We're right along Market Street, really the heart of what we call the Silicon Bay. I would say that one of the reasons you might not be seeing Yammer as a specific line called out in Microsoft's revenue is, one, Microsoft makes a giant amount of revenue.
And two, we sort of view ourselves as part of Office. Our job is really to be a valued part of Office, and especially the O365 experience. So I'd say that we find ourselves really in the O365 world and the cloud component of Microsoft's enterprise offering.

You should be proud to be in the CUBE alumni status of Satya Nadella, who's the CEO. He's been on theCUBE before, at Stanford, when we had theCUBE there. Fewer views than you, Peter.

Okay. Yeah.

But his whole vision of cloud first is really cool. I like that. It was something we heard. We've been saying data first is kind of the next wave coming. You're going to hear mobile first, cloud first, and soon to be data first. But recently the trends are all about mobile and new user experiences. What have you seen? Because you guys were really the pioneers in this enterprise social trend. Facebook for the enterprise, some called it. But now you're seeing things like BuzzFeed, with $50 million in funding at an $800 million valuation. Uber's blowing up like crazy. Completely disruptive market. So that social is now happening. Social business is now native in advertising and in collaboration. What's changed? What are you seeing?

Well, I still think social has a ways to go. I don't think anybody has entirely figured out where social is going to be in the enterprise. So we still have a ways to go with our product in terms of being that true many-to-many communication that everybody uses within their company. Now, from a personal standpoint, I've really enjoyed the transition to Satya. He's come down to the Yammer office and given one of his monthly talks to the entire Microsoft org right from Yammer headquarters. So that was very cool. That was probably a few months ago. What I really like is that he gets up in front of the entire team and fields questions, and nothing's rigged. All the questions come in on Yammer, and he's able to really address them with a great vision.
I think we're really aligned to Satya's vision of the world. And it also plays nicely for me because he's such a big believer in data. And I think that Microsoft is really making that move into the data space in a way that I hope is really exciting.

So talk about the analytics piece. You're here, obviously, for the big data event. We were talking last year about the analytics. Really important part of the ecosystem going forward. You're giving a talk here on data to dollars. What is that about? Give a preview of your talk.

Yeah, I think there are many ways to view data. So one, it is an expense. There's a lot of cost associated, not just hardware or software. There are costs in terms of people, in terms of providing data, and also opportunity costs: taking those people that could be doing something else creative and having them work on essentially the numbers. So people often focus on some of the costs of data, and they want to see, hey, how does data provide cost reductions to counterbalance all of the other costs? Really, that's missing the point, which is that we want the data to accelerate the product, accelerate the revenue streams for the product, such that we don't just get a return in terms of cost reductions, but also in terms of revenue enhancements. And what's great about a conference like this is that you see so many different industries represented. Myself, before I was in social enterprise, I worked in social games. And you couldn't imagine two more different worlds if you think about a Farmville-style application versus an enterprise software application. Seeing data span that world all the way into mine has been actually rather interesting, and you hear stories that are not uniform insofar as the applications are different, and the ways that you can affect revenue are different. But a lot of the data war stories are very similar.
So Peter, it's interesting to hear you talk about how the intense focus on cutting costs sort of misses the point, and I would agree with that. At the same time, it is largely about economics, because you can now do things that you couldn't do before, because it's just much more cost effective. So maybe the impetus was not to save money but to make money, but you wouldn't have been able to do it in the past. Is that a fair premise?

Yeah, I think the entire economics of the ROI on data needs to be revisited. What we're constantly seeing is storage costs and compute costs being driven further and further down, which changes the types of data that you collect. It used to be that the marginal cost of collecting a certain amount of data was so high that you couldn't actually get returns on that data until you had enough of it. And now, if you think about the entire world of what we can collect, and I'm not even talking about sensors: just a well-instrumented product can give you so much information. The ability to wade through it is really critical, but the whole economics of data analysis and data science has totally changed with the cost being driven down, and now that we're collecting things, we stumble into ways that the data can be useful.

Last year you had a great quote on theCUBE. We were talking about what people remember: they look for what's working and what's not working, and you mentioned focusing on what's not working, and that poker pros remember the hands they lost and walk away from those, versus the amateurs who remember the winners. I remember that part of the conversation. How has that changed with A/B testing and the things we talked about last year? Has anything come about in your mind from a trend standpoint? Technology and innovation, a new discovery?
I mean, certainly Spark has been interesting to see the innovation around in-memory. Is there anything new coming out that you're seeing that's going to facilitate more of that kind of data for A/B testing?

Sure. So I think people think that experimentation is a panacea for solving all issues of causality. It used to be that we would see things move together, there would be some correlation between two variables, and you would intuit that relationship. The ability to do experiments basically solved, in some sense, the issue of correlation versus causation. So when we run experiments, if we properly randomly assign, for the most part we can distinguish between group A and group B, and we can look at whether there's an economic lift and whether there's a statistical lift. But it actually hasn't fully solved this problem. First off, your tests aren't always cleanly implemented. Real practitioners know that things happen, so your test might not be as you intended it to be. And then you also sometimes run into the problem of small samples. If you think about a product like Bing, the sample sizes are gigantic, so there I think it's a much more trivial task to get to statistical significance. Even still, you're going to want to make sure that your random assignment actually looks like a fair random assignment. Some of the noise that you introduce comes from the fact that you might do the assignment randomly, but that doesn't mean you get an unbiased sample. And you want to make sure that the experiment is what's causing the differences between the two groups. So I think some of the interesting things that have come out in A/B testing have been around how you really make the race a fair race in your experiment, such that you're really drawing a causal inference that the thing you changed between condition A and condition B, or however many conditions you have, is actually what's driving the difference between your groups.
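The two checks Fishman describes, calling a statistical lift between group A and group B and verifying that the assignment actually looks like fair randomization, can be sketched in a few lines. This is a minimal illustration, not Yammer's actual tooling; the function names are made up for the example.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference in conversion rates
    between condition A and condition B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value

def split_looks_random(n_a, n_b, expected_share=0.5, z_crit=3.0):
    """Sample-ratio-mismatch check: a lopsided A/B split is a hint
    that the 'random' assignment was not actually random."""
    n = n_a + n_b
    se = math.sqrt(expected_share * (1 - expected_share) / n)
    return abs((n_a / n - expected_share) / se) < z_crit
```

With 100/1,000 conversions in A versus 150/1,000 in B, the lift is five points and significant; but if the same experiment had split its users 600/400, `split_looks_random` would flag the assignment before anyone trusted that lift.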
So how do you apply that basic concept, philosophy, boundary conditions, if you will, in your situation with Yammer? Because you're servicing multiple enterprises. It's not just one big homogeneous group of people. So talk about how you use analytics and address that issue.

I think we publicly discuss having over 10 million users in the enterprise and nearly half a million companies using the product. And if you think about those numbers, you can detect really small UI changes as having a benefit or cost to the product. That said, sometimes you actually want to narrow in on certain geographies or certain user types or certain network types. So what we're really interested in doing is figuring out how to do that. One of the themes that we talked about last year was cheaply doing analytics. It's the theme that I try to drive with my team: cheaply sharing analytics, but also cheaply implementing an analysis that's able to tell you whether or not you've actually created a lift. So I'm thinking about what's the cheapest way to implement this test. If you're able to get away with smaller and smaller sample sizes, that essentially drives down the cost of doing experimentation.

Okay, so it's kind of counterintuitive though, right? Everybody's talking about how sampling is dead, right? Colin talked this morning about infinite sample sizes.

That's right.

So you're saying that your analytics is a function of the sample size?

Well, so I mean...

Your analytics cost, I mean to say.

So I want to dive into this. I like the idea of sample size infinity, right? So when you have a giant N, then essentially your standard error is going to shrink, so that you can make a better statistical inference. The catch is that you actually might not want that whole N, and when he says infinity, he means things that approach infinity.
And if you're trying to make the inference for a smaller and smaller population, you're trading off your giant N against making an inference for a subpopulation. If you thought about the best-case scenario, you'd be designing things at the individual level and testing them on every single individual. But there you'd only have a sample size of one, so you wouldn't know if you were really effecting a good change. So when we ask, are samples blowing up? Absolutely. Are we collecting more data? Absolutely. Do products like Yammer have giant user bases through virality and through some of the Microsoft distribution channels? Absolutely. But then you get greedy and you want to go even deeper and define, at an even lower level, where your product change is effective, and for what groups and for what segments.

And that's a situation where your marginal costs don't go to zero.

Sure, absolutely. Right, exactly. Your costs are proportional to the number of individuals that you have to profile. So if you want to segment up your populations, then essentially you're shrinking your sample sizes.

So can you share with us, Peter, something that maybe surprised you in your recent analytics career?

You know, I think with respect to Yammer, we do a lot of what I hope is fun, boring work. Right, so we run experiments and we call the winners of the experiment based on knowing what our North Star is. We know exactly what we think moves the needle for the company, and when we run an experiment, we evaluate many metrics, but that aggregates up into a recommendation. So that in and of itself is routine, because it is our day-to-day life, and it's not sexy and exciting other than, when something wins, it's potentially interesting and you want to figure out what was the driver of that win.
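The segmentation trade-off above can be made concrete with a standard power calculation. The sketch below uses the usual approximation for roughly 5% significance and 80% power; it is illustrative, not a description of any specific Yammer method.

```python
import math

def required_n_per_group(p_base, lift, z_alpha=1.96, z_power=0.84):
    """Approximate users needed per condition to detect an absolute
    `lift` over a baseline conversion rate `p_base`, at roughly 5%
    significance and 80% power."""
    p_alt = p_base + lift
    # sum of the variances of the two Bernoulli conditions
    var = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_power) ** 2 * var / lift ** 2)
```

Against a 10% baseline, detecting a one-point lift takes roughly 15,000 users per group, while a 0.1-point lift takes nearly 100 times that. And each segment you slice off needs that full count on its own, which is exactly why the marginal cost of going "even more deep" into subpopulations doesn't go to zero.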
And when something loses, it's potentially interesting because you want to have a sharp opinion as to the direction the product needs to go, with the learning from that loss. So it might be cool and surprising to know that the number of losses is always more dramatic than you expect, right? If you ask a product manager what's the probability that this feature is going to be successful, ex ante, it's almost always 100%.

So I want to get into predictive analytics, but before we get to that, I want to ask you some trending questions. I have a lot of folks interested in this, and in Silicon Valley the hot topic right now, depending on what forum you go to, is: can product guys be good CEOs? And should CEOs come from a product background? The other topic I want to talk about is growth hacking. Growth hacking is something that's been part of the consumerization trend, very data-driven, very analytics-driven in terms of A/B testing, all kinds of A/B, C/D, E/F/G testing. Product CEOs: do you think that CEOs in this day and age, with full stack or however you want to categorize the kinds of startups and growing companies, should come from a product background?

So I think it's a very relevant question to be asking someone from Yammer. Again, you mentioned our former CEO, David Sacks, a real product visionary. David really saw the social movement and what it was going to become in 2008, and as a result, we have Yammer today. So in thinking a little bit about what a CEO might look like, David argues that a CEO has to be a product person who's really a dictator. And ideally, you have a benevolent dictator. That's one of his more recent tweets.

Is he a benevolent dictator?

Well, you know, David's great. We had a great working relationship.
And I think what's interesting was that I would not have guessed at first that he would be so receptive to having such a valuable analytics team, because he's a person with a real vision for what the product should be. And David really embraced analytics, along with our co-founder, Adam Pisoni. So I think of them as very benevolent dictators, although, you know...

What makes a great product guy? Product person or VP of product or head of product?

So David makes the point that the product person has to say no. It's very common that you hear multiple opinions, and you want to do the easiest thing. People with great social skills know that compromise is absolutely critical, so what you want to do is essentially take the middle ground, where everyone can be slightly unhappy. And David argues that this can be product death, right?

What, by appeasing people?

By appeasing people, you're essentially cutting corners and making a compromise on the product. And in many senses, by not doing that, you kill your working capital, you kill your social capital in the workplace. And in doing so, in maintaining that great product, you're forcing yourself into the corner of being sort of...

Pissing people off.

Maybe. Yeah, yeah, that's one way to put it. No one likes a no-op. Literally, a no-op in the sense of saying no all the time. What I hope is that analytics can mitigate that to some extent. So yes, you're driven by the big vision of someone that has that big vision, but all of the mini details can get sorted out by the users. Not by the cacophony of opinions that everybody's adding to any specific product feature, but by your millions of users, and by knowing what metrics indicate that they've actually had an improved experience.
So what I hope is that, yes, having a really strong opinion around product might be necessary to build that great product, but to avoid a really harsh dictatorship, I think you need analytics in place.

Yeah, I mean, that's a big debate. I see it all the time looking at Hacker News, certainly in all the forums. I'm a big believer in product people being the CEO. I'm a product person in general, but at some point, operationally, you have to scale. Look at the Steve Jobs and Tim Cook relationship. At some point, the operational machinery needs to print the money. But until you get there, I think having an eye on the product is critical. Growth hacking, that's a big thing that startups do, and a lot of times it backfires. I mean, how many times have you gotten an email: oh, my friend joined the social network, can you join too? LinkedIn first started growth hacking with the network effect, with sucking in your emails. Now it's sucking in your contact address book on mobile, creating these networks. What's your take on growth hacking and the role of data?

Yeah, certainly a great title, "I'm a growth hacker," and I don't know who...

First of all, if I see that on someone's business card, I definitely don't hire them on the spot. Good growth hackers don't put it on their business cards. Come on. It's funny. It's like "I'm a social media expert." Come on, what the hell does that mean?

Well, I would love to coin myself a social media expert and a growth hacker. I think a lot of roles require that. I'm a data geek first and foremost, but essentially all of those skills go into understanding the social space and understanding how a company can grow.

But what is growth hacking? You define growth hacking.

Yeah, so I think it literally means doing any and everything to have that product grow at an exponential or quasi-exponential rate.
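That "exponential or quasi-exponential rate" is usually modeled with the viral coefficient, or K factor, that comes up next: invites sent per user times the rate at which those invites convert. A toy sketch, with purely illustrative numbers and function names:

```python
def k_factor(invites_per_user, invite_conversion_rate):
    """Viral coefficient: new users each existing user brings in."""
    return invites_per_user * invite_conversion_rate

def users_after(seed_users, k, cycles):
    """Total users after `cycles` invite cycles. With k >= 1 growth
    compounds without bound; with k < 1 it plateaus at roughly
    seed_users / (1 - k)."""
    total = new = seed_users
    for _ in range(cycles):
        new = new * k
        total += new
    return total
```

At k = 0.5 (say, five invites per user with 10% accepted), 1,000 seed users plateau near 2,000; nudge k above 1 and the same seed compounds indefinitely, which is why growth teams fixate on pushing K toward and past one.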
Now, thinking about some famous growth hackers, Chamath, who was Facebook's head of growth hacking, if you will, head of growth, was an investor in Yammer, and we tried to bring some of the best practices of our predecessors in the social space. As I mentioned, I came from social games. Social games wrote the book on down-and-dirty growth hacking, which is to say doing whatever it takes: having viral loops within the games, having viral prompts within the games. Everything was about whether the product is viral, and whatever that means, it basically means: does it have a K factor close to one? So the willingness to do all sorts of creative things to get your K factor high is what I think of as growth hacking.

What are some failed growth hacks that you can point to in the market, that you'd say that was a fail, and then some successes?

Well, you always have this trade-off in growth hacking between doing something that makes your product a little bit cheesy, and maybe unusable and undesirable, versus growing more rapidly. So you could imagine a world where we're constantly popping something in your face. Hey, invite this colleague, invite that colleague. Hey, you can only post a message if you invite 10 more friends, or if you post it to your social media feed, or something like that. Those types of techniques seem to me to work in the short run. So if you're under extreme pressure to blow up your application, you could imagine any of those techniques that were really prevalent at the end of the last decade for blowing up social games being applied in almost any industry.

Let me give you an example of a potential growth-hack fail. I'm looking at Facebook right now on my laptop, and just this weekend I was on Facebook saying to my social graph, hey, I'm looking for a place in December to bring my family. Looking for a beach, here's the criteria. Crowdsourced a bunch of answers.
And one of the answers was, hey, go to Hawaii, which you've been to many times. So I hit the Sheraton Maui website to check rates. And now I can't get this fricking retargeted ad off my Facebook page, and I'm so pissed off at Sheraton right now. I'm just looking at this image, like, I'm done with you. So it's at the point now where it's so intrusive it's making me angry.

It's retargeting, but that's what they're doing. They're doing the retargeting.

What's wrong with that analytics? Certainly it's a fail for Sheraton, because now my brand equity for Sheraton is going in the toilet. I'm looking at their ad knowing that they're pissing me off. I'm not interested.

Yeah, so I think that just revisits my initial claim, which is the short-run, long-run trade-off. You getting stalked by Sheraton actually does probably increase the likelihood of you booking it this time. You happen to be annoyed by it, so it's hurting the Sheraton brand, right? So this is a growth hack that has a short-run versus long-run trade-off. In your case, it's actually costing them in both dimensions.

Statistically it probably works for them, with the retargeting, that they'll get more conversions on site, but for me as an individual, I'm turned off.

Exactly. So balancing that: one of the things that we talk a lot about at Yammer is that ultimately, product decisions are made by the product team. So often, we're at a big data conference and we talk about A/B testing, and we talk about how you can use experiments to figure out what's working and what's not working in your product. But that should all be taken with a grain of salt, right? Which is to say, we might be striving for a local max. And I think this is one of the common criticisms of analytics: if you think about the variety of the product space, and if you think about measuring awesomeness, we should be great at measuring short-run awesomeness.
And in many senses, what an analytics team does is tell you, with a high degree of precision, what the difference is between condition A and condition B. What it doesn't do is tell you whether or not there's a condition C out there. So we're great at telling you about our local maxes. We're great at telling you about some short-run effects. But ultimately, our product decisions remain with our product team. We're not machines in the sense of, well, we just got all the data and it says up, therefore we release it. That's not at all how we do it. We try to maximize our learnings from any experiment. The point of the experiment is not just to call a winner, it's to learn. And it's for the product team to know what we think the economic and statistical lift of this particular product choice is. So the product team owns the ultimate decision. They're the ones with the deepest view into the product, into the product roadmap, into the product vision, so you know that you're not settling on a local max.

Peter Fishman, analytics director at Yammer, a guru in the industry, here talking on site at the Vertica Big Data Conference in Boston, Massachusetts. We're live, this is theCUBE. Go to crowdchat.net and check out our new engagement container technology. We're hosting the conversation there on the hashtag HP Big Data 2014. Go there and join the conversation. You've got to log in with OAuth: LinkedIn, Twitter, Facebook. Check out the new product, crowdchat.net slash HP Big Data 2014. That's the hashtag, our new social media innovation.

Final question, I'll give you the final word. We have thousands of people online right now watching. Tell them in your own words: why is this big data movement around analytics such a game changer for startups, big companies, and ultimately for society?
Yeah, again, it goes back to the theme of my talk on the show last year, which is that it's about cheapness, about distributing information really inexpensively. When we do that, we get all of the opinions aggregated, and there is real, incredible value in aggregating all these opinions and in people having a really deep understanding of the data, so that you're not relying on somebody with just incredibly deep expertise; instead, you can actually know what's happening in your product when you're making those decisions.

Peter, thanks for joining us on theCUBE. Great to have your perspective. You guys are pioneers, and you're certainly at the forefront of an amazing industry. Data science is well documented now; there was a Wall Street Journal article this past week about the future of data science careers. It's good validation for me. I sent it to all my kids. I told them two years ago, data science, and they were like, you don't know what you're talking about. So I feel vindicated myself. But data is hot, and you're a player. Thanks for joining theCUBE. We appreciate it. We'll be right back after this short break. This is theCUBE. Thanks, guys.