out. What I want to introduce now is our next speaker, and I think this speaker will be a real treat. This gentleman started his career at Oracle, worked at Salesforce, worked at Zynga, and has consistently been focused on data. In fact, I joke with him that he made data cool before data was cool, as it is today. So we're talking about one of the original thinkers around that. And now he's at this small company you may have heard of: Facebook. Little f, you see it every now and then. And he wasn't always at that company, but he's gonna tell you the story about how he came to Facebook. A lot of people might assume, well, they started with analytics. That was their first hire, right? Mark got some money, and the first person he hired was for analytics, because that's just obvious. Well, I think you'll hear the story is a little different. But most importantly, he is gonna share insight about not just how to do analytics, but how to make an impact with analytics. So as we move to the next slide, I'd like to introduce on the stage Ken Rudin, head of analytics at Facebook. Please welcome Ken. Hi, good morning. I'm Ken Rudin, and I run the analytics organization at Facebook, and we've had quite a long journey over the last several years of figuring out how to do analytics. We've always had lots of data, but there was a period of time when we didn't exactly know how to use it well to really drive impact in the company. And over the last three or four years, we've really focused on building a culture around that. We've thought long and hard about how we approach data at Facebook, and I've talked to a lot of people in the industry, and my conclusion is that there are a lot of commonly held beliefs, myths if you will, that I think really need to be challenged. And I'd like to go through four of those today. The first myth is that if you want to build a big data system, you have to use Hadoop to do it.
That is way too narrow a perspective. The second one is that big data provides better answers. That's not always true. The third one is that data science is a science. If you only focus on the science, you're going to fail. And the last one is that actionable insights are why we're all here, that actionable insights are the goal for big data analytics, or any analytics for that matter. And that's just plain wrong. So let me start with the first one, which is this belief that big data is synonymous with Hadoop. There are a lot of companies implementing Hadoop right now, and many of them are thinking about turning off some of their relational systems once they've done that. The problem with this way of thinking is that Hadoop is a technology, and big data is not about technology. Big data is really about business needs. It's about the need to have deeper analysis using a lot more data. It's about the need to do different types of analyses using algorithms and approaches where SQL may not be the best fit. And it's about the need for faster iteration without being constrained by a fixed schema. And when you think about big data from the perspective of business needs instead of technology, it opens up the possibility of using a much broader range of technologies. So the reality is that big data should include Hadoop. And it should include relational. And it should include in-memory systems. And it should include anything else that is the right tool for whatever task you're trying to focus on. At Facebook, we're a young enough company that when we first built our data systems, we started with Hadoop. But over time, we started realizing that Hadoop is not always the best solution to whatever it is we're trying to focus on. We were using it incorrectly in many ways, and there were much better tools out there for specific parts of what we were doing. And it turns out that using the wrong tool was pretty painful.
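One classic example of an analysis that is awkward in pure SQL but natural in a MapReduce-style job is sessionization: grouping a raw event stream into per-user sessions separated by idle gaps. The sketch below is purely illustrative; the events, the 30-minute timeout, and the single-machine reduction are assumptions for the example, not Facebook's actual pipeline.

```python
# Toy sessionization: group each user's event timestamps into sessions,
# closing a session whenever the gap between events exceeds a timeout.
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # 30 minutes of inactivity ends a session (assumed)

def sessionize(events, timeout=SESSION_TIMEOUT):
    """events: iterable of (user_id, unix_timestamp).
    Returns {user_id: [(session_start, session_end), ...]}."""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = defaultdict(list)
    for user, stamps in by_user.items():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > timeout:            # gap too long: close the session
                sessions[user].append((start, prev))
                start = ts
            prev = ts
        sessions[user].append((start, prev))   # close the final open session
    return dict(sessions)

events = [("alice", 0), ("alice", 100), ("alice", 5000), ("bob", 50)]
print(sessionize(events))
# {'alice': [(0, 100), (5000, 5000)], 'bob': [(50, 50)]}
```

Expressing this "carry state along an ordered stream" logic in SQL typically requires contorted self-joins or window-function tricks, which is the kind of mismatch the talk is pointing at.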
So whereas most companies are taking relational right now and adding Hadoop onto it to do big data, we're in the position where we started with Hadoop and are now extending it with relational to do big data better. So the question then comes up: when do you use these different technologies? And I'll just look at two of them here, how we use Hadoop and how we use relational, and where they fit. The thing that we've noticed, and the thing that we focus on, is that there are really different uses for each. When we look at the types of analyses we do with each, we use Hadoop to do exploratory analysis, to look through massive amounts of data to find out what are the metrics that really matter for us. And we do that because we need Hadoop's ability to go through mountains and mountains of data to find the gems within it that are really important to the business. But once we figure out what those metrics are, we move that data into a relational system, because it's easier for us to use that to do ad hoc analysis, to slice the data by multiple different dimensions, and to do historical analysis. If we look at the granularity of data, there's also a big difference there. We use Hadoop to store massive amounts of data at our lowest level of granularity because, for now, Hadoop is the only system we've been able to find that can store the hundreds and hundreds of petabytes that we've got. But most of the analysis we do isn't on all of the data. It's on some aggregated, summarized portion of that data. And so we'll take that data and put it into a relational system, again because it's just much easier to do the analysis in that environment. And then the last thing: when we look at the time frames of the data in each of these systems, most of our data comes streaming directly into our Hadoop system.
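The raw-to-summary flow he describes, aggregate granular events in Hadoop and load the small result into a relational store for slicing, might be sketched as follows. Everything here is a stand-in: a Python `Counter` plays the role of the Hive/MapReduce job, `sqlite3` plays the role of the relational warehouse, and the table, column names, and data are invented for illustration.

```python
# Sketch of the raw -> aggregate -> relational flow: collapse granular
# event rows into a daily summary, then load the summary into a
# relational table where ad hoc slicing is easy.
import sqlite3
from collections import Counter

raw_events = [  # (day, country) per event: the granular data kept in Hadoop
    ("2013-09-01", "US"), ("2013-09-01", "US"),
    ("2013-09-01", "BR"), ("2013-09-02", "US"),
]

# "Hadoop" stage: reduce billions of rows to a small daily summary.
daily_counts = Counter(raw_events)

# "Relational" stage: load the summary where slicing and trending are easy.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE daily_events (day TEXT, country TEXT, events INT)")
db.executemany(
    "INSERT INTO daily_events VALUES (?, ?, ?)",
    [(day, country, n) for (day, country), n in daily_counts.items()],
)

# Ad hoc analysis is now a one-line query.
for row in db.execute(
    "SELECT day, SUM(events) FROM daily_events GROUP BY day ORDER BY day"
):
    print(row)
# ('2013-09-01', 3)
# ('2013-09-02', 1)
```

In real terms, the granular rows stay in Hadoop, where the data lands first, and only the compact summary moves to the relational side.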
So we tend to use that system if we want to do things like monitoring, hourly monitoring of what's going on with the user population. We'll do that in Hadoop because that's where the data lands first. But it does get processed. It does get summarized. And when that happens, we'll put it into a relational system. And that's where we tend to do our longer-term trending analysis, again because it's an easier tool for that kind of job. So the bottom line is: figure out what technology is best for whatever task you're focused on. It may be Hadoop. It may be relational. It may be something else. So this brings me to my second myth, which is that big data provides better answers. And that's why everybody's implementing big data in the first place. And it is true that big data will give you more answers and deeper insights on more data. But the problem is that's really not where the value comes from. Because coming up with brilliant answers to questions that no one cares about doesn't add any value. So you've got to figure out where the value really comes from. And as it turns out, it's fairly straightforward: better answers often come from figuring out how to ask better questions in the first place. So that raises the issue of, well, how do you get there? How do I figure out how to ask better questions? And there are three different approaches that we have found work for us. One of them is focused on hiring the right people. Data science has become very, very popular recently, and it's often associated with the idea that you need to hire people with PhDs in statistics. And that's what we used to do. But what we've learned along the way is that that's no longer sufficient. You can't just hire a bunch of PhDs in statistics. They also have to have business savvy. Business savvy is quickly becoming one of the most critical skills for a data scientist to have.
And unfortunately, it's also becoming one of the hardest to find. So how do you figure out if a potential data scientist you're talking to has business savvy? Well, I think the easiest way is, when you're interviewing them, instead of presenting the typical type of question, which is "I've got this data; how would you calculate this metric?" to see if they know how the process works, start with "Here's a business problem. What metrics do you think you'd want to look at to figure out what's going on here?" And then once they give you that answer, say, "Okay, the next step is: how would you calculate those metrics?" So check to see that they've got the business savvy, to make sure that they can ask the right questions, that they can come up with the right hypotheses, and that they can then use all that data to get the right answer. The second thing we focus on is training everyone, not just the analysts in your company, but everyone they work with. The goal isn't to convert everyone in your company into an analyst. The goal is really to make sure that everyone understands how they can use data to do their job better, to have more impact in whatever role they have. At my company, one thing we implemented about two years ago is something we call data camp. It's an intensive, immersive two-week program, all day and effectively all night as well, where we immerse new hires in learning about data. And it's not just for analysts. Engineers all go to data camp, and product managers, and people in marketing, and finance, and people in operations all go to this two-week data camp. And the result is that when everyone comes out, they have a common language about how to use data, and they all understand how data helps whatever role they're in.
And one thing that's interesting: if you look at the curriculum for data camp, only about 50 percent of it is focused on how you use the tools we've got within Facebook to actually do your job and get answers. The other 50 percent is how you frame questions in such a way that you can answer them with data, and how you build a more data-driven team. The third thing that we focus on is org structure. This one turned out to be really important for us, and it took a few iterations before we got it right. You need to make sure that your analysts are in a structure where they understand what's going on in the business and are as close to it as possible, so that they can ask the right questions; they know what questions they should be asking. There tend to be two traditional models for structuring analysts within a company. The first one that I've shown here is the centralized model, where you've got all the analysts in one organization rolling up into an analytics lead, and then they're working with different business organizations. The advantage of this model is that because everyone is centralized, it's a lot easier to come up with common standards and processes and definitions and so on, so that you can have a nice, structured analytical environment within your company. The downside is that because they are centralized, they are a bit disconnected from the business and the business needs. So what ends up happening is they tend to be really reactive as a function within your company, and the role tends to be defined as being responsible for responding to requests that come in from other areas of the company. A second pretty common model is the decentralized model, where you've got the analysts as part of the business organization itself, rolling up into the individual business leads.
And the advantage of this model is that now you've got really good alignment between the analysts and the people in the business they're working with. So what ends up happening here is they can be a lot more proactive, because they understand the challenges and the issues of whatever is going on in that business, so they can figure out what are the right questions to focus on in the first place, before they run off and start going through mountains of data trying to find answers that may be meaningless. The downside is that because you have different organizations, they each tend to come up with their own set of standards and processes. And you also tend to find that each of them somewhat redundantly solves the same set of problems, because it's their own group and they're not communicating well with the other organizations. So if centralized doesn't work all that well, and decentralized doesn't work all that well, what do you do? The model that I really think is the right one is what we call the embedded model. Essentially, it's a hybrid of both. Organizationally, it's a centralized model where all of the analysts roll up into one analytics lead. But physically, the analysts are sitting with the actual teams they work with. They're embedded, sitting side by side with the people in the business organizations they're working with. So you get the best of both worlds here. Because they are part of that organization, sitting there physically in the mix of the conversations going on, they get a really good understanding of what's going on in the business and of the issues that need to be addressed. But because they're also still organizationally part of a centralized team, you get consistency and standardization.
Because if you don't have some centralized organization, things tend to be chaotic, and it tends to be pretty hard to see across the company. So I'm pretty well convinced that this is the right place to start if you want an analytics organization that has a lot of impact in your company. And for a lot of companies, this is also a great place to end, if you're figuring out how to organize your analysts. A few companies may decide to modify this a little bit, and as I was showing here, what we've done is take the analysts and decentralize them again, so they are now actually part of the business organization. Companies may decide to do this in scenarios where the business organizations themselves already have their own functions: they have their own engineering, their own marketing, their own product management. It would make sense over time to make sure they have their own analytics function too. And this model can work pretty well as long as two things are true. First, that you start with the embedded model, so that you build up a common set of processes and standards and a pretty strong data-driven culture that's unified across the company. And second, that you still have somebody in the role of an analytics lead that all the analysts have a dotted line to. This person's purpose is to make sure that the analytics organizations all stay aligned and maintain a consistent data-driven culture across the company. So this brings me to the third myth, which is that data science is a science. It's actually a lot more than that. I talked a little bit before about the relationship between better answers and better questions. If you look at the better answers part, and let's focus on that for a second, well, that is a science.
You've got to be fluent in some database languages and some programming languages. You've got to understand statistics really well. You've got to master a whole bunch of technical tools. So that really is a science. But if you look at the part about better questions, and you think back on the things we were just talking about, about how you get your organization structured around asking better questions, well, that really becomes much more of an art. So the reality is that data science is a science and an art. And the trick is: how do you balance those? How do you figure out when to use science and when to use art? The way to find the right balance is what I refer to as balancing hippos and groundhogs. The hippo scenario is a scenario where you've got too little science going on. It's when you're in a meeting trying to make a decision, and the most senior person in the room is pounding their fists on the table, saying their opinion about what to do is right because they have the most experience and they've been at the company the longest. This is a scenario where people aren't using data; they're instead going on opinions and gut instinct. And I call this the hippo scenario because HiPPO stands for the highest-paid person's opinion. So that's one extreme. The other scenario is the groundhog scenario, a scenario where you're using too much science. I call it the groundhog scenario after the movie Groundhog Day. I don't know if you've seen it; it's a classic movie. In that movie, the lead character is named Phil, and for some unknown reason he ends up reliving the same day over and over and over again. Every morning when he wakes up, everything is reset exactly to the way it was the day before, and he relives that same day again. Well, in the movie he's really interested in this woman, Rita.
And so he decides that he's going to ask her out in the morning, and then they're going to go out on a date that night, and they do. And it goes horribly, and she ends up slapping him at the end of the day. But he realizes, wait a minute, I get to live this day again. So maybe I can change my behavior a little bit and make her like me a little bit more. So he tries again the next day, and it doesn't work, and she slaps him again. Each day he wakes up and says, I'm going to change my behavior a little bit more and try something slightly different to see if I can make her like me. And he tries again and again and again. And the outcome is always the same. It never works. So after countless attempts, he finally decides: you know what? I give up. I'm not going to try to be who I think she wants me to be. I'm going to just be myself. And of course, that's when she falls for him, and the spell is broken, and he no longer has to relive the same day over and over again. Now, I know I sound a little bit like a data analyst trying to give relationship advice. But I bring this up for a reason, which is that from an analyst's perspective, what Phil is doing is using A/B testing to try to fix a relationship. And that's not going to work. He's using too much science and not enough soul, not enough art. So you can have too little science and too little use of data. You can also overuse data, instead of taking a step back and thinking about what you should really be doing. You have to find the right balance. Another way of thinking about this is that data is like sandpaper. If you have a good idea, data can help you refine it, but it doesn't help you create the idea in the first place. That's where an understanding of your business comes in. That's where an understanding of the industry comes in, and where intuition and art come in.
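For what it's worth, the "science" half that Phil over-applies can be stated compactly. A minimal two-proportion z-test, of the kind that underlies a basic A/B test, might look like the sketch below. This is illustrative only: the conversion counts are made up, and real experimentation platforms handle far more (sequential testing, multiple metrics, and so on).

```python
# Minimal two-sided two-proportion z-test for an A/B experiment:
# given conversion counts and sample sizes for variants A and B,
# compute the observed lift and a p-value under the pooled-proportion null.
from math import sqrt, erf

def ab_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, two-sided p-value) for conversions in A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))); two-sided tail probability:
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, p_value

lift, p = ab_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"lift={lift:.3f}, p={p:.4f}")
```

The point of the groundhog story still stands: the test can tell you whether B beats A, but it cannot tell you that the whole space of variants you are iterating over is the wrong one.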
So ultimately, when you think about it, data can't help you take a bad idea and make it into a good one, but it definitely can help you take a good idea and make it great. So let me focus on the last of these myths, which is that actionable insights are why we do analytics in the first place. And that just isn't true. It's not going far enough. I'll take a step back and look at where we've come from over the last two decades or so. We started off with reporting, where people wanted to know: tell me what's going on. Okay, great, we can use our systems to tell you what's happening, so you're informed. Then people realized that that's not enough: I need to find out why it's happening, because if I understand why it's happening, I can do a lot more about it. Okay, good, we paused there for a bit. And then people said, that's not good enough either. What I really need, the ultimate goal, is not just to know why something happened, but to know what I should do about it. And that's where we are as an industry today: actionable insights. And as I mentioned, that's still not far enough, because an actionable insight that nobody actually acts on adds no value. You have to take it the last mile, and you have to focus on making sure that people actually act on your insight and that it drives some impact. So before I go any further, I need to define what I mean by impact. Something with impact usually fits into one or more of these buckets. The first is that it moves a metric, like driving user registration up by 5 percent. Or it changes a product, such as when we looked at our data and realized that if I know Jonathan and you know Jonathan, I tend to know you. So then we can add a feature that suggests people you may know within the product itself.
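The friend-of-friend observation behind "people you may know" can be reduced to a toy heuristic: rank non-friends by how many friends you share. The graph below is hypothetical, and the real feature of course draws on far more signals than mutual-friend counts.

```python
# Toy "people you may know": suggest non-friends ranked by mutual friends.
from collections import Counter

friends = {  # hypothetical undirected friendship graph
    "me": {"jonathan", "ana"},
    "jonathan": {"me", "ana", "raj"},
    "ana": {"me", "jonathan", "raj"},
    "raj": {"jonathan", "ana"},
}

def suggest(user, graph):
    """Return candidate friends for `user`, most mutual friends first."""
    mutual = Counter()
    for friend in graph[user]:
        for fof in graph[friend]:                       # friend of a friend
            if fof != user and fof not in graph[user]:  # not already connected
                mutual[fof] += 1                        # one shared friend
    return [name for name, _ in mutual.most_common()]

print(suggest("me", friends))  # ['raj'] -- shares two friends with "me"
```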
Or it changes a process or a behavior, like when you use data to come up with a better process for forecasting revenue growth or forecasting user growth. So ultimately, what this means is that you need to own the outcome. If nothing changes, you've made no impact. Our goal as analysts is to drive impact through data, to drive change through data. If nothing changes, you've added no value, because if nothing changes, then it wouldn't have made a difference if you hadn't even worked at that company. And if it doesn't make a difference whether you worked at that company, how can you say you've added value? You've got to go the last mile. You've got to evangelize your insight and make sure that it leads to a change in the business, that it leads to an impact. So ultimately, everything I've said comes down to this: we need to start thinking about big data differently. We need to start thinking about big data from the perspective of business needs, not just the technologies people are using. We need to start focusing on asking the right questions, and not just getting more and more answers from more and more data. We need to focus on how we balance the science and the art of data science. And we need to make sure that, at the end of the day, we're taking that last step, making sure that somebody is doing something with our insight and that it's driving a real impact. And the people who do this really well are driving huge change in their companies. They're driving a ton of impact. They're also helping to redefine the role of data and analysis in multiple industries. And as an analyst, that's an incredibly exciting role to have. So thank you very much. Is that for me? Oh no, there's plenty more for you. Well, thank you. A big round of applause for Ken. You know, that call to action, that this isn't just about the data, it's not just about the analytics. It is about making a difference.
It is about driving the impact. And I hope what you heard from Nate and Ken gives you a sense of the levers that you have in this room, the opportunities you have to really make a difference, not just within your organization, not just within IT or the data science group, but across the organization. It inspired me, and I think it sets the stage for the journey that we're going to be going on. So with that, a couple of key points. Your journey today, like yesterday, is fully packed, with a couple of changes. Number one, apparently there was overwhelmingly huge demand for lunch. So I guess lunch is a good thing. And we've decided to put lunch in two places: both where it was yesterday, in the Marina Ballroom on the other side of the hotel, and also out in the foyer behind here. So for those of you who are into arbitrage and would like two lunches, go for it. But we want to make sure you're fed, and we want to make sure you seamlessly and painlessly get that lunch right away. The other point is that, because of what I'm about to share with you, we want to grant you some more sleep. So we're going to start tomorrow, and I know this will really upset a lot of you, 30 minutes later. I know how upsetting that is. I know how great this is. But we will start at 9 o'clock. So push that alarm back. Oh, a round of applause for starting later. 9:30? Should we do 9:30? No, we'll do 9 o'clock tomorrow. So 9 o'clock to start tomorrow. One thing, just like yesterday, because it's so important: please take that extra moment and give us that feedback in those sessions. We read that stuff. We act on it. Tell us what we're doing well. Tell us what we can do better. It makes a difference. Everyone counts. And you're going to win something, or have a chance to win something. And the last thing is, I'm holding here in my hand a very rare substance, very rare, only based in Boston. And I'm going to share two words with you.
Breakfast beer. Breakfast beer. And this is clearly symbolic of this evening. We have had a history of incredible celebration evenings, and that's what's going to happen tonight. It's going to be held in the Waterfront Pavilion. I'm going to get this right: that's to your left, my right. So to your left, just down the hallway, there's a large tent area right outside. It's an enclosed area, and that's where the celebration will be. And the theme of it is Boston. So you will have, maybe it's not breakfast beer, call it what you want, but you'll have a lot of this. You will have cannolis, lobster, and everything else that makes Boston just an incredible town. That'll start at seven o'clock, and it'll rock and roll to nine o'clock. It's a great time, obviously, to network, but maybe more importantly, to celebrate and have a great time. So with that, off you go on your journey in Big Data today. Thank you for being a great audience. Have a wonderful day. Thank you.