Live from Cambridge, Massachusetts, extracting the signal from the noise, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now your hosts, Dave Vellante and Paul Gillin. Hi everybody, welcome back to Cambridge, Massachusetts. We're at MIT. Paul Gillin and myself are here for two days and we're really pleased to have Sandy Pentland on. He's the Director of the MIT Media Lab's Entrepreneurship Program, just coming off a keynote. Alex "Sandy" Pentland, thanks for coming to theCUBE. How did you get that name Sandy? My dad was named Alex too, so I had to get the diminutive, so Alexander turns into Xander or Sasha or Sandy, and then it stuck. So we learned from your keynote today that, like our mom said, hey, if every other kid jumps off the bridge, do you? And the answer should be yes. Why is that? Well, if your other friends are presumably as rational as you and have the same sort of values as you, and if they're doing something that looks crazy, they must have a piece of information you don't, like maybe Godzilla is coming. The bridge is crumbling. And it really is time to get off. And so while it's used as a metaphor for doing irrational things, it actually shows that using your social context can be most rational, because it's a way of getting information that you don't otherwise have. So you broke down your talk to Chief Data Officers into new types of analysis, smarter organizations, smarter networks, and then really interesting new architectures. I don't know if we could sort of break those down. You said that networks, not individual nodes, should really be the focus to understand behavior. Can you unpack that a little bit? Well, it's a little bit like the bridge metaphor. You know, a lot of what we learn, a lot of our behavior comes from watching other people. We're not even conscious of it.
But, you know, if everybody else starts wearing a certain sort of shoe, or acting in a certain way, or using a phrase in business, like all these new sort of buzzphrases that come along, you have to too, because it's how you fit in. It means something. It's part of being high performance and being part of your group. But that's not in data analytics today. What they look at is just your personal properties, not what you're exposed to and the group that you're part of. So, they would look at the guy on the bridge and they'd say he's not going to jump because he doesn't have that information. But on the other hand, if all the other people who are like him are making a different decision, he probably is going to jump. And in your research, you dig into organizations, and you've found the relationship between productivity and this type of analysis has been pretty substantial. Very substantial. It's often the largest single factor you can point to, both within the organization and outside of the organization, dealing with customers. So, people focus on things like personality, history, various sorts of training, things like that. What we find is that, compared to the pattern of interaction with other people, so who do you talk to, when, and in what situations, those other factors are tiny. They're often a whole order of magnitude less important than just: do you talk to all the people in your group? Do you talk outside of your group? Do you violate the org chart and talk to other people? If you do, you're almost certainly one of the high productivity, high innovation people. So, what are the implications of this for organizations which historically have had strong hierarchies and reporting structures? All of these institutions that we evolved in the post-World War II era, is this working against their productivity? Well, what they did is they set some simple rules that they could deal with and wrap their heads around.
But what we find is that those simple rules are exactly the opposite of what you need for innovation. Because really what they're doing is they're enforcing silos. They're enforcing atomization of the work, and everybody talks about how we need to be more fluid, we need to be more innovative, we need to be able to move faster, and what that requires is better communication habits. And so what we find when we measure the communication habits is that that's exactly right. Better communication habits lead to more innovative organizations. What's really amazing is almost no organization measures it. So people don't know: does everybody talk to everybody in this group? Do they talk outside of the group? There's no graphic, there's no visualization. And when you give a group a visualization of their pattern of communication, they change it and they become more innovative. They become more productive. I'm sure you're familiar with holacracy, this idea of doing away with organizational boundaries and sort of titles, and everybody talks to everyone. Is that in your view a better way to structure an organization? I think that's too extreme, but it's headed in the right direction. I mean, first of all, people try to do this without any data. So, you know, oh, everybody's the same. Well, everybody really isn't the same, and how would you know if you're behaving the same as other people? I mean, there's no data. So what I'm suggesting is something that's sort of halfway between the two. Yeah, you can have leaders, you can have organization in there, but you also have to have good flow of ideas. And what that means is you have to make talking outside your org chart a value. It's something you're rewarded for. It means that including everybody in the loop in your organization is something you ought to be rewarded for. And of course, that requires data.
So the sort of thing we do with people is we make displays, it could just be a piece of paper, that show the patterns of communication. And we give it to everybody. And you know what? People actually know what to do with it when you give it to them. They say, well, gee, you know, this group of people is all talking to each other, but they're not talking to that group. Maybe they ought to talk to each other. It's that simple, but in the absence of data, you can't do it. So you instrumented people, essentially, with badges. And then you could measure conversations at the water cooler, their frequency, their duration. No, not the content. Not the content. Just the activity. Just, is it happening, right? And is it happening between groups? Do people from this group go to that other group's water cooler? Stuff like that. And that actually is enough to really make a substantial difference in the corporation. And you gave an example where you were able to predict trending stories on Twitter better than the internal mechanism at Twitter. Did I understand that correctly? Yeah. So what we've done by studying organizations like this is come up with these sort of rules of how people behave, the notion that people learn from each other and that it's the patterns of communication that matter. You can encode that along with machine learning, and suddenly you get something that looks like machine learning, but in many ways it's more powerful and more reliable. And so we have a spin-out called Endor. And what that does is it lets your average guy who can use a spreadsheet do something that's really competitive with the best machine learning groups in the world. And that's pretty exciting. Because everybody has these reams of data, but what they don't have is a whole bunch of PhDs who can study it for six months and come up with a machine learning algorithm to do it. They have a bunch of guys that are smart, know the business, but they don't know the machine learning.
So what Endor does is supply something like a spreadsheet that allows the normal guy to do as well as the machine learning guys. There's a lot of focus right now on anticipating and predicting customer behavior better. A lot of it has been focused on individuals, understanding individuals better. Is that wrongheaded? I mean, should marketers be looking more at this group theory and treating customers more as buckets of similar behaviors? It's not buckets, but treating people as individuals is a mistake. Because while people do have individual preferences, most of those preferences are learned from other people. It's keeping up with the Joneses, it's fitting in, it's learning what the best practice is. So you can predict people better from the company they keep than you can from their demographics, always. Virtually every single time you can do better from the company they keep than from the standard data. So what that means is when you do analysis, you need to look at the relationships between people. And at one level it's sort of obvious. You wouldn't analyze somebody personally without knowing something about their relationships, right? The type of things they do, the places they go, those are important, but they're usually not in the data. And I do this with a lot of big organizations, and what I find is you look at their data analytics, it's all based on individuals and it's not based on the context of those individuals. Absolutely. I wanted to ask you further about that, because when I think of the surveys that I fill out, they're always about my personal preference, what do I want to do? I can't remember ever filling out a survey that asked me about what my peer group does. Are you saying that those are the questions we should be asking?
Exactly right, and of course you want to get data about that. You want to know, if you go to these locations all the time, you go to that restaurant, you go to this sort of entertainment, who else goes there? What are they buying? What's trending in your group? Because it's not the general population. And it's not the sort of people I know, but they're people I identify with perhaps. So I go to certain restaurants, not because my friends go there, but because people who I aspire to be like go there. Yeah, and the other way around: you go there and you say, well gosh, these other people are like me because they go here too. And I see that they're wearing different sorts of clothes, or, the simplest thing, you go to a restaurant, you see other people all buying the sushi. You say, well gosh, maybe I should try the sushi. I usually don't like it, but it seems to work well, and I like this restaurant and everybody else who comes here likes it, so I'll try it. It's that simple. So it's important to point out we're talking about the predictive analytics component. People watching might say, Sandy's crazy, you mean we don't want to personalize? We want to personalize the customer experience still, I'm presuming. But when we're talking about predictive analytics, you're saying the community, the peer group, is a much better predictor than the individual. That's right, yeah. Okay, so I want to come back to the org chart piece. Are you saying that org charts shouldn't necessarily change, but the incentives should? What are you prescribing? The easiest thing to do is you have an org chart, but the incentive across the entire organization is good communication within the box that you're in and good communication outside of the box. And to put those incentives in place, you need to have data. You need to have some way of estimating: does everybody talk to each other? Do they talk to the rest of the organization?
And there's a variety of ways you can do that. We do it with little badges, we do it by analyzing phone call data. Email is not so good, because email's not really a social relationship, it's just this little formal thing you do often. But by using things like the badges, like the phone calls, surveys for that matter, right, you can give people feedback about whether they are communicating in the right way. Are they communicating with other parts of the organization? And by visualizing that to people, they'll begin to do the right thing. You had this notion of network tuning. So you don't want an insufficiently diverse network, but you don't want a network that's too dense. You want to find the sweet spot in the middle. How do you actually implement that tuning? Well, the first thing is that you have to measure it. You have to know how dense the social interaction, the communication pattern, is, because if you don't know that, there's nothing to tune, right? And then what you want to ask is this: the signal property of something being too dense is that the same ideas go around and around and around. So you look at the graph that you get from this data and you ask, does Joe talk to Bob, who talks to Mary, who talks to Joe? Is it full of cycles like that? And if it's too full of cycles, then that's a problem, right? Because it's the same people talking to each other, same ideas going around and around. And there are some nice mathematical formulas for measuring it that are sort of hard to put into English. But it has to do with, if you look at the flow of ideas, are you getting a sufficiently diverse set of ideas coming to you? Or is it just the same people all talking to each other, so you're sort of cut off from the rest of the world? In your book, Social Physics, you talk about rewards and incentive mechanisms. And one of the things that struck me is you say that people are actually more motivated by rewards for others than for themselves.
Correct me if I'm paraphrasing you wrong there. But rewarding the group or doing something good for somebody else is actually a powerful incentive. Is that the case? Well, you said it almost right. So if you want to change behavior, these social incentives are more powerful than financial incentives. So if you have everybody in a group, let's say, and people are rewarded by the behavior of the other people in the group, what will they do? Well, they'll talk to the other people about doing the right thing, because my reward depends on your behavior. So I'm going to talk to you about it, okay? And your reward depends on his. So you'll talk, and on and on. So what we're doing is we're creating much more communication around this problem, and social pressure. Because if you don't do it, you're screwing me. And it may not be a big thing, but you're going to think twice about that. Whereas some small financial reward, usually it's not such a big thing for people. So people talk a lot about persona, persona marketing. When I first met John Furrier, he had this idea of affinity rank, which was his version of peer group PageRank. Do you get a lot of questions about persona, persona marketing, and what does your research show in terms of how we should be appealing to that persona? So I get questions about that sometimes. And I don't know what he really originally intended, but the way people often apply it is very static. You have a particular persona that's fixed for all parts of your life. Well, that's not true. I mean, you could be a baseball coach for your kid and a banker during the day and a member of a church. And those are three different personas. And what defines those personas? It's the group that you interact with. It's the people you learn with and try to fit in with. So your persona is a variable thing. And the thing that's the key to it is what are the groups that you're interacting with?
So if I analyzed your groups of interactions, I'd see three different clusters. I'd see the baseball one, I'd see the banking one, I'd see the church group one. And then I would know that you have three personas, and I could tell which one you're in, typically by seeing who you're spending time with right now, right? In applying this idea of behaviors influenced by groups, is there a risk of falling over into profiling and essentially treating people, anticipating behaviors, based upon characteristics that may not be indicative of how any individual might act? Bad credit, alcoholics, as you gave as examples. I don't get a job because people who are similar to me tend to be alcoholics. So this is different, though. So this is not people who are similar to you. If you hang out with alcoholics all the time, then there really are good odds that you're an alcoholic. And it may not be. Yes, and there is a risk of over-identifying or extrapolating, but it's different than people like me. I mean, if you go to the dingy bars where beers are a buck and everybody gets wasted, and you do that repeatedly. So you're talking about behaviors rather than characteristics. Behaviors rather than characteristics, right? I mean, if you drink a lot, maybe you drink a lot. So we have a question from the crowd. It says real time makes persona very difficult. And to come back to Furrier's premise, that was always Twitter data, so it's changing very rapidly. So are there social platforms that you see that can inform in real time, to help us sort of get a better understanding of persona and group affinity? Well, there are data sources that do that, right? So for instance, if I look at telephone data, or credit card data even for that matter, where it's geo-located, I can ask, well, what sort of people buy here, or what sort of people are in this bar or restaurant? And I can look at their demographics and where they go.
I showed an example of that using data from San Francisco. So there is this data, which means that any app that's interested in it and has sufficient breadth, in other words, sufficient adoption, can do these sorts of analyses. You're working with many organizations now, and I'm sure you can't name them, but can you give an example of how you're applying these principles practically now, whether it's in law enforcement or in consumer marketing? How are you putting these to work? Well, there are bunches of different things that go together with this view that, you know, it's the flow of ideas that's the important thing, not the demographics. So you talk about behavior change, and we're working with a small country to change their traffic safety by enrolling people in small groups where, you know, the benefit I get for driving, right, depends on your safety. And we're good buddies, we know that, that's how you sign up, you sign up with your buddies. And what that means is I'm gonna talk to you about your driving if you're driving in a dangerous way. And that, we've seen in small experiments, is a lot more effective than giving you points on your driver's license or a discount on your insurance. The social relationships matter. So that's an example. Another example is that we're beginning a project to look at unemployment. And what we see is that people who have a hard time getting re-employed don't have diverse enough social networks. And it sounds kind of common sense, but they don't physically get out enough compared to the people that do get jobs. So what's the obvious thing? Well, you encourage them to get out more, you make it easier for them to get out more. So those are some examples. When you talk about healthcare, what you can do is you can say, well, look, you know, I don't know the particular things you're doing, but based on the behavior that you show, right?
And the behavior of the people you hang with, you may be at much higher risk of diabetes. And it's not any particular behavior. The way medical stuff is always pitched is, you know, it's this behavior or that behavior, but it could be a combination of things, right? And so you're not really aware that you're doing anything bad, but if all your buds are at risk of it, then you probably are too, because you're probably doing a lot of the same sort of behaviors. And medicine is a place where people are willing to give up some of their privacy because the consequences are so important. So we're looking at people who are interested in personalized medicine and are willing to, you know, share their data about where they go and what they spend time doing, in order to get statistics back from the people they spend time with about what risk factors they pick up from the people around them and the behaviors they engage in. Your message to the CDOs today was, you know, you were sort of joking, "You're measuring that, right?" And a lot of the time they weren't measuring a lot of the non-intuitive things that your research has found. So I want to talk about the data and access to the data and how the CDO can, you know, effect change in their organization. A lot of the data lives in silos. I mean, you certainly think of social data, Facebook, LinkedIn, Twitter, you mentioned credit card data. Is that a problem, or is data becoming more accessible through APIs, or is it still just sort of a battle to get that data architecture right? Well, it's a battle, and in fact it's a political and very passionate battle, and it revolves around who controls the data, and privacy is a big part of that. So one of the messages is that to be able to get really rich data sources, you have to engage with the customer a lot more.
In our research, we've set up, you know, entire cities where we've changed the rules, and we've found that people are more than willing to volunteer very detailed personal data under two conditions. One is they have to know that it's safe, so you're not reselling it, you're handling it in a secure way, it's not going to get out in some way. And the other is that they get value for it and they can see the value. So it's not spreading out and they're part of the discussion. So, you know, you want more personalized medicine, people are willing to share, right? Because it's important to them. Or for their family: you know, if you ask people to share very personal stuff about their kids, normally they would never do that. But if it results in the kid getting a better education, more opportunity, yeah, they're absolutely willing to do that. So that's a great segue into Enigma. You talked about Enigma as a potential security layer for the internet, but also a potential privacy... Yeah, solution. So talk about Enigma, what it is, where it's at, and how it potentially could permeate. So we've been building architectures and working with this sort of problem, this conundrum basically. Data's in silos, people feel paranoid, and probably correctly, about their data leaking out, companies don't have access to data, don't know what to do with it. And a lot of it has to do with safe sharing, right? Another aspect of this problem is cybersecurity. You're getting an increasing number of attacks on stuff, bad for companies, bad for people, and it's just going to get worse. And we actually know what the answers to these things are. The answers are that data is encrypted all the time, everywhere. You do the computation on encrypted data; you never transmit it unencrypted, and you never decrypt it to be able to do things.
We also know that in terms of control of the data, it's possible to build fairly simple permission mechanisms so that the computer just won't share it in the wrong places. And if it does, skyrockets go up and the cops come. You can build systems like that today, but the part that's never allowed that to happen is that you need to keep track of a lot of things in a way that's not hackable. You need to know that somebody doesn't just short-circuit it or take it out the back. And what's interesting is the mechanisms that are in Bitcoin give you exactly that power. So whatever you feel about Bitcoin, you know, it's a speculative bubble, whatever, the blockchain which is part of it is this open ledger that is unhackable and has the following characteristic. It's amazing, it's called trustless. What that means is you can work with a bunch of crooks and still know that the ledger that you're keeping is correct, because it doesn't require trusting people to work with them. It's something where everybody has to agree to be able to get things, and it works. It works in Bitcoin at scale over the whole world. And so what we've done is adapted that technology to be able to build a system called Enigma, which takes data in an encrypted form, computes on it in an encrypted form, and transmits it according to the person's permissions, and only that way, in an encrypted form. And, you know, it provides this layer of security and privacy that we've never had before. There have been some projects that come close to this, but, you know, we're pretty excited about this, and what I think you're going to see is some of the big financial institutions trying to use it among themselves, some of the big logistics, some of the big medical things, trying to use it in hot spots where they have real problems. But the hope is that it gets spread among the general population and it becomes quite literally the privacy and security layer that the internet doesn't have.
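The "open ledger" idea Pentland describes here, a record that cannot be quietly rewritten, can be illustrated with a hash chain. What follows is only a minimal sketch under stated assumptions: it is not Bitcoin's block format, not its consensus mechanism, and not Enigma's code. The function names `entry_hash`, `append_entry`, and `verify`, and the shape of the ledger entries, are hypothetical and chosen for illustration. Each entry stores the hash of the previous entry, so changing any earlier record makes verification fail.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (an assumption)


def entry_hash(index, payload, prev_hash):
    """Hash an entry's contents together with the previous entry's hash."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_entry(ledger, payload):
    """Append a payload to the ledger, chaining it to the last entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    index = len(ledger)
    ledger.append(
        {
            "index": index,
            "payload": payload,
            "prev": prev_hash,
            "hash": entry_hash(index, payload, prev_hash),
        }
    )


def verify(ledger):
    """Return True only if every entry still matches its recorded hash chain."""
    prev_hash = GENESIS
    for i, entry in enumerate(ledger):
        expected = entry_hash(i, entry["payload"], prev_hash)
        if entry["index"] != i or entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Appending two records and then altering the first one causes `verify` to return `False`. A real blockchain adds the part this sketch leaves out: many independent parties must agree on which chain is the valid one, which is what removes the need to trust any single record keeper.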
[inaudible] might be right that it might fail as a currency, but the technology has really inspired some new innovations. That's right. So it's essentially a distributed, it's not a walled garden, it's a distributed black box. That's right. That's what you're describing. You never expose the raw data. That's right. You don't need a trusted third party which can get hacked, and that reduces your traffic. Yeah, so nobody has to stamp that this is correct, because the moment you do that, first of all, other people are controlling you, and the second thing is, there's a point of attack. So it gets rid of that trusted third party centralization and makes it distributed. You can have, again, a bunch of bad actors in the system, and it doesn't hurt the reliability of the system. It's peer-to-peer, where you have to have 51% of the people being bad before things really go bad. How do you solve the problem of performing calculations on encrypted data? These are classic techniques, actually. It's been known for over 20 years how to do that, but there were two pieces missing. One piece is it wasn't efficient, it scaled really poorly, and what we did is come up with a way of solving that by making it essentially multi-scale, so it's a distributed solution that brings the cost down to something that's linear in the number of elements, which is a real change. And the second is keeping track of all of this stuff in a way that's secure. It's fine to have an addition that's secure, but if that isn't embedded in a whole system that's secure, it doesn't do you any good, and so that's where the blockchain comes in. It gives you this accounting mechanism for knowing which computations are being done, who has access to them, what the keys are, things like that.
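The "addition that's secure" mentioned above can be made concrete with one classic technique, additive secret sharing. This is offered as a hedged sketch only: the interview does not say which scheme Enigma uses, and the names `PRIME`, `share`, `add_shared`, and `reconstruct` are hypothetical. Each value is split into random-looking shares that sum to the value modulo a prime; parties add their shares locally, and only the final total is ever reconstructed.

```python
import secrets

# Modulus for all arithmetic; 2**61 - 1 is a Mersenne prime (choice is an assumption).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares; no party sees either input."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    """Combine all shares to recover the underlying value."""
    return sum(shares) % PRIME
```

For example, `reconstruct(add_shared(share(20, 3), share(22, 3)))` yields 42, even though no single party ever held 20 or 22. As the answer above points out, a secure primitive like this is only useful inside a system that also tracks who may run which computation, which is the role given to the blockchain.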
So Google Glass was sort of incubated in the MIT Media Lab well before Google, right, in your group, and it didn't take off, maybe because it's just not cool and it looks kind of goofy, but now Enigma has a lot of potential, it's solving a huge problem. Are you going to open source it? What are you doing? Yeah, it's an open source system. We hope to get more people involved in it, and right now we're looking for some test beds to show how well it works and make sure that all the i's are dotted and t's are crossed and so forth. And where can people learn more about it? Oh, go to enigma.media.mit.edu. All right, Sandy, we're way over our time, so obviously you were interesting, so thanks for coming on theCUBE. I really appreciate having you. My pleasure. Keep right there, everybody. Paul and I will be right back with our next guest. We're live from MIT. This is theCUBE. We'll be right back.