Okay, we're back live here at Strata Conference. We're here at siliconangle.tv, the Cube, where we go out and talk to whoever's got the signal, and we want to extract that from the noise and share it with you. This is our flagship telecast. I'm joined by my co-host Dave Vellante. And we're here with Marc Smith, whom I met on Twitter because he had the epic tweet yesterday with some great data. You're a data junkie like us. Welcome to the Cube. This is where we broadcast live. Step a little closer to the microphone so we can see you there. So Marc, tell us a couple of things: who you are, what you do, and some of the research you're working on. And then we will jump into some of the commentary.
Great. Well, thanks. It's good to be here. I'm Marc Smith. I am a sociologist, and I work with an organization called the Social Media Research Foundation. We're a not-for-profit at smrfoundation.org. Our motto is open tools, open data, and open scholarship. We're trying to be a browser for social networks. So we have released a tool called NodeXL, the network overview, discovery, and exploration add-in for Excel. It basically lets anybody who can now make a pie chart make a network diagram. And network diagrams are important because they are the shape of the social web. As I like, link to, reply to, and mention other people, I form a network with them, and this is a way of visualizing it. So last night, I tweeted about the data that we had collected: people who were tweeting about Strata Conf and the connections among them. That image went out, and you can actually see something about the shape of the crowd. So our goal is to actually document how social media really works.
Well, I'm really impressed. First of all, I was on the list, which got my attention, because it's the vanity of the bloggers: everyone sees their name on the list. But seriously, this is something that we're really passionate about.
We believe the same thing: at Dave's firm, the research side of our business, we don't charge for any of our content. It's all free: data, knowledge. So we love your mission. We want to promote that, so we'll continue to promote your work. But more importantly, social networks are about the future, the interactions, the collaboration. And Strata Conf is about big data. There are all kinds of different conversations you heard us talking about: the business market, $50 billion. But there's really a change-the-world mentality going on here. You mentioned scholarship. We have a group we're developing called Silicon Academy. And you see Khan Academy: knowledge is free, data is free. But as the crowd consumes the data, it creates crowd behavior that can be measured, and that's what you're doing. So what kind of measurement are you seeing that you're able to do with the big data? And how are you creating it so that you don't have to be a PhD to parse through all that?
That's very much part of our mission. We recognize that if you are a coder, there are a lot of really great tools out there for getting a network, analyzing a network, visualizing it. And if you're not a coder, there aren't. And I'm not a coder. So I am interested in helping those people who can currently make a pie chart get to a social network data set, analyze it, visualize it, find an insight in it, and communicate it to someone else. And to do that with, at the moment, a completely free and open tool that we think will catalyze scholarship. 2011 was the year of the crowd; we have seen the power of groups of people coming together. More people are gathering online than have ever gathered in any of the public squares around the planet. We want to see those crowds, and if you go to our website at nodexlgraphgallery.org, you will actually see snapshots, pictures, we would argue, of the crowd. But these are crowds that are gathering in cyberspace.
And so we think it is worth documenting the variety of crowds. What are the different shapes of crowds that can form? And we have found differences in the shapes of these crowds.
How many people are on your team?
There are about 14 people associated with the Social Media Research Foundation.
And where are you guys based?
Belmont, California is our headquarters, but we are very virtual. We have attracted remarkable talent from around the planet. So we have people at the Oxford Internet Institute, like Bernie Hogan, or major leaders of the information visualization world, like Ben Shneiderman, Professor of Computer Science at the University of Maryland. Many talented people have gathered under the banner of the Social Media Research Foundation, all making contributions to make these tools and the data available.
So I've got a graph up from nodexlgraphgallery.org. Tell us what we're looking at here. Marc, I don't know if you had a chance to bring that up. I'm just sharing the first thumbnail with Marc.
Right. So I think this one is the map of the term "big data" being used on Twitter. If your viewers can see this, all of the squares there are profile photos from Twitter. And everybody who is on this map recently used the term "big data," a topic I think we're all interested in. If you scroll down a little bit, it'll tell you the first date and the last date of the data, and we'll see that it was from the 26th to the 27th. It's about a day and a bit of traffic. So this is about 1,500 tweets. And of all the people who tweeted the term "big data," many followed each other. Some of them in fact replied to each other. What we do is analyze all those connections, and we do this in a spreadsheet on your desktop. You can go and grab this data and then grind it up. And to your point, you do that with a single button, a one-button analysis process. You hit the automate button. It just does this. It gives you that.
And what you're seeing is that the green arcs are "I follow you." If you look very closely, you'll see, very rarely, blue arcs. The blue arcs are "I actually replied to you." We're also seeing that the profile photos are a variable size. That indicates follower count. And one of the key insights of network theory is that follower count does not necessarily equal importance in a community.
Yes, we know. We know. We're horrible, but we're you.
So that's what the map is showing you. It also shows me that down in the lower right-hand corner, there are all those people who are isolated. They don't have any connections. And that's both a good thing and a bad thing. They need some help. They need to be brought into the community, or the communities, of their interests, not necessarily forced into them.
So what data are you drawing on, the Twitter firehose? Or is it just API data?
That's the Twitter API. We also import data from Flickr, Facebook, YouTube, email, the World Wide Web, GraphML, CSV, TXT. If you have data and you can get it into Excel...
Can we come and hang out at your place in Belmont?
Oh, yeah.
That's awesome. Well, we're totally into this. We have our own little project called CrowdSpots. Because we're in the media business, we're into crowd behavior: if you can understand the crowd, you can do things for them. Help people on the outer range find someone to collaborate with. So it's really, really important. I want to talk to you about the scholarship aspect. One thing that's really close to my passion is this notion: if someone's out on the long tail of the distribution, they're not on the A-list, so to speak, as I say, in the head of the tail, but they just want to connect. People have choices. It's not so much that they belong to one community. In the crowd, people have multiple realities in virtual space. So they don't know where to go.
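As a rough illustration of the map described in this exchange (follow arcs, reply arcs, photos sized by follower count, and isolates off in a corner), here is a minimal Python sketch. The user names, edges, and follower counts are invented sample data, and the helper functions are hypothetical; this is not NodeXL's code or its API, only one plausible way to express the same ideas.

```python
from collections import defaultdict

# Invented sample data, not real Twitter accounts.
# Each edge is (source, target, kind): "follows" is a green arc, "replies" a blue arc.
EDGES = [
    ("ann", "bob", "follows"),
    ("bob", "ann", "follows"),
    ("cat", "ann", "follows"),
    ("cat", "ann", "replies"),
    ("bob", "cat", "follows"),
]

# Invented overall follower counts (the size of each profile photo on the map).
FOLLOWERS = {"ann": 120, "bob": 90000, "cat": 300, "dan": 5000, "eve": 40}


def find_isolates(users, edges):
    """Return users who used the term but have no arc to anyone else on the map."""
    connected = set()
    for source, target, _kind in edges:
        connected.update((source, target))
    return sorted(user for user in users if user not in connected)


def community_in_degree(users, edges):
    """Count how many distinct people on the map point at each user."""
    incoming = defaultdict(set)
    for source, target, _kind in edges:
        incoming[target].add(source)
    return {user: len(incoming[user]) for user in users}


isolates = find_isolates(FOLLOWERS, EDGES)
in_degree = community_in_degree(FOLLOWERS, EDGES)
most_followed = max(FOLLOWERS, key=FOLLOWERS.get)
most_connected = max(in_degree, key=in_degree.get)

print("isolates:", isolates)
print("most followed overall:", most_followed)
print("most connected within this map:", most_connected)
```

In this toy data the account with the most followers overall is not the one most people on the map point at, which is the distinction between follower count and importance within a community that the conversation draws.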
There's no place to figure out where to find things. So how does your algorithm help people discover like minds? I'm interested in art today, and in something else tomorrow. How do you discover people?
One of the nice things about the tool is that it will dredge up all the hashtags that are being mentioned within a keyword community. So we really start with a query. We say, give me all the people, and all their tweets, who have mentioned a particular term. That brings us a community. That may be the first thing you're interested in. We then can do what you could call a snowball sample. This is where you start with one term and move out from there, finding related terms. The highly ranked terms that are associated with the term you like may, in fact, be of interest. The people who are highly ranked in the topics you care about may be interested in other things that you are also interested in. So it is possible to move through the network and pivot both on topic and on person to discover new ideas and interesting new sources of information.
So your tool is putting it into Excel format, which is what the common person, the real-world person, would use, right? Or a Google spreadsheet or something, but really Excel, not a Google spreadsheet.
At the moment, Excel. Currently Microsoft Excel.
Which is okay. That's mature; people use it. So how do you make that usable, in the sense of: I'm a user, do I just subscribe to the tool? Is it open source?
That's right.
How do you guys handle the distribution of the code?
Right, the code is free and open. The tool is free and open. You can find us on the web. Just use your favorite fine-quality search engine product to look for the word NodeXL, N-O-D-E-X-L. At nodexl.codeplex.com, you'll find our download site. You download the zip file, you unzip it, you run the setup, and when you restart Excel, you're going to find that there's a whole new menu in the ribbon.
And it has all the tools you need to do one of these kinds of maps. It has an import menu, and it says, what kind of data do you want? Do you want Twitter data? Do you want YouTube data, Flickr, Facebook, whatever it is? You configure that, and it gets the data for you. You have to press one more button.
Okay, so how do you guys engage with people who want to throw money at you, like rich philanthropists? Because you're a non-profit, you need to survive.
We are, we are.
And you've got how many people now?
14 people.
And you've got some data centers. We know how much that costs, all the power to run those servers. So how do you guys engage when you do fundraising? Do you sell the apps? And if people want to distribute your app, say we want to do Silicon Academy and share the data and share the knowledge, how would we do that? Would we have to subscribe to it? Is it a GPL license?
We're like BSD. It's actually the Microsoft Public License, the Ms-PL, but it resembles the BSD license, which means that it is okay for commercial derivative-work use.
That's right. Derivative use, that's it. So it's yours.
Basically, we're very dedicated to the idea that we are contributing to the commons. We are not drawing a line between commercial and non-commercial use. We are simply giving it away. It's free as in speech, free as in beer. It's free as in Shakespeare.
So how would you engage with us?
We do training. We do workshops. We have members who do consulting. And of course, we invite support. If you find that this is a tool that is valuable to you, if the research that comes from the use of this tool is valuable to you, we welcome your support.
Do you have a staff that's just shaking down trees and stuff like that?
I wish we did, no.
But we will tweet about it. So smrfoundation.org is a website they can go to, right?
That's right.
Marc at smrfoundation.org. Email Marc, great guy.
I'm so glad I found you in the virtual space, because this is a really cool and really important tool. And again, the notion of scholarship really means knowledge and discovery, because relationships lead to good things. So my question to you, as someone who's a scientist out there, a social scientist and data scientist: how is big data changing society and the world? Small question.
Okay, how is big data changing society and the world? First of all, I would say that there are a lot of patterns that have been difficult to discover previously because the data wasn't there, and we're going to find those patterns. Some of them are surprising. And there are going to be opportunities to exploit those patterns. Whoever races to the top of the big data mountain first is going to be able to see the vista from the top of that mountain. And no doubt there are some fertile lands out there, and there are going to be some valleys that you really want to go out to, and that's going to be where you're going to, I know. As a social scientist, I think what we will actually see is society. We're going to see the behavior of billions of people interacting trillions of times a day, in a way that we have only perceived personally. So we are going to get a satellite view of human society. It's a great opportunity for the social sciences. It's a challenging opportunity for human societies. It is going to give insight that is so powerful to some people that I think there will be all sorts of social issues and conflicts.
What are some examples of what people are doing with your tool set?
Right. Today, people often use it in research, pointing it at social movements, topics, issues, conflicts, and looking at the shape of the communities that have formed underneath them.
We're doing research, for example, looking at how people talked about the State of the Union on Twitter, and seeing that, in fact, two or three subgroups formed and that they had very little connection to each other. So one finding we're getting from the tool is that social media allows people to talk to each other, but that doesn't mean that they do. In fact, rather than a public space where people engage, what we're seeing are echo chambers. We're seeing the bifurcation of the community and people staying within their community. One of those findings is that the URLs mentioned in one cluster never appear in the other cluster.
So you're not seeing peer communities form. You're seeing the communities hang out in isolation.
Isolated, gated communities.
Interesting. Cyberspace gated communities. So let me ask you a social question. This is more theoretical, but it's something you're measuring in practice when you talk about virtual space. In public, face-to-face communication, commitment is interesting. You know when someone's not looking at you or talking with you. But with chat and online synchronous capture of data, you kind of don't know the commitment of the person. And we saw things like Second Life and other societies out there where you'd actually be moving around in these forums. Is there any implication for relationships from this kind of communication, where you've got this passive presence? I mean, this is kind of a tough one to answer.
Sociology may have an answer for you. Just up the road, not far from here, the revered sociologist Mark Granovetter wrote a paper in 1972. The paper was called The Strength of Weak Ties. And so I think what many people have argued about social media is that it helps you cultivate weak ties, but not necessarily strong ties.
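The echo-chamber observation earlier in this answer, that URLs mentioned in one cluster never appear in the other, can be checked with a simple set comparison. The sketch below is only an illustration: the function name and the example.com/example.org URLs are made up, and nothing here reflects how NodeXL itself computes clusters or compares them.

```python
def url_overlap(cluster_a_urls, cluster_b_urls):
    """Jaccard overlap of two clusters' shared links: 0.0 means no URL in common."""
    a, b = set(cluster_a_urls), set(cluster_b_urls)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Made-up links standing in for what two subgroups shared while discussing one event.
cluster_one_urls = ["http://example.com/speech", "http://example.com/fact-check"]
cluster_two_urls = ["http://example.org/response", "http://example.org/op-ed"]

# Two clusters that share one link out of three distinct links overall.
mixed_overlap = url_overlap(["a", "b"], ["b", "c"])

echo_chamber_overlap = url_overlap(cluster_one_urls, cluster_two_urls)
print("overlap between the two clusters:", echo_chamber_overlap)
print("overlap when one link is shared:", round(mixed_overlap, 3))
```

An overlap of zero between two clusters discussing the same event is the pattern described above; any value above zero means at least one link crossed between the groups.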
And the sociologist Barry Wellman at the University of Toronto has actually measured how large your strong-tie community is and whether it has changed over time. And he finds it hasn't. There are about 10 or 12 people that you know very well, that you call on a regular basis. They're your strong relationships. I think the critique of social media is that it grows large volumes of weak ties and presents them as if they were strong ties. And if you think that your Facebook friends are really your friends, then that's an issue. On the other hand, Granovetter's paper says that there is strength in weak ties. And that's because they are cheap. You can have lots of them. And in aggregate, they have more weight than you would grant any individual one. So the person that you only remember because you see them once a year at a conference is a weak tie, but it's also likely that that's the person who has news for you that your friends don't know. So what are the implications of big data and social networking? I think it's going to be that we get to grow these weak ties and get value from them. But it hasn't changed how many strong ties we can actually maintain.
What have you learned in your work thus far? First of all, the work is phenomenal. Marc A. Smith is doing some great work, pioneering some of the most cutting-edge issues in sociology and also data. What have you found and learned in the process: good, bad, and ugly?
Well, and to connect that to your question of what people are doing with it: I think what we're finding is that groups in social media are not all the same. They come in flavors. There are genres. There are types. And you can see on the Graph Gallery site, the NodeXL Graph Gallery, those Twitter crowds that are clearly a customer service relationship. There's an account in the center, and it's got all these tendrils going out. But the tendrils don't connect at the edges. It's a hub and spokes. And that's a shape.
And we can actually look at the crowd and go, aha, you're Dell Listens or Dell Cares, or you're Virgin America. You're these people who are engaged in this transaction.
Creating noise.
I mean, customer service, one hopes. Creating a lifeboat, possibly, or being a friend for someone, a weak tie. Getting a connection to the resolution path for some company. Versus, let's say, you look at the Strata Conference. The Strata Conference has almost no isolates. That's another insight. What we're finding is that brands, public topics, have large numbers of people who are not connected to anybody else. It's actually an indication that you are a brand. A brand means that two people can say the same word and have no probability of connection to each other. And so if you look at something like Strata Conference, almost everybody is connected to somebody else who also says Strata Conference. That means that this conference is an in-group. Another way to put it, in a more positive way: it's a community.
It's a community, but it's not a community. It has gravity.
That's right. It's got density, actually. We actually call it density.
That's right. It's a star cluster. It's like, you know, the Milky Way.
That's right.
We're tight. Are people finding ways to engage with these clusters of communities without spamming them?
Right, right. And that's one of the techniques that you can use the tool for. You can find out who the people in the linchpin positions are. People call them influencers. I actually prefer to refer to them as simply being in strategic locations. And if you think about strategic locations rather than merely influence, then it could be that the isolates are actually in strategic locations. For new customer acquisition, isolates have actually said your name, but don't know anybody who also says your name. They are rich sources for new customer acquisition. On the other hand, let's just say you're trying to change an entrenched belief. You really need to go after the hubs.
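The crowd shapes described in this stretch of the conversation (hub-and-spoke customer service accounts, dense conference in-groups, and brands with many isolates) can be told apart with two simple numbers: density and the share of isolates. The sketch below is a rough illustration on invented toy graphs; the function names and data are hypothetical, the density formula is the standard one for undirected graphs, and none of this is taken from NodeXL's own implementation.

```python
def density(nodes, edges):
    """Share of possible undirected connections that actually exist (0.0 to 1.0)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Treat (a, b) and (b, a) as the same connection and ignore self-loops.
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(unique) / (n * (n - 1))


def isolate_share(nodes, edges):
    """Fraction of people who mention the topic but connect to nobody else."""
    if not nodes:
        return 0.0
    connected = {user for edge in edges if edge[0] != edge[1] for user in edge}
    return sum(1 for user in nodes if user not in connected) / len(nodes)


# Invented toy crowds, one per shape.
SUPPORT_NODES = ["brand", "u1", "u2", "u3", "u4", "u5"]
SUPPORT_EDGES = [("brand", u) for u in SUPPORT_NODES[1:]]  # spokes never touch each other

CONFERENCE_NODES = ["a", "b", "c", "d"]
CONFERENCE_EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")]

BRAND_NODES = ["p", "q", "r", "s", "t", "v"]
BRAND_EDGES = [("p", "q")]  # most people who say the brand name know nobody else who does

for name, nodes, edges in [
    ("customer service (hub and spokes)", SUPPORT_NODES, SUPPORT_EDGES),
    ("conference (in-group)", CONFERENCE_NODES, CONFERENCE_EDGES),
    ("brand (many isolates)", BRAND_NODES, BRAND_EDGES),
]:
    print(f"{name}: density={density(nodes, edges):.2f}, "
          f"isolates={isolate_share(nodes, edges):.0%}")
```

Because density depends on how many people are in the crowd, comparing it across crowds of very different sizes is only suggestive; the isolate share is the more direct reading of the brand-versus-community point made above.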
You need to go after these people at the center of the clusters.
Yeah, so the isolates could be a beachhead into an unmeasured, uninstrumented environment. You could have an isolate out there that looks alone, but really it's just that no one has seen that documented yet. So underneath that isolate could be a community.
Could be. Or these are people who've just started to mention your product or your brand, and you want to bring them into the fold, versus the people who are obviously really passionate about your product and brand already and are at the center of some of these clusters.
And the way in which you engage with those individuals, wherever they are, is directly, just through, for instance, Twitter.
I guess that's true. I mean, how do you do it without spamming? And the answer is you actually form a relationship and a conversation. If this is a person talking about your brand that intensely and you work for the brand, you have something in common. Reach out to them.
That's right. Engage commonalities.
That's right. We both share an interest in the brand. Let's talk about the brand. You're interested in the ideas and news about the product. I'm going to send you that information. I'm going to form a connection with you. What these tools do is reduce the population of likely people that you need to engage with from 1,500 to 15, because they are the people at the points of gatekeeping. They're at the branching points of the network. So it makes it manageable.
So in the old days, you had to email people, and you'd code a landing page. You'd run some Google AdWords. Actually, not the old way. That's how people do it today. In the new way, you have these communities formed. And no one likes someone to walk into their poker game or their clubhouse and start spamming people, right? So if you've got, say, a big cluster of people, a million people, how do you engage them? How do you get them to engage you? Do you have to be active?
Is it this notion of presence? Is that what you're saying? How would you advise someone? How would you advise me on how to engage people?
Those are all good points. And you don't really just want to show up with your commercial in the middle of the poker game. It's true. But I'll note, sociology tells us something interesting about this notion of engagement, and that is that the populations tend not to be a million. They're actually mostly small groups. If you think about any particular niche topic of interest, whether it's Beanie Babies or playing racquetball or jogging, the population of people who talk about that is not millions. And the leading people who talk about that are not even hundreds; they're dozens.
It's the loose ties thing. It's an aggregation of little pockets within the central gravity density.
And there's going to be a few people who are at the center, and the right way to engage them is pretty much the way that I think we've learned to, let's say, meet somebody new at a conference. Which is, you may start by following them. You might retweet them occasionally. You might start saying things that they actually find substantively interesting. At which point a relationship forms quite naturally.
Or do what we do: put content in front of them and see if they generate content that is worth it. Whether they kick the content out or not, that's a good signal.
All right, John, listen, we have to run. Marc Smith, fantastic interview. And your Twitter handle is, is it Marc A. Smith or Marc?
Marc with a C, underscore Smith.
Okay, Marc underscore Smith. The Twitter handle is @marc_smith. Thanks for coming and joining us. Great to meet you online, and then face to face. Virtual connection, now a new tie, a new relationship. We'll be talking a lot about your company, and we'll certainly try to make a visit. Thanks for coming on. Appreciate it.