Hi, I'm Roger. And I'm Claudia. In the mid-1980s, a group of researchers went to a beach in Southern California, stayed there for 31 days, and kept track of windsurfers. They measured who would interact with whom, how people moved between one subgroup of windsurfers and another, friendships, animosity, et cetera. After collecting that data, they analyzed it as a social network. That way they could get really good insight into how social relationships evolve over time, the structure of society, et cetera. This is what we'll be doing in our course, Social Network Analysis for Data Scientists. Well, with the exception of spending a month on the beach, obviously. Our focus will be on how we analyze real-world social network data using the latest statistical models. Our focus will not be on algorithms. Our focus will mainly be on statistical analysis of social network data.
Do you see this network behind me? This is a visualization of Spotify data: musicians collaborating with each other. Do you understand what is going on in this visualization behind me? I don't think so. In this class, we can teach you how to do a proper network visualization. But this is not enough anyway. That's why we're also going to teach you how to use descriptive statistics for networks to understand what's happening there. But not only this: we are also going to teach you how to use network models to explain why these musicians are behaving the way they are. And if you are not interested in people, this is not a big deal, because when you learn network theory, you can apply it to any kind of relationship, any kind of data. And that's why this is a good opportunity to be creative and to explore what you actually want to explore.
Over the course of this semester, you will be working with your group on an actual empirical analysis of network data that you gathered yourself, applying the methods and techniques that you will learn in this course.
You will also get an exam at the end of the semester. It's a combination of open questions about the theories, the models, the techniques, the concepts, et cetera, of social network analysis. And about half of that exam consists of an analysis where you will get some data from us. You'll get some research questions from us, and then you actually have to figure out how to analyze that using the methods and techniques from this course, program that during the exam in R, and provide the answers to us.
During the class, we will try to be as nerdy fun as possible. We like jokes, and we don't like boring stuff, just as much as you don't. So, for instance, you will get to experience our network game on campus, where you will role-play to understand how the role and the position that you have in a network affect all the dynamics that happen in the network. And you will be part of our network hackathon in a few weeks. And you will also be taught with a new R package that Roger and I are writing, which is innovative and, we hope, much more fun than the usual learning that you have experienced before. So, looking forward to seeing you in class.
Now, if you just turn on the TV or you open a newspaper, you must realize that the world that we live in is extremely complicated. And if you just look around yourself, it's in a way a hopelessly complicated world. We have a society where we have billions of individuals that are all interacting with each other, or fighting with each other, or trading with each other. So there's this enormous number of autonomous people who all have their own minds, their own interests, their own expertise, their own ideas, their own needs, and their own norms that may or may not fit with other people's. So we have a couple of billion people just actively making up society. Other systems that you see are the whole communication infrastructure where we send each other email, like, you know, the sna40s.jeds.nl.
We send each other email, we have our cell phones, we send each other app messages. There's a lot of interaction going on. And if you look at the structure of the whole communication in the world, it's unbelievably complex, unbelievably complicated. Just to understand how an individual brain works, you need to figure out how all the neurons wire and fire together and how they connect. I was involved in a study where we looked at cancer cells and at how the different components connect to each other. You can look at different structures, network structures essentially, inside your body that can tell us whether someone is likely to develop some kind of cancer later on. Those are structures that are extremely complicated to understand, which is why we still have cancer in the world. Even though the research has been going on for a long time and you would expect that we would have some kind of cure by now, we are still pretty far from that. If you look at our metabolic system, within just a single cell you see all the interactions that are needed just to get our metabolic system working and to generate our energy. It's unbelievably complex, just a single cell. And then if you imagine all the billions of cells that we have in our body, we are made up of extremely complex structures.
But even if you take something less esoteric: I do a lot of network analysis in professional sports. For example, we analyze soccer matches, and even then, at any point in time, you have 22 players, plus the referee and the assistant referees, and they're constantly moving and interacting on the pitch for 90 minutes plus, in all kinds of complicated ways. And as a result, you win 1-0 or you lose 1-0. So it's an extremely complicated structure that in the end leads to the question: what do we need to do in order to win this match?
We lost it last year. What do we do this year in order to win it? So even a simple thing like a soccer match is actually a very complicated system if you look at it. Now, the fact that we are surrounded by those kinds of systems is actually something that drives, at least partly, the field of social network analysis.
Every day when we turn on a television these days, we see all the violence that's going on right now in Afghanistan. Now, what you see here is a drawing that was made of the American counterinsurgency operation. This particular picture was made in 2009 in order to understand who interacts with whom on the ground in Afghanistan, and it covers just this one part of whatever was going on in Afghanistan at that time: the counterinsurgency activity of the Americans. So you can see that even there, it's just incredibly complicated. And actually a general who saw this picture at the time made the comment that, well, by the time that we actually understand this picture, we will have won the war. It's that complicated. They also didn't understand it themselves. And this is just one aspect of the world that we live in, right?
These kinds of systems are collectively called complex systems. And I'm sure that some of you had some background in complex systems before you came to JADS, so you'll recognize these ideas. And you can see in the examples that I just gave that complex systems play a role in our daily life, right? It's not just some kind of abstract notion. It's something that plays a role every day in your life, whatever you do: when you go to the grocery store, when you take a course, when you play soccer, or whatever. And it's something that has an enormous impact on our life.
And the idea is that understanding complex systems, and being able to not just understand them but also predict them, figure out what they look like, why they are the way they are, and how we can change them, so how we can control them, is generally seen as one of the main intellectual challenges for the 21st century. And I very vividly recall a meeting three or four years ago, when JADS was actually still in the other wing of the building, where all the companies are right now. That's actually where JADS was originally. And we had this long meeting with all the professors. And again, everybody agreed that this is the main challenge that we need to tackle in society and in data science in general.
So if you look at these complex systems, you will easily see that they are actually networks. If you look at the metabolic example that I just gave, you have tiny molecules and they are connected by chemical reactions. And that gives a network, right? We have a network of nodes being molecules, and you have relationships between the nodes. Those are chemical reactions. If you look at the web, for example, you have all the different files on the web that are connected by links that point to each other back and forth. And the web itself is an incredibly big and complex network that can be studied as such. We will study some of that as a network, and I'll show you a little bit of how to do that in a future lecture.
And then, of course, we have social networks. When we talk about social networks in this course, we don't necessarily refer to Twitter or Instagram or Facebook or whatever. We really talk about humans, like you and me, and our interactions. So in a social network, we have persons, individuals in that network. And we have all kinds of relationships. We have friendship relationships. We have professional relationships.
We have trust relationships, advice relationships, co-worker relationships, co-taking-a-course relationships, all kinds of relationships that we have within that network. And as you can immediately see, that can become pretty complex pretty quickly.
Now, the interesting thing is that if you look at these different kinds of networks, they are generated by very different things. Take the metabolic network that we talked about: it was the full evolution of mankind that created that particular structure. So it's not a totally random structure where your metabolic network is very different from my metabolic network. No, actually, these networks are pretty similar now, but they are pretty different from what our metabolic networks were two billion years ago. So that was created by evolution. Whereas if you look at the World Wide Web, that's actually created by how we as individuals interact with each other. When you put up a new page on the internet and you link to other pages, you're actively changing the World Wide Web yourself. So that's very new. That's very recent. Every day we can change that structure. If you look at social networks, you can say, well, there's the way that we as a society started to interact with each other and develop norms and trust. It makes sense that when I ask you a question, you respond to me, and vice versa. Those are norms that we have developed over the years and that are now driving our interactions. So now when you ask me a question in class, you actually expect me to answer, right? That's a norm that you're making up on the spot, but it's actually something that is rooted in our society as well and that took hundreds, if not thousands, of years to develop.
So you see that these three different networks have very different histories: billions of years, thousands of years, or just a decade for the World Wide Web. Well, actually the web started earlier, but most of what the web is right now is due to how we interacted with each other over the last decade. And every day we're still actively changing it and growing it at an incredible speed. So you would expect that those networks would be very different from each other and would also be governed by very different systems. And one of the interesting things that has been found over the last 20-ish years is that these different networks, even though they emerge from very different processes, like biological processes or social processes or technological processes, in the end actually seem to be governed by very similar organizing principles. And because that is the case, we can use a common set of tools to explore these systems. So even though we will mainly be focusing on social networks in this course, a lot of the tools that we're going to be discussing are also applied in biology, in medicine, in computer science, et cetera. So that's one of the cool things about this course: we're going to be teaching you a set of mathematical and statistical tools and a bunch of structures to look at. We're going to be telling you what kinds of structures you might want to look at first and how to add on top of that when you analyze an actual network. And all of that knowledge can be applied after this course, not just to social networks, but also to all kinds of other networks.
If you want to study something like the network that I referred to a while ago, where we looked at cancer, those networks actually build on very similar statistics and very similar structures to the ones we're going to be discussing when we talk about communication networks, for example, or friendship networks or whatever. And you'll use the very same statistical techniques to analyze those. So we will focus on social networks because that one choice makes the course much more focused and coherent. At the same time, you should realize that you can use all of this in many different fields whenever you have network data. And I think by now you should realize that network data are actually around us wherever you look.
Okay, so when we talk about network science, there are a couple of characteristics of that field. First, it's interdisciplinary. We will be using examples from different kinds of fields, and the models that we are going to discuss with you also have their basis in lots of different fields. So although we will be giving you a lot of examples from sociology, economics, organization studies, political science, and statistics, a lot of what we do is also informed by computer science and, on the other hand, by anthropology, a very qualitative kind of field. They all contributed to the development of the network science field. That is another sign that what we're doing fits a very broad set of networks.
Next to being interdisciplinary, it's also very much empirical. There are essentially at least two fields that look very similar: you have network science and you have the field of graph theory.
Graph theory is the purely theoretical, mathematical view of what networks look like and what they could look like. Graph theorists look at all kinds of network structures. They call a network a graph, but it's essentially the same thing. They study all kinds of structures and then ask: if the structure is slightly different, what kind of properties does it have? So this is very much a mathematical approach, but at the same time it's also very theoretical, whereas what we are doing is looking at actual real-life data and asking: how can we analyze that? How can we understand that using empirical tools and techniques? That is core to network science. Whether you come at it from a social point of view or a more physical point of view, that is one of the traits of network science.
And the main interest in network science is quantitative. A lot of network analysis is also being done using qualitative methods, and that is really cool, excellent research, but it's not what we are interested in in this course, and it is certainly also not what the field of network science per se is interested in. So we are going to be taking very much a quantitative look, quantitative meaning a little bit of mathematics but mainly statistics, at what networks look like, how they develop, what can explain their development, what the outcomes of their structure are, et cetera. Now one of the consequences of that is that... Brom, you want to ask a question now?
Yes, about the previous slide. Could you give one example of a quantitative and an example of a qualitative analysis?
Yeah. So let me give you a research question and then two different approaches, one more quantitative and one more qualitative. I have done a lot of research in the field of social influence, where I'm interested in how people influence each other in terms of their behavior, their norms, their preferences, et cetera. If you take a quantitative approach to that, and that's what we're going to be doing in this course, we collect network data and then we have statistical models to say, well, there are different reasons why you may have, let's say, a political preference. Just to keep the example pretty simple for now, let's say I want to model all of us on a continuum from very left wing to very right wing. Of course, anyone who has a background in political science knows that it's actually not a continuum but a circle. But let's just assume for now that you go from very left to very right, and I want to understand how we influence each other in terms of our political position. Now, a quantitative approach would be something that we're going to be discussing later on in the course, called a network autocorrelation model. There we say: everybody's political position is probably a function of a bunch of personal traits. Do you have a job? How old are you? Are you a man or a woman? How much money do you have? What did your parents vote for? Maybe where you live, your socioeconomic status, whatever. There's a whole bunch of things that may differentiate you from all the other people, and as a result, you may lean more to the left or to the right.
In addition, you also interact with other people. And if you interact with a lot of people who all tell you that the left-wing parties are wonderful and the right-wing parties are terrible, or the other way around, in the end you get influenced by them. So these statistical models actually test to what extent you are influenced, by whom you are influenced, and how strong that is. Is it statistically significant? Does it hold if we control for other things? So you're actually building something like a regression model, if you will. It's a little bit more complicated, but you can think of it that way, where we try to understand what's going on, how strong that effect is, when that effect occurs, et cetera. That is all from a statistical point of view.
A more qualitative approach would be research where you go to people and you say, you know what, let me interview you. Who do you listen to? How did your political view of the world develop? Tell me about your youth, tell me about your friendships. And then you interview 10 people, or maybe 50 people, whatever. And based on that, you come to some kind of an idea of what probably happened. You build that story. Or you work at it from a more anthropological point of view, where you just sit back and watch people, you try to interpret in your head what's going on, and then you build that story. That is qualitative social network analysis, because the analysis is qualitative. So if I look at it, it looks like people who are very central seem to have more influence. And then I observe that there are some people more on the periphery who don't seem to interact as much. And when I hear them speak, they don't seem to be influenced as much. So that is a qualitative point of view. And that is also useful, and it can give a lot of insight.
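For reference, the quantitative approach described above, where a political position depends on personal traits plus the positions of the people you interact with, is commonly written in the following form. The notation is a standard textbook convention and an assumption here, since the lecture does not show a formula.

```latex
y = \rho W y + X \beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)
```

Here $y$ is the vector of positions on the left-right continuum, $X$ holds the personal traits (job, age, income, and so on) with coefficients $\beta$, $W$ is a weight matrix whose entry $w_{ij}$ is nonzero when person $j$ influences person $i$, and $\rho$ is the autocorrelation parameter that measures the strength of social influence. With $\rho = 0$ the model reduces to an ordinary regression.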
But from a network science perspective, we are mainly interested in quantifying and saying, well, how strong is that influence? Well, the coefficient is 5.6, or the autocorrelation parameter is 0.9, which is extremely high. And this is significant, and we can control for these variables. So it would be the same question, but from a very different point of view. Does that help or is it still unclear?
No, no, it definitely does. Thanks for the example, Roger.
Thanks for asking. I really appreciate that.
So one of the things that you'll find very quickly is that our field, while being exciting, is also quite challenging computationally. Say you just have 10 people in a network and you discard relationships that people may have with themselves. If you allow for just a single kind of relationship between people, there are already 90 possible directed relationships, right? That is, I distinguish a relationship from me to you from a relationship from you to me. For example, with advice giving, I may give you advice, but you may not give me advice. That's what's called a directed relationship. So in a network of 10 people, there are already 90 possible relationships. If you have 100 people, then you already have 9,900, so almost 10,000 relationships. It blows up quadratically with the size of the network. And then we're still only assuming that the relationship is either there or not, so it's on or off. Whereas in reality, it matters whether I give you advice once or every day, every minute, or only once a week or once a month.
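The counts quoted above follow from a simple formula: among n people without self-ties, each person can send a tie to n - 1 others, which gives n(n - 1) possible directed ties. A minimal sketch of that arithmetic, where the function name `possible_ties` is purely illustrative:

```python
def possible_ties(n, directed=True):
    """Number of possible ties among n people, excluding self-ties."""
    ordered_pairs = n * (n - 1)          # each person can point to n - 1 others
    # An undirected tie covers both directions, so halve the count.
    return ordered_pairs if directed else ordered_pairs // 2


for n in (10, 100, 1000):
    print(n, possible_ties(n), possible_ties(n, directed=False))
# 10 90 45
# 100 9900 4950
# 1000 999000 499500
```

Multiplying the number of people by ten multiplies the number of possible ties by roughly a hundred, which is the quadratic growth mentioned in the lecture.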
So once you also start adding in the weight of that relationship, it grows further. And there may be multiple relationships, not just advice giving. Maybe I give you advice, but you give me money back. That's a very different relationship than if I give you advice and you don't give me anything back. So now we suddenly have two relationships in a network. And then we can have three or four or five, and they may all interact. So you can see that it becomes very big very quickly. And for that we need specialized algorithms and specialized software. We can no longer use all the standard regression models that you've learned in your career so far, because all of your observations are now dependent on each other. So all of the main assumptions that you make in the standard statistical models that you are used to are no longer valid. This becomes a pretty huge computational challenge. That is something that the field deals with and that you'll also discover, because even though we'll keep our examples and our data sets pretty small, it may still take you multiple cups of coffee until an analysis has run. In my case, cups of tea. So you really have to think very clearly about how to set up your analysis, what to put in and what not to put in, so that it's not going to take you multiple cups of coffee or tea only to realize that you should have put in a different thing. You can do that once or twice, but before you know it, your afternoon is gone. Claudia will probably explain this later, but half of the exam at the end is actual analysis of network data. So if you don't think about it very well, you could waste a lot of your time waiting for algorithms to run that are unnecessary.
So computational challenges are really huge, and that's something that you'll find and should be aware of. We'll make heavy use of the R programming environment because that has by far the best and the most specialized algorithms and software. So you don't have to program it yourself, but you still have to tell the software what you want it to do. And it's going to take a while when you have to wait for it to run. In my own case, when I run real-life, serious statistical models, I frequently have to wait a couple of days until the model is actually done. Then we figure out that we should have specified it slightly differently. That's data science.
Okay, so I think by now you've probably realized that networks are pretty important and something that we cannot escape. We live inside networks, and we're also shaped by networks: because we are inside networks, people influence us. You get advice, you receive friendships, or people may get angry with you and then your whole day is ruined because someone shouted at you. That affects your emotional state, but also whether you can get a job or not, whether you get a promotion or not, whether or not you can get a particular piece of advice, start a new business, or whatever. That's very much shaped by the network that you're part of. And of course we also have the power to shape our networks, and that's what we're constantly doing in life. So this is not just an interesting topic intellectually; you may also start to realize that it's something that you're dealing with every day in your life anyway. Any questions so far? I don't see anything pop up in the chat. No questions?
Okay, then let me give you a couple of examples quickly, because we are running over time a little bit. So this is a network that you're all painfully familiar with, right? This is the COVID-19 virus, which in itself is a network.
So why are we online today? Well, we are online because we have more than 75 students in our course. If you look at the whole corona situation that we've been dealing with now, at least in the Netherlands actively for a year and a half, and I think worldwide for about two years, you may recall there was this whole debate: how do we deal with corona? We don't want the whole society to be affected. So let me give you an example of how networks explain how a virus spreads. Let's say you have 200 households and each household has an average of 15 contacts. Well, then you get a network essentially like this, just a big blob of relationships where everybody is connected to everybody within a couple of steps. In network analysis, this is what we call one giant component. Everybody is connected to everybody, and actually pretty quickly. So in this kind of network, if you introduce a virus anywhere, it can spread very quickly.
Once we started to realize that in March of last year, the Dutch government said, well, people have to stay at home. We're going to go into lockdown, and only people who have essential jobs are still allowed to go to their work. Everybody else has to go home. So even though we teach this really cool course, apparently we're not in an essential job, and we also had to move everything online. So what actually happens when you do that? Let's say that now we have the same 200 households, and we assume that in 10% of these households someone has an essential job. So they go to the office and everybody else stays at home. And let's say we have a very strict lockdown where the only people you communicate with are the people that you meet at work. In this case, this is an example where some people randomly have some interactions and that's it.
So now you have these blue dots that have essential jobs. That's about 10% of the population. And, I forget, I think they all have an average of four contacts or something, people that they meet because they have this essential job. Think of a doctor who meets with patients. The patients do not have an essential job, but they still interact with the doctor because they are sick or have a broken toe or whatever, and so they get in contact with each other. Now if these blue dots have on average four relationships, so not everybody the same, but four on average, this is one way that the network could look. In this case, there's a cluster of about a quarter of all the households that are connected to each other somewhere. The result is that about three quarters of the households are not connected, so they would be shielded from coronavirus because they're not in touch with anyone else. But about a quarter of the households are at some point interacting with each other, either directly or indirectly, so the coronavirus could spread that way. So in this case, if you only have 10% essential households and they have on average four contacts with other people, then only about a quarter of your population might get coronavirus, and three quarters will not, right? Just assuming that this is the only thing going on at the time.
Now then there was this discussion: well, this gets boring quite quickly. So maybe people should be able to have at least someone over as a friend every now and then, even though they may not have an essential job. If people never see anyone else, then that's not a life. So there was this whole discussion about how many people you are allowed to interact with.
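The lockdown scenario above can be approximated with a small simulation. The sketch below is a simplification under stated assumptions: contacts are drawn uniformly at random, each essential household initiates exactly four contacts rather than four on average, and the function name and parameters are illustrative. It is a plain random-graph stand-in, not the model behind the figures shown in the lecture.

```python
import random
from collections import defaultdict


def largest_component_share(n_households=200, share_essential=0.10,
                            essential_contacts=4, extra_contacts=0, seed=1):
    """Share of households in the largest connected component.

    Assumption: contacts are chosen uniformly at random (a simplification).
    """
    rng = random.Random(seed)
    households = list(range(n_households))
    neighbours = defaultdict(set)

    def connect(a, b):
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Essential households each meet a fixed number of random other households.
    n_essential = round(n_households * share_essential)
    for h in rng.sample(households, n_essential):
        others = [x for x in households if x != h]
        for other in rng.sample(others, essential_contacts):
            connect(h, other)

    # Optional relaxation: every household also picks extra random contacts.
    for h in households:
        others = [x for x in households if x != h]
        for other in rng.sample(others, extra_contacts):
            connect(h, other)

    # Depth-first search over all households to find the largest component.
    seen = set()
    largest = 0
    for start in households:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        largest = max(largest, size)
    return largest / n_households


strict = largest_component_share()                  # essential contacts only
relaxed = largest_component_share(extra_contacts=1)  # one extra contact each
print(f"strict lockdown: {strict:.0%}, one extra contact: {relaxed:.0%}")
```

The exact shares depend on the seed and on the simplifying assumptions, so they will not necessarily match the numbers quoted in the lecture. What the sketch does reproduce is the mechanism: the virus can only reach households inside the same component, and adding ties can only merge components, never split them.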
So now what you look at is a network where every household now besides the whole essential thing is now allowed to interact with just one individual, right? That's the rule. You can interact. Everyone can interact with just one individual. And then of course you have the essential households who still have, you know, who still are able to interact the way they just did. And now what you see with just the introduction of one individual. So here, for example, you see that these two households communicate with each other. These two households communicate with each other. And they are essentially, as long as none of these two households get corona, then they're completely safe. But now what you see is that 71% in this example, 71% of nodes are connected to each other one way or the other. So they're all forming one component. So you can move from one to the other step by step. So now if only one of those households gets corona, actually the virus has a potential, not guaranteed, but it has a potential to spread to about 70% of the population. Just by introducing one individual as someone that you may interact with, whether you have an essential job or not. So you can see that how quickly that spreads. And if you want, and there's something that you can do later on, tomorrow or whenever, if you feel like it, there's actually an interactive application. If you click this, then actually this opens this website where I can actually start playing with it. I'll just very briefly show you that. So let me reload. It was inactive for quite a while. So it's okay. This is an application, an online application that reproduces these results. And then you can start playing with, well, how many essential contacts, how many essential households are there, how many people are you allowed to contact with. So, for example, you can play with, okay, let's say that you have 200 households where 10% of them have an essential job. 
And let's say that when you have this essential job, let's just turn that down. We had four; let's say that on average you have three connections. But now everybody is allowed to have two connections on average. Okay. Let's see if my internet is fast enough to render it while using Zoom. Okay. So now what you see is that, even though I decreased the number of essential connections that people have from four to three, just by saying, well, you can have two people come to your house instead of one, we already have almost 90% of the population connected. And if COVID is introduced in one of them, it can spread across almost the entire population already. So here you can start playing with these numbers. This is all built on what are called exponential random graph models. That is a huge topic in the second half of our course, where you will start building these kinds of models yourself, not for virus spread, but this type of model. And you can play with that if you want outside of this lecture. By the way, all the slides that we're using are HTML5 slides. They will be posted online as HTML5, you run them in your browser, and they're fully interactive. So you can click here and it'll open the application, and you can interact with these slides at your own leisure later on. So one really well-known social network analysis study is of windsurfers on the beach. This is the very first paper that came out of it; there are actually multiple papers. Lin Freeman was the main person behind this research. What they did is they spent a month on the beach, observing the interaction of windsurfers.
They had this notebook with them and they were writing down exactly who communicated with whom, and they also interviewed a bunch of them. That is how they collected the network data. What they were mainly interested in was whether people actually knew the structure of the community they were part of. That's what they call social intelligence. But one of the things they also realized was that there are actually several subgroups within that windsurfer community. So they collected that data and they studied it over 31 days of observations. You know, it's very painful to have to spend a month on the beach pretending to do academic research, and that's what they did. This is actually really interesting research, and we're going to come back to this data set later on. This is a visualization of what this windsurfer network looked like over those 31 days. You can see there are a lot of changes, a lot of variation, in the structure. There are days when only a couple of people showed up, and then there are days where there were lots of people who were all interacting with each other, like here, but then there were still six who didn't interact with any of the others, et cetera. So they collected that kind of data. And this was a famous study for two reasons. One reason is that this data set was made available. It was one of the very first times that people who came more from the anthropology kind of perspective, but very quantitative, started to bring this kind of data to the world and analyze it, and they also brought it very much to sociology. Another reason why this is a very famous study is: who doesn't want to spend 31 days on the beach doing research?
So I guess a lot of us are very envious of them for coming up with this kind of research and getting funded for it. And Freeman, who is the main researcher in this field, is known for always coming to social network conferences wearing shirts with a Hawaiian print. So he was essentially full time on the beach in his head. So this is very well-known research, very much focusing on the interaction of individuals. Let me go a little faster and just show you a couple of networks. Who knows what this network is? Unmute, or typing it in the chat is also fine. This is a very famous network. Anyone? Maybe you've already seen this picture at some point. "Wasn't it one of the first internet networks?" Well, not just one of the first. This was the internet in 1970. Yeah, definitely. So there were essentially two spots where there were computers, on the two coasts. And this was the entire internet 50 years ago, almost 51. Yeah. Here's an example of another network, and you can see there's more going on. This is 436 employees at Hewlett-Packard. This is well-known research where they looked at how people interact with each other. We may come back to this a little later on, because even if you look at it right now, you may already see some structure, but as soon as you start using some quantitative measures, you can very clearly see that there are some very clear structures. And you see that a lot in organizations. So this is people emailing each other. And you would expect, if you look at the literature on email, much more equal communication.
That is, if you just look at the technological literature on email. But as soon as you put in the social aspect, it actually makes a lot of sense that it's very skewed. We may come back to this later on. This next one is an interesting example and also very well known. This is Lada Adamic's research. She looked at blogs on the internet, prior to the 2004 presidential elections. She looked at, well, what is the political color of a particular blog? Is it more left wing or more right wing, if you will, Democrat or Republican? And which other blogs does it refer to? So if you have links between blogs, do they refer to other blogs that have the same color or not? And as you can see, people very much talk mainly to their own side. The Republican blogs are mainly referred to by other people who also have Republican blogs, and the Democratic blogs are mainly referred to by other Democrats. There's surprisingly little communication between them. And mind you, this is almost 20 years ago. Right now we would say the United States is politically very much segregated. As you can see here, it was already very much like that 20 years ago: very clearly separated. So our course is about social network analysis for data scientists. Let me very quickly summarize what that means. We are going to focus mainly on human behavior. There are lots of different kinds of networks, but especially the last kinds of networks that I showed you are networks that are created by humans. So we're going to look mainly at networks of humans or created by humans. We're going to be focusing on practical analysis. This is not a course where you're going to learn so much the mathematical description or the mathematical algorithms for analyzing.
No, we are actually going to be doing analysis. Of course you need to understand the tools, and you need to be able to apply them correctly and in a substantively meaningful way. But that is the general idea. You're going to be learning how to analyze networks, not just how to design algorithms for them. And we're going to do that mainly by focusing on statistical models. But again, because it's data science, it is based on substantively meaningful questions that matter, because we are at JADS. That's what the course will mainly be about. So if there are no further questions, I would say let's break. Claudia, how long would you like the break to be? Ten? "Ten minutes, that sounds fine." So at quarter past three we reconvene and then Claudia is going to take over. See you in a bit. Let me say a few words about myself. I do social network analysis. That's my main interest. I also use other methods, such as statistical analysis and, more in general, complexity studies, and we will spend a few words on that later as well. My main interest when it comes to research is studying how information or innovations spread, one or the other. There is a lot of overlap between information and innovation, and both spread in networks of people, but also within organizations. This connects very strongly to communication science and to organizational theory. My background is in political science, and more specifically I have a PhD in computational politics. That means that I worked on computational methods applied to political science problems. And yeah, this is basically the summary of all the things that I did. Okay, enough about me. We can move on to the main focus of this class, which is, as you already know by now, social network analysis. So how did we get here?
You've had a lot of introduction, a lot of information about what it is and how it works, and a lot of cool examples. Now let's look a little bit at the history of this discipline. Only a little bit, because if we wanted to really discuss the history of this field, it would take us an entire class, and that is not the point. We are here to learn social network analysis, not to cover its history, but it's nice to know where this is coming from, at least a little. So we have two starts for one story. The story is social network analysis, and it has two starts. Roger already mentioned these two fields. One is graph theory and the other one is the study of social networks. They started independently and at some point they merged into the discipline that we now study and actually enjoy very much. So, graph theory. This is a graph, which is exactly the same as a network, but if you are a mathematician, you want to call it a graph. If you are a computer scientist, you probably had some exams in graph theory, because it's very important for computer scientists as well. And if you're a physicist, you probably use graphs as well. Depending on the context in which you learned about them, you might call it a network or you might call it a graph. Graph theory is the study of how entities, which we call nodes or vertices, are connected through edges or links; depending on your discipline you will call them differently. This discipline started with this guy over here, who wasn't very pretty, but was very smart. Euler is considered the father of topology. What does topology have to do with graph theory? Well, this guy got obsessed with bridges, a series of bridges. We're in Königsberg, which was a Prussian city back then, in the 18th century, and is now Kaliningrad in Russia, I think.
Anyway, Euler wanted to find a way to go around in one walk and cross each of the bridges exactly once. If you want, I could give you one hour, two hours to solve this problem. Well, you won't solve it. Euler didn't. Nobody did, because, as Euler ended up showing, it is absolutely impossible. Euler got kind of obsessed with this problem and he ended up formalizing it. Each piece of land is considered a node, and each bridge connecting two pieces of land is considered an edge. This is what is considered the start of graph theory. So sometimes obsessing over something that looks kind of irrelevant is actually very helpful. Don't stop doing it in case you are obsessing over something. You might become the next Euler. OK. After this start, almost 300 years ago, graph theory developed as a discipline that detached from topology, mainly in the middle of the last century with these two scholars, Paul Erdős and Alfréd Rényi, who really started it off as a subfield of mathematics. In particular, they are known for random networks. If I give you, I don't know, 10 nodes, each node can be connected to up to 10 minus one others, and if you choose at random which of those possible connections exist, what you generate is a random network. The Erdős number is something quite funny, and it relates to an important concept in network analysis. Erdős, the guy on the left-hand side, was a peculiar person. He didn't have a proper home, but he had a lot of friends and used to live with them for a month or two at a time. Since he was an academic, this was a very productive strategy, because he had a lot of collaborators and they published a lot of papers together. And basically you can count the distance that you have from Erdős. What does distance mean? If I had published with Erdős, and I haven't, I would be at distance one from Erdős.
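As a brief aside on the bridge problem described above, here is a small Python sketch of the counting argument that is usually credited to Euler. The lecture does not spell this criterion out; it is the standard result that a connected network allows a walk using every link exactly once only if zero or two nodes have an odd number of links. The labels A to D for the four pieces of land are illustrative, and the helper names are invented for this sketch.

```python
from collections import Counter

# Pieces of land are nodes; each of the seven bridges is one edge.
# Labels are illustrative: A is the central island, B and C are the
# two river banks, D is the land to the east.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges between the island and one bank
    ("A", "C"), ("A", "C"),  # two bridges between the island and the other bank
    ("A", "D"),              # one bridge between the island and the eastern land
    ("B", "D"), ("C", "D"),  # one bridge from each bank to the eastern land
]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def allows_walk_over_every_edge_once(edges):
    """Degree test only; assumes the network is connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(BRIDGES))
print(allows_walk_over_every_edge_once(BRIDGES))
```

All four pieces of land touch an odd number of bridges, so the test fails, which matches the point that nobody can solve the puzzle. A triangle of three bridges, by contrast, passes the test.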
If Roger had published with Erdős, and he hasn't either, then I would be at distance two through him. So the chain keeps going and you get your Erdős number. This is very important for measuring paths inside a network; we will talk about that in the following weeks. The concept started with Erdős, but if you want to try it out, there is the Kevin Bacon game online, which is a fun way to reflect on these sorts of network ideas. Kevin Bacon is an actor who was very famous in the 80s, but you might know him anyway, and who apparently worked with lots of people. If you go to this website that calculates the number for you, you can put in a random name of an actor and find their distance from Kevin Bacon. So they use these network concepts to make something quite funny and enjoyable. But at the same time, it helps you understand the small-world effect: it is actually not that unlikely that you are connected in a few steps to people that you have no idea who they are. That's how research has shown this phenomenon to work. Okay, sorry about the camera, I was reading the chat. Okay, let's move on. This graph theory concept got several applications. Just to give you an example, one of the main applications in physics is explaining coffee. This sounds quite funny maybe, but if you want to make coffee, you can model it as a square lattice. Imagine that all these dots, all these nodes, are coffee particles. When the coffee is not done, so when it's just powder, these are all disconnected. But as soon as you put the hot water on top, they change their state and they connect. They create what is called a giant component, because they all become intertwined or connected, and you get coffee. So that's one physical application of graph theory that has been studied extensively. Oracle of Bacon, thank you very much. That's what I'm talking about.
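The distance counting behind these numbers can be sketched as a breadth-first search over a "worked together" network. This is a minimal illustration with invented data: the names in `COLLABORATIONS`, the `"Hub"` label for the person at the center, and the function name are all hypothetical, and real tools that compute such numbers may work differently.

```python
from collections import deque

# Hypothetical data: each pair worked together at least once
# (a joint paper, or a film they both appeared in).
COLLABORATIONS = [
    ("Hub", "Ana"),
    ("Ana", "Ben"),
    ("Ben", "Chloe"),
    ("Hub", "Dev"),
    ("Eli", "Fay"),  # a pair with no chain leading to the hub
]


def collaboration_distance(pairs, source):
    """Breadth-first search: number of steps from `source` to everyone reachable."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in neighbours.get(person, ()):
            if other not in distance:
                # First time we reach this person, so this is the shortest chain.
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance


distances = collaboration_distance(COLLABORATIONS, "Hub")
for name in sorted(distances):
    print(name, distances[name])
```

People with no chain of collaborations to the source, like the last pair in the toy data, simply do not appear in the result.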
Yeah. Oh, fantastic, it works for other celebs too. Fantastic. Play around with that. It's really instructive, nerdy, and funny. Okay. What else? Yeah. Thank you. Okay. The other side, the other stream that was born independently, is social networks, which was born within sociology, mostly; not only, but mostly. Anyway, within the social sciences. As we said already, a social network is not only Facebook, Twitter, TikTok, or name your favorite social network. Those are examples, because they are all social networks, but a social network is any relationship between people that you can actually study. If it's interesting to study, it can be considered a social network and it can be measured. So you can try to collect data about anything that you find interesting and measure it as a social network; online or offline doesn't matter. Often when you discuss social networks, people think that it's just about the online world, but this is absolutely reductive. The concept is much broader than that, and you will find out that basically anything can be seen as a relationship. After you take this class, you will be a little bit biased toward finding relationships between things, but not everything is interesting as a network. It can be modeled, but that is not necessarily a good idea. So we will also discuss when it is a good idea to think of something as a network and when it's not. Okay, so when we talk about measuring a social phenomenon, we need to name this guy, Auguste Comte. He was a sociologist, and he started to think that social phenomena can be measured as much as physical phenomena. It can be somewhat harder to conceptualize a social phenomenon, but once you have a definition of what you want to measure, you can find a way to measure it, and then you can move on as you do with any other science.
And then we have these other scholars who basically compared societies to biological systems. If I tell you now that you can compare how people live in a society to how bees live in a beehive, it probably doesn't sound that new to you, because you have probably heard somebody do this before. But this guy was one of the first who thought that relationships between humans are comparable to biological systems that are apparently self-organized. We know that bees behave in a certain way because they have an instinct that tells them how to behave. Well, humans are not exactly doing that, but they are highly comparable. So the natural world can be compared to the social world extensively, and the step between just having the idea and modeling it is really, really small. Following this line, we have one of the very first studies that used network concepts. This person, Moreno, over here, not very pretty either, but pretty brilliant, was a psychiatrist. So he couldn't care less about networks, to be honest. He just wanted to solve a practical problem. Somebody called him because in this school in the US, the Hudson School, there were girls who were running away. They didn't want to stay in school. In the first place, they checked the girls' mental health, but apparently they were okay, or at least nothing seemed to be wrong from that perspective. So Moreno ran several analyses. He talked to the girls and discussed the problem extensively, and found out that the reason why these girls were quitting the school was not related to humidity or a poor heating system or the quality of the food, which was probably really low anyway. It was related to the fact that they didn't enjoy being in their social group in the school. So potentially there was one person who was bullying them.
And if they were, for instance, rooming next to this girl who was a bully, they didn't want to be there anymore because they were really suffering in that situation. By mapping this down in a network, Moreno managed to find out what the problem was, to prevent this from happening again, and to prevent the Hudson School from shutting its doors and going bankrupt, because that can happen in the US; you know, public education is not really a thing there. So this is one of the first studies, and the diagram that Moreno used to map this down is called a sociogram. So a network in sociology can also be called a sociogram, and the approach is also called sociometry. It's a way to measure relationships in a very, very qualitative way, to get back to Brown's question from before. Okay, and then we have this other guy. He was a psychologist, but he was a bit more interested in networks than Moreno, and basically he wanted to see how people reacted in a network. You are a person, and if I put you in a certain network you play a certain role. You are not a dot in a picture, right? You're a person. And if I want to organize something and you are inside my organization, I want to know how you feel, right? So he set up an experiment at MIT. These are people, so person, person, person, person, and they tried to communicate in these shapes. They were sitting down, with chairs and everything, in those shapes, and basically the researchers found very different patterns of behavior. The people involved in the experiment experienced very different things depending on whether they were in one shape or another. So essentially this shape, the X one, is the most efficient.
So if there is a person at the center and you pass paper notes to this person, the communication will be very efficient. For instance, if this person is, I don't know, the spokesperson for the group, they will be able to collect all the information in less than 5 minutes and say, okay, my group thinks this. However, this will be very, very frustrating for the people on the sides, because they won't be involved in the activity. They only have to pass one piece of paper. The most enjoyable shape for them was this one, because they were all fully involved, but in terms of efficiency it was much worse for reaching the goal. So from a psychological perspective this was really interesting. The idea is that if you have a group of friends, this one will be ideal, but if you are in a working environment and you need to get something done, this other one will be much better for you. And then we have another relevant study that is a milestone in the history of social networks, Zachary's karate club. This is another study done at a university, where there is a karate club, as you probably guessed already. Zachary came to understand that there were two groups inside the karate club that were really not bonding with each other. There were a few people over here who were connected, so the two groups were not fully disconnected, there were a few connections, but the great bulk of the activities were completely separated, one here and one there. This might seem like a very simple observation, but it is actually the beginning of a field called community detection, which is now huge. There are hundreds of people who work on developing algorithms that focus on how to detect what a group is and how you split one group from the other. So this is a whole computational field that you might find really cool at some point. So what's the connection between these two groups?
So we have the graph theory side and we have the social network side. What put them together, crucially, is the computational revolution. With computers it became possible to enlarge the studies that you do in sociology. In the social sciences you collect tons of data; the data is really large. Just imagine the data generated online: this is all social science data. Not all of it, but the great bulk is social-science-oriented data. Without a computer you don't know what to do with that; you cannot possibly handle it. Now that we know how to collect these data, now is the time for social network analysis and for computational studies. That's how the two disciplines merged into one, and that's why we are here all together, to learn how to deal with all this data and all these cool computers and try to do something fun and cool with them. As Roger already mentioned, the great challenge now is to explain complexity. Complexity is a whole field of study, and you might know it already, but for those of you who don't know exactly what it means: if you have variable A and variable B and there is an effect of A on B, a linear effect, this is simple. This is called a simple effect. If you have several variables that influence a variable A in many different ways, this is a complicated effect, in simple words. It could be linear or non-linear, but this is still something that we can predict. It might be difficult to predict, but we have models that can predict what happens to this variable A that we are observing. However, there is a third situation, where it's so complex, there are so many variables and so many effects going on, that the outcome, what happens to the variable A that we are trying to monitor, is absolutely impossible to understand. In complexity science we call this emergence. So when there is an emergent effect that we cannot
absolutely explain with the data that we have, that's a complexity scenario. It is not possible to tackle this complexity with the regular methods that were used 50 years ago. They can get you up to a certain point. That's why we are developing new methods, or refining old methods, to be able to explain more and to address problems that were not possible to address even 30 years ago. There is a range of methods that allows you to do that. For instance, we have agent-based models and we have system dynamics models. These things are very interdisciplinary, but if I say system dynamics models, this is mostly engineering. If I say agent-based models, this is probably more on the epidemiology side, or also somewhat computer science. If I say causal diagrams, which is another method, it's really in between several disciplines. Anyway, one of these methods is social network analysis, and it's pretty huge, because you can actually combine them. For instance, if you want to use agent-based models or causal diagrams, you can mix them with social network analysis and you get very, very powerful tools to explain things. This is extremely cool because it's new, it's cutting edge, and it's exciting. So let's take a look, in a nutshell, at the difference between computational approaches in physics and in social science, because people think that there is a huge distance between physics, mathematics and biology on one side and the social sciences on the other. When you use computational approaches, the only real difference between these things is the topic. Am I studying people? Okay, this is called social science. Am I studying birds? Okay, this is biology. Am I studying, name it, I don't know, stones that interact because you throw them? That's physics. So seriously, at this point, in our age, the difference is really teeny tiny. It's just about the topic. We are still used to talking about the natural world versus the social world, and there are lots of comparisons. The topic is different, but if you observe the way
people interact and the way a market works, or birds, there are a lot of commonalities. The study of these commonalities is part of what complexity science does, and network analysis is one of the tools that you use to explain them. Just to give you one example of how you intertwine these methods: you already saw a causal loop diagram among the graphs that Roger showed you before. This one is another causal loop diagram, and this is the network. In the study that I'm showing you, we are trying to explain why people are obese and to find a solution for the problem. You might know that obesity is a huge problem all over the world. This study was financed in Australia. With the network analysis, we tracked down all the parties involved in solving the problem. Here we have the local government, here we have the health services that provide healthcare for these people, but we also have education, because, for instance, if children go to a school that only serves french fries every single day, they will get obese. Things like that are mapped down in this network: all the stakeholders and the relationships that are generating the process of obesity. So these two are comparable but complementary. Each of them maps a different angle of the problem, and you can model them at the same time if you connect them. For instance, here is the school, and here is the school too. At this level we can see how it's connected to another stakeholder. At this other level it is generating a consequence: what happens when they serve french fries every day. If you study them together using network analysis, as one single network that is multi-level, and you see the connections between the two, your analytical power skyrockets compared to one level only. All these things are completely customizable and allow you to explain very, very cool things. Okay, so with this model mix and match you can do a lot of things
as I said. Let's have a look at some more examples. For instance, this is something that you might not be familiar with. It's called an ego network. It's another thing that you will see in network analysis. We are not going to spend a lot of time on it because this is more of a qualitative approach, to stay on Brown's question. This bluish-purplish dot is a person, a person who has some sort of heart disease, and we want to know how much time it takes for this person to get out of the hospital. Obviously we are not interested in one person only. We are interested in reducing the time that people stay in hospital when they have heart disease and get surgery for it. This is needed because, you know, it costs money when you stay at the hospital, and we would like to spend less money on that. So what do we do? We monitor the social network. If this is our target person, the closest people to this person are the son and the general practitioner, the home doctor. They are the closest. A little bit less close is the pharmacist, so probably this person has a personal relationship with the pharmacist. Let's say that he sees the son and the GP every single day; he might see the pharmacist twice per week. Then at distance 3 we have nothing, and then, you can't read it very well, but this is the sister. So at distance 4 we have the sister. This study ended up showing that the people who had more close contacts, so more people in row 1 here, were discharged from hospital faster. Basically, in order to recover faster they needed to have support. That was the main finding. If they had their son or some sort of organization that would help them, they recovered faster, both on a psychological level and for the simple fact that they needed help in a practical way. The researchers couldn't have found that out in any other way, so that was an interesting finding of the study. And if I show you
just this picture and I talk about one person, this will be called a qualitative study, because I can use this image to represent what one person does. However, if I have 500 people and I want to make a picture for each of the 500, it becomes kind of difficult to tell the story and make it interesting. There are very few people who would actually like to go through 500 pages checking these diagrams one by one. That's when you do a quantitative study, when you want to find metrics that can summarize something and make it more understandable and suitable for an audience. So sometimes the difference between qualitative and quantitative is not that big. The conceptualization is the same. It's just that when you go qualitative you can focus on individual people, and when you go quantitative you can find information that is relevant and statistically interesting for a whole population. This is another example of modeling that you can do; I'm giving you just an overview here. This study wanted to understand a debate that happened in parliament in the UK. They wanted to have a minimum price for alcohol. Let's say that a bottle of wine costs 10; they wanted to increase it to a minimum of 15. This is just a random example, I don't know the actual numbers. They discussed it in parliament, there were a lot of people discussing it, and the newspapers reported the views of all the stakeholders involved in the debate. Here, in these different colors, you can see the categories of stakeholders. For instance, here you have a think tank, and you can have several, or you have a charity organization, and you can also have many, and so on and so forth. The statements that were found in the newspapers were coded by topic, and if several of these stakeholders agreed on something, they were considered to be in a relationship with each other, because they agreed on something. This is the final picture that comes out of it. So these stakeholders here, belonging to those groups with
those colors, were the proponents, the people that wanted this to happen, and these others were the opponents. This is a very efficient way to summarize what was going on, and that's called network analysis. There are other ways to do it, but this is a pretty efficient one to deliver this information to a larger audience.
Okay, another example. This is a survey that was done in the US; about 5000 people replied in 2008, right before Obama was elected. This row over here is the Democratic Party and this row over here is the Republican Party. People were asked: how much do you like the Democrats, from 0 to 10? How much do you like the Republicans, from 0 to 10? So this is 0, this is 10. Obama won, so this is 10, and we see the thicker line; the thicker the line, the higher the number of people. When you express two opinions, one for the Democrats and one for the Republicans, these two opinions are related to each other. Why are they related? Because I'm the one expressing them. I have an opinion about the Democrats and an opinion about the Republicans, and these two opinions are necessarily connected, because if I love Obama and I give the Democratic Party a 10, it's very likely that I will give the Republican Party a 0. We see that in this line over here. We also see that there were several undecided people that gave a 5 to one and a 5 to the other. And we also see that some others voted for Obama but don't hate the Republicans: they are 5 here and 10 here. Probably those were originally Republican voters that loved Obama so much that they switched just for the occasion, but maybe in another election they will go back, inverting the preference. This is also a network that shows you that, and this is what we call modeling with networks.
So, enough about examples; you got the idea at this point. Where to find us? Well, you find Roger and me in class, so that's not that difficult. But if you want to find the social network community, there is an international organization
that is called INSNA, and this organization organizes several events. The most interesting for us are mainly these two. Sunbelt is the flagship conference (it's difficult to say "flagship conference") and it's all over the world, so each year it is in a different continent. Next year it will be in Australia, this year it was in the US, and the last time in Europe was 2018. So it's been in the Netherlands already, and it will probably be a long time before it gets back to the Netherlands. When this conference is not in Europe, we have the European version of it, so every year there is a gathering of social network analysts in Europe. Normally, that is; during coronavirus not really, we have an online gathering rather than an actual one, as you can imagine, but still the idea is the same. Anyway, if you want to find information about where network analysts hide, that's where they hide.
And there is more, obviously. Not every person obsessed with networks is obsessed with society. Some people hate people, and they are entitled to do so, and they can study any other kind of network that they enjoy studying. They go to other conferences, and there is a lot of overlap. So, NetSci: it's more of a methods conference, so if you want to develop a new network algorithm or something, you might present it at NetSci. Otherwise, if you want to use network analysis to explain a complex phenomenon, you go to Complex Networks. The difference is mostly that this one is in Europe and this one is all over the world. So one year you might find that going to China is not really easy for you, and you just switch to this one, which is usually in Spain, and it's kind of a good holiday as well. That's the difference. You can present social network analysis here as much as at the other one; it's just that here there will be a larger variety of presentations, not in number but in topic.
The last one that I want to show you is computational social science. When you do computational social science, you run models on a computer, you run
numbers, you crunch numbers. In this master you are going to crunch a lot of numbers, not just in this class but in many others; you are going to do a lot of other computational things. This is the place where you can present all this computational work. This is a pretty nice conference as well, and it's usually in Europe, though it depends.
Okay, so do you see yourself presenting at one of these conferences in the future? It's a nice experience, if you want my opinion. You might not be interested in this, but think about it. Why not? Just think about the fact that you have to develop a project in this class, and it might be good enough to get there, who knows. It just depends on how much time you spend on it and how enthusiastic you are about it.
Okay, enough about networks, enough about this. This was the introduction; let's get to the nitty-gritty of this class. What happens here? Each week you need to read a book chapter. Normally it is one book chapter or two small things. It is not a lot of reading, it really is not, but as Roger said already, please do it, because otherwise you will fall behind. And if you do it, you're ready for the exam, so it's very convenient for you to do it. Then we have the tutorial: every week, or almost every week, you have a tutorial to do in our package, the package that Roger and I are developing, which is called SNA4DS, surprise surprise. And then we have the homeplay, which is normally a little problem that you need to solve. It takes the topics of the tutorial and gives you the opportunity to use them immediately, so you don't forget them. Again, if you keep up each week, your life will be much easier.
Okay, assessment strategy. All of you already know very well that you work in groups, and your group project is 40% of your mark, so it's a substantial part of your mark. Then you have an individual assignment in which we basically want to know what you did in your group, what your role was in your
group. We will tell you about that later, but this is a small assignment. The other thing is the exam. Let's get a little bit into all of this.
For the project, you have a very nice conversation within your group and you decide what you like. Just pick something that you like, guys, because if you hate it, you will hate to work on it. Try to find something that inspires you. It could be political science, information and internet studies, social influence, marketing, text analysis, recommender systems, crime and safety, or anything else you want. But if it's something else, please talk to us, because we can tell you whether it's possible to do a project on that in a few months or whether it is more like a lifetime's work. So if you have other ideas, just have a chat with one of us.
After you pick a topic that you like, you should identify something specific. Crime and safety, for instance, is a huge topic: pick something inside of it that is specific, and try to understand what you would like to know about that. When you have done that and formulated two research questions, find data. You have three options: either you find data that has already been collected, or you find a way to collect the data and you collect it yourself, or you do a little bit of both, so you take some data that is already out there and integrate it with some new data. You can play around with that, but make sure that you identify and collect appropriate data. When you are there, you basically see what you can do with this data. You have already formulated the two research questions; now you formulate two hypotheses, and you will use two of the models that we teach you in this class to answer these research questions and to test these hypotheses. When you are done with that, you will write a report all together in the group, of about 4500 words, and you will use a template that we will provide, written with the R package called papaja. This is super easy. Okay, any questions so
far? Okay, no questions, I'll move on.
The individual assignment: as I said, you will explain how you contributed to the project and what you learned overall. We will go into details about that later, so don't worry about it.
And the exam. You know what an exam is, right? So I'm not going to explain to you what an exam is. I can tell you that it's more or less divided into two parts. One part is questions about the theory that you learned, so what you read in the book. Make sure you read the book. In the second part, you have one or more problem sets that you have to solve in class. You will have to load the data, figure out what we are asking you to do, and run the analysis for us. We will receive your script with the code, and we will read it and re-run it ourselves. That's how the exam will work. Again, if you do the tutorials, you do the homeplay and you read the book, you will be fully prepared for that. Any questions here?