Welcome to this introduction to historical network analysis. The goal today is not to go too deeply into graph theory and all the metrics that go with it, or into the technical aspects of applying graph theory and network analysis to historical documents, but rather to give an overview of what can be done with network analysis on historical sources: archives, documents, photographs and so on.

I will start with a short introduction about what is, in my view, the main interest of network analysis for history, which is contextualization, or hyper-contextualization. Network analysis is a way to broaden our view of a specific subject and to help us dig into archives we have not necessarily identified as important. Starting from, I don't know, a specific set of individuals, the network will give the context around a specific situation.

So let's take a very simple example. Let's say we study two people writing to each other. The graphical convention to represent this kind of relation is this one: two vertices connected by an edge. You have probably seen this kind of representation, dots or nodes with links between them. It is all around us: we see it on metro maps, we see it on posters everywhere. This is also why this kind of representation says a lot about how we, as historians, consider our own subjects, because we always think of our subjects as relations between people and as complex situations where things appear in a specific context.

So let's start with this pair of vertices, and let's say that we are interested in the relation between these two people. This relation can be, I don't know, 10 letters exchanged between them at a certain point. Of course, traditional history will look at the content of these 10 letters and will try to extract meaningful information from it.
This is of course something we have to do, and network analysis will not get rid of this kind of approach. But what will be interesting here is to look at, for example, the archives of these two persons and see that they also have relationships with other individuals. The central relation appears in this context. Moreover, we will realize very soon that some individuals can be bridges between our two main persons of interest. They could be linked to both of them, which will probably mean something structurally very interesting in the end.

Of course, these neighbors of the two persons that interest us as historians could also have relations with each other. Knowing that people who have written to or received letters from our two individuals also exchange letters between themselves will probably be very meaningful for contextualizing the relation of our two individuals. These neighbors can also have neighbors of their own, people who are not connected to the two people we are focused on, in blue here, and those can have secondary neighbors too. You can expand the graph as far as you want, at least as long as the documents are available.

This very simple example is there to show you that the interpretation you make of the relation between these two persons sits in a specific context, and this context should influence the way you interpret the relation, not only the content of these, let's say, 10 letters. The meaning of this 10-letter relation will probably be very different if it occurs within a specific, well-constructed group, or if it is the bridge between two different groups, or if it occurs within a group that is disconnected from the others, and so on.
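The widening of context described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the correspondents (`A` to `F`) and letter counts are invented, and the helper names (`build_graph`, `shared_neighbors`, `second_order`) are hypothetical, not taken from any particular library. It builds an undirected correspondence graph, then lists the people connected to both persons of interest and the people two steps away.

```python
from collections import defaultdict

# Hypothetical correspondence records: (sender, recipient, number_of_letters).
# Names and counts are invented for illustration only.
letters = [
    ("A", "B", 10),  # the two people we are interested in
    ("A", "C", 3),
    ("B", "C", 2),   # C corresponds with both A and B
    ("A", "D", 1),
    ("B", "E", 4),
    ("D", "F", 2),   # F is a neighbor of a neighbor
]

def build_graph(records):
    """Return an undirected adjacency map: person -> {neighbor: letter count}."""
    graph = defaultdict(dict)
    for sender, recipient, count in records:
        graph[sender][recipient] = graph[sender].get(recipient, 0) + count
        graph[recipient][sender] = graph[recipient].get(sender, 0) + count
    return graph

def neighbors(graph, person):
    """Everyone who exchanged at least one letter with `person`."""
    return set(graph[person])

def shared_neighbors(graph, a, b):
    """People linked to both a and b: candidate bridges between them."""
    return (neighbors(graph, a) & neighbors(graph, b)) - {a, b}

def second_order(graph, a, b):
    """People two steps away from the pair, excluding the pair and its direct neighbors."""
    direct = neighbors(graph, a) | neighbors(graph, b) | {a, b}
    reached = set()
    for person in direct:
        reached |= neighbors(graph, person)
    return reached - direct

graph = build_graph(letters)
print(sorted(shared_neighbors(graph, "A", "B")))  # people connected to both
print(sorted(second_order(graph, "A", "B")))      # neighbors of neighbors
```

With real archival data the same structure would be filled from a table of letters; the point of the sketch is only that the 10-letter relation between `A` and `B` becomes one entry among many.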
This is the kind of contextualization that structural analysis gives to the historian: is my relation, are these 10 letters, connecting these persons in this context or in that one? You can imagine doing hyper-contextualization with very large archives and still being interested in these two persons in the end. You will write your article on these two people, but you will have this global overview of all the relations, and you will be able to qualify the relation between these two persons not only with the content of the letters but also with how it is connected to the rest of the graph.

Let's get to a few basics of graph theory, and especially, because it is probably what will interest historians the most, the history of the development of the discipline itself. We will take as a first step in this history the Seven Bridges of Königsberg problem, by Leonhard Euler, who developed in the 18th century a mathematical demonstration that laid the basis of graph theory. Imagine the city of Königsberg, which is Kaliningrad today, a city built on a river, with this specific island in the center, and divided into different areas.
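Before turning to the question itself, the city layout can be written down as data. The following is a minimal Python sketch, assuming the usual historical layout of seven bridges over four land areas; the area labels (`island`, `north`, `south`, `east`) and the function names are illustrative choices, not taken from Euler's paper. It counts the bridges touching each area and applies the standard parity test for a connected graph: a walk that uses every edge exactly once can exist only if zero or two vertices have an odd number of edges.

```python
from collections import Counter

# Each bridge is an edge between two land areas (labels are illustrative).
bridges = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]

def degrees(edges):
    """Count how many edges touch each vertex."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count

def has_euler_walk(edges):
    """Parity test only; assumes the graph is connected (true for this layout)."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))
print(has_euler_walk(bridges))
```

The check does nothing more than count connections per vertex, which is the simplest metric a graph can offer.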
The question people in Königsberg are supposed to have asked is: is it possible to find a path that goes over all the bridges without crossing the same bridge twice, and so to visit all the areas of the city on a Sunday walk? What is interesting is that the problem is reformulated in abstract terms by Euler. In his paper, each area of the city is a vertex and each bridge is an edge connecting them. The other geographical characteristics are not relevant for testing the different paths; in fact, you can simplify the problem into a mathematical object.

Euler observed that, except at the end points of the walk of course, whenever one enters a vertex by a bridge, one leaves the vertex by a bridge. That means the number of bridges touching an area must be even if you want to be able to pass through this area. Here, all the vertices have an odd number of bridges connecting them. Of course, two areas can have an odd number of bridges if they are the starting and ending points of the walk, but not all of them. So here there is no possible path in this situation.

This demonstration is very interesting because it consists in counting connections. You see, I have written the number of connections here: 5, 3, 3, 3. This is exactly the foundation of graph theory, which consists precisely in producing this kind of metric.

Let's take another example, just to have an idea of the first applications of this kind of theory to real-life situations. Here we have this Moreno sociogram from 1934. Jacob Moreno and his assistant mapped the relations within classrooms of young children. Imagine this early-semester exercise where we ask all the children to name the two people they want to be seated next to, and then map these relations on a piece of paper and try to understand: do the girls want to be seated next to girls or next to boys, and the same for the boys? What is extremely interesting in Moreno's sociograms is the fact that he is already creating new metrics to
analyze the situation of these boys and girls in these classrooms. He tries to understand which people want to be seated next to each other. He tries to understand who the stars of this network are, the very famous or popular boys and girls of these classrooms, and also the isolated individuals, the people who will probably not find any match in the classroom.

What is interesting with Moreno's sociograms is that he drew them by hand, and he tries to show the different relationships between boys and girls and how these relationships evolve with age. He shows that in the very lowest classes boys are okay with sitting next to girls, and that after that the difference between boys and girls becomes much more important and divides the graph. He presented his sociogram as the one on the left here, where he put the boys as triangles on one side and the girls as circles on the other side, and then drew the relations between them. This representation shows very well that there are only two relations between boys and girls in this classroom: these are the two edges that connect the two sides of the sociogram.

What is extremely interesting is that if you take exactly the same dataset and map it with a force-directed algorithm, which is the class of algorithms most used today for graph visualization, you will see that the clusters that are formed are not the same as the clusters found by Moreno in 1934. In particular, you will see that the group of girls is separated here into several clusters and not only one, which means that this little blue group in the center is not more connected to the other girls than to the boys' group. That is an insight we have only with the visualization here. It is not something that can be extracted from the data, except that you will probably find a community here, but you will not be able to say whether this community is more connected to the left group or to the right group, so
that means that the way we visualize networks will also condition our interpretation, and will show us things that would not be shown if we were visualizing it another way. This is why graph theory is of course a lot about metrics and calculation, but also about visualization.

If you take all the possible readings you can make of a network, you have of course this visual analysis: looking at the overall distribution and organization of the nodes, seeing whether there are clusters, sparse areas, cliques, and components that are very strongly connected. That is the visual reading of a graph. But you will also have local metrics, which calculate information about a specific node in the graph, and you will probably need to calculate them for all the nodes to be able to extract this value for each of them. This could be the number of neighbors (we have seen the concept of neighbors before), which is the degree centrality. We have the betweenness centrality, which measures how much a node is on the paths between the other nodes, and so on: many different local metrics that will help you highlight specific elements of your network and dig a bit more into these individuals, if they are individuals, as in a letter network. And you also have the global metrics, the global density of the graph, the global clustering coefficient, which will help you compare different graphs, or compare the same graph evolving over time.

I will not spend any longer on these metrics, because the purpose of this presentation is to discuss how historians use network analysis and visualization, and not to dig too much into these specific details. But of course the question of how we translate what these metrics give us, in a very large spreadsheet, into historical interpretation is something that is subject to debate. How do we do that?

Now let's spend some more time on how historians use their archives to produce
networks, and especially: are there different ways of using this kind of archival data to produce perhaps different types of networks? I propose a typology that distinguishes three different types of networks. It is not something that has to last forever; it is more a tool that I offer to try to understand what we are doing when we analyze historical networks.

The first type is the reconstituted network, which is a network where you try to summarize a situation. You gather data from different archival documents, you take things from here and from there, and you try to make an overview of your subject. It is more or less like drawing something in your notebook based on all the archives you have gathered: you try to make a nice picture, a picture that summarizes and helps you understand your subject better, and of course helps your reader understand your subject better.

A very nice example of this kind of process is this paper about the main families in Florence in the 15th century, where the authors tried to recompose the network of these families. Of course they probably know much more about the Medici than about some other, less known families. Anyway, they try to map all possible relations between them, and they have dozens of different types of relations: are the families linked by marriage, do they have economic relations, do they have political relations, are they friends, are they sharing goods, sharing money, sharing places, and so on. So it is a network where you put everything you find about a specific subject to try to summarize it. But of course you know that this will never be completely sound, because there will probably be missing data somewhere, you will probably have more on this part of the subject than on that one, and so on. This is not really a problem in itself, because the only goal here is to summarize and reconstitute everything you have, and possibly everything you can have. So anyway, sometimes you are forced to do
that.

The second type is what I call the extracted network, which is a network where you have a source that lists elements. For example, you can have all the authors of specific journals, and you will list all the authors who published together or who were published in the same issues, at the same time, in a specific journal or in a few journals. Or you can have an affiliation graph of charities in the US, as here, where everybody is connected to a specific organization or to more than one organization at once, and then you project this graph to obtain person-to-person relations, to see if people were sitting in the same committee at the same time. So basically these networks consist in extracting information from a source that is in general a list, or something that looks like a list, or you make a list from your archival documents and then map it as a network. This is probably the type that is the most used in the historical sciences, because it is very convenient and it relates a lot to how our archives are organized in general.

The third type is the metadata network, where you are not interested in the content anymore but in the circulation of the documents. You will map letters written from one place to another and look at these circulations. All the big projects about the Republic of Letters in these last years are really the projects that made this kind of metadata network analysis popular within the historical sciences.

There is another way of using networks, of course, which is not related to the kind of source we use to produce specific networks but to the way we use this output for research, which means that you can have all three types of networks in this one. This is the network as interface, which means that the network is no longer a way to produce metrics to analyze a specific situation, or to produce a visualization to
understand the globality and the structure of the situation. Instead, the network will be the interface, the way you look at the data, dig into the data, and come back to the data. Of course it is very powerful to have this kind of graph representation of your database, which is basically what you get if you have a graph-based database. The network will help you go back to the data, click on vertices, click on edges, understand better what your data is about, and come back to the archives.

Now I propose a specific case study, which is not only a way to illustrate what has been said before, but also a way to go a bit further into data modeling for historical networks, especially when you are dealing with very complex situations that are expressed at different levels and that need a temporal dimension. This short example is a way to address these questions.

We will speak here about the International Committee on Intellectual Cooperation, which is a specific committee of the League of Nations in the 1920s and 30s. The League of Nations, which is the ancestor of the current United Nations, built this committee and gathered scientists from all over the world to try to make people talk together again after the First World War, and especially to show the scientific and academic world that cooperation was possible, and to try to increase the amount of collaboration within all scientific fields. You see on this picture very famous people: you will probably have recognized Albert Einstein at the end of the table, or Hendrik Lorentz. Many of these scientists are at the height of their careers, and they are giving their time for intellectual cooperation.

What will be interesting here is not necessarily what these people are preparing together around this table, but how they shape or reshape scientific exchanges, how they get information from all over the world, spread information, and try to coordinate efforts. For that we have documents, and these documents can be official
documents, official minutes of the meetings; they can be letters or telegrams; they can be internal notes. All these documents tell us something about the relations between these persons, and not only the persons around the table but persons around the world. Here, of course, a note can be written from one secretary to another, a letter can be written by the president to all the members, or a telegram can be sent by someone outside the League of Nations to this committee to get information, and so on. So all these documents are tokens of collaboration, or tokens of relations, and these documents can be represented as a network, as one big graph.

In this example we work on only a few years to keep it analyzable. This complex network is exactly what we call a hairball, which is a network where it is impossible to see the structure globally just by looking at it. You will need to find ways to play with the visualization and to play with the metrics to extract information from this kind of object. You have here more than 3,000 people connected together around the Committee on Intellectual Cooperation, whose members are the dark blue nodes in this representation.

What can we do to try to read this network and understand the evolution of the structure? Of course we can divide the network into time slices, like years here, and try to make sense of this representation to understand the evolution of the relations. That works very well to understand the evolution of the quantity, but to analyze the evolution of the structure we will probably need to go even further, because even a specific year here is a bit complicated to read by eye. And of course we will work with metrics. We will work with some global metrics, the evolution of density, the evolution of clustering, but also with local metrics, to see the evolution of the position of a certain number of people in the structure, to analyze if they are more or less connected
at a certain moment or at another.

Of course, another way to analyze this kind of big graph is to focus on a specific ego network, or on specific individuals within the network. Here we can select two persons and color their neighbors in blue for one, yellow for the other, and green for the people who are connected to both of them. This approach will help us understand the specific networking practices of certain individuals. If these individuals come from outside the League of Nations, what will be extremely interesting is to see that they are trying to get into this big network, and that they write letters to specific people to try to get involved and, especially here, to get financed by the institution. We see that these two persons, who are working closely together, are connected to very different parts of the graph, and that tells us a lot about their own social behavior within this field.

What we can do as well is try to find other datasets to complement the one we have, so we can work with them together to produce a multi-layer system. What we have here is quite a flat representation of the relations between individuals and all the letters and documents where they are mentioned or that connect them together. This is just one layer of all the relations we can imagine about intellectual cooperation within the League of Nations. So to this metadata network we will add another one, which is an extracted network, another type of network: the network of all the individuals within this organization and how they are affiliated to the different bodies or entities of this very large bureaucratic organization. We will make the link between the individuals and the committees, secretariats, boards, and so on. And of course, at a certain point in time, people are affiliated to many different organizational structures at the upper level.

If we want to play a bit more with new datasets to try to
understand this hierarchy and this verticality in the network, we can add the network of all the entities of the League of Nations and their relations with each other, especially their hierarchy within the League of Nations. We will work with what is here an organization chart of the League of Nations, which is, as you see, quite a big organization. This information will give us insight into how these committees are connected together and how they are connected to top-level organizations, which could be the main secretariat of the League of Nations, or other organizations that are connected to the League of Nations.

So that means that here we are developing a model that unfolds at different levels, and these levels are different graph layers, which means that we can connect these layers together and get some kind of multi-layer network analysis and visualization. This is what could be done with this kind of data: a 3D representation of these different networks connected together, where you have top organizations, organizational entities, and individuals. The individuals at the bottom are linked together because they are sharing documents and writing letters. The entities are connected together because they officially share relations: they are supposed to be connected to each other within the hierarchy of the organization. And they are connected to top-level organizations: the League of Nations, the states, the different large entities that are working on multilateralism and intellectual cooperation. And of course you have these vertical links, which could be affiliation links. That would be the case here between the organizations and the top organizations, which means "affiliated to". But the individuals can of course be affiliated to very different parts of the upper-level network.

So this is a three-dimensional representation, but of course in many cases we want to simplify it to be able to analyze it better, because 3D is not necessarily easy to read, so that
means we'll have to use a few visual and statistical tricks to try to flatten the hierarchy and the affiliations onto the lower level. Here we have one possible modeling for this kind of situation, where you keep as a basis the metadata network, all the letter exchanges between these 3,000 people, but you group them according to the upper layer. You group all the people from the International Committee on Intellectual Cooperation somewhere, its secretariat here, its subcommittees here, and all the people from this part, and this part, and this part of the organization. They will be mapped on a two-dimensional representation, which I call an organizational topography.

One possible process here would be to come back to the metrics of graph theory, apply them to this graph, and see if they correlate with the organization of the graph, because now the spatial organization means something: it is this kind of institutional topography. We will see very well that the betweenness centrality here is strongly related to people being members of a specific group of the network, and so on.

And of course we will continue to play with our model, and divide the graph, this time according to temporality, to see these relations evolve within this topography, within this grouping, and to see that at a certain point in time this subcommittee is very important, and two years later it is this institute that is important, and so on. So we are able to multiply, to find new axes of analysis with our model. We can of course expand our model along another axis, another direction, which could be different facets related to different types of relations: documents speaking about university relations, or documents concerning exchanges of material, exchanges of students, exchanges of professors, and so on. And we will be able to cut the graph into very, very little pieces where things will be more
readable, and where, once again, we will be able to run all these metrics, to see that at a specific time this committee was more important than this institute if we only take the documents concerning university relations, for example. And then we will be able to come back to the specific archival documents that create this relation, and now to be able, as a historian, to say that this set of documents is precisely what makes the relation between Geneva and Paris at this certain moment, around this certain group of people, and so on.

So, to conclude: let's remember that when we produce a network from our archives, we are not doing the Facebook of the past. We are mapping our archives: the content of the archives, the information we can extract from these archives, the circulation of these archives. That is not the subject in itself; it is only an artifact that will be useful for us to get back to the archives. Network analysis is a tool, one tool among all the other tools you have in your historian's toolbox. Like every other tool, it has to be criticized and discussed. And of course, being able to express your subject as a network does not mean that you will find the data to analyze it as a network.
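As a closing illustration of the extracted network discussed earlier (a list of who sat in which body, projected into person-to-person ties), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the membership records are invented, and `project` and `density` are hypothetical helper names rather than functions from an existing library. Two people are linked when they belong to the same body in the same year, and the density is the share of possible ties that are actually present.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical membership list: (person, body, year). Invented for illustration.
memberships = [
    ("Person A", "Committee X", 1925),
    ("Person B", "Committee X", 1925),
    ("Person C", "Committee X", 1926),
    ("Person A", "Subcommittee Y", 1926),
    ("Person C", "Subcommittee Y", 1926),
]

def project(records):
    """Link two people when they sit in the same body during the same year."""
    members = defaultdict(set)
    for person, body, year in records:
        members[(body, year)].add(person)
    ties = defaultdict(int)
    for group in members.values():
        for a, b in combinations(sorted(group), 2):
            ties[(a, b)] += 1  # weight = number of shared body-years
    return dict(ties)

def density(n_nodes, n_edges):
    """Share of possible undirected ties that are present."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

ties = project(memberships)
people = {person for person, _, _ in memberships}
print(ties)
print(density(len(people), len(ties)))
```

Note that Person B and Person C never become linked, although both sat in Committee X, because they did so in different years. A projection that ignored the dates would invent a tie the source does not support, which is one concrete way the artifact can drift away from the archive.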