Thank you. So basically, most of the work has already been done, so that's nice for me. That's cool. But I will still introduce you to four misunderstandings that we often have when we think about big data. So these are the four misunderstandings. Four plus one is actually four, because we don't have the time for the plus one, so let's leave that one. The first one is that when we are talking about big data, in fact, the first wrong thing is that most of the time we are not actually talking about data. We are talking about something else: we are talking about traces. Basically what we do, and it has been said already, is that we use the little traces that we, and the objects that we use, leave in digital media, to extract information or sense from them. A little bit like the birds in the fairy tale, following and eating the little breadcrumbs that Hansel and Gretel left for themselves to find the way back home. That distinction between traces and data is actually very important, because data and traces have a completely different nature. And the first important thing that we tend to forget when we don't make this distinction is that the data that we are talking about are not produced for academia, not for the social sciences, not for the natural sciences. For the most part, they are produced for various other reasons, relating to surveillance or technical efficiency or economic exchange, et cetera, et cetera. So when we say, and it's often said, that big data, or digital technologies, are making data less expensive for academia, this is not actually true. What they are doing, in fact, is shifting the cost of collecting and producing those traces and those data to someone else: the people and the companies that manage and handle these digital networks. And that is important.
This, for example, shows you the venture capital investment that actually paid for a good share of the data revolution, and there is actually an interesting controversy about this image, which also shows you how easily you can sort of lie with data and with the way you present them, to connect with the last presentation. This is the cumulative curve of unadjusted figures. But if you represent them the right way, which means you adjust the investment for inflation and you show the investment per year and not the cumulative total, you see that it's not actually growing through time; it's actually declining a little bit. Just to give you an example of how the way you present data can make an important difference. But it's more than that. When we work with digital traces, we always have to keep in mind that they have been created by someone else, with objectives that might be different from ours. And so the conditions of creation of these traces are very important. I'll give you an example. A few years ago at the Media Lab, in Paris, we were working with what is called Google Trends, or Google Insights for Search, which is a very interesting set of data that Google generously gives out for free, but in a strange way, about the searches, the queries that are made to the search engine. Right, so you can type whatever word you want into Google Trends, like, I don't know, soccer, and it will tell you how many people have searched for that word in the last days, in the last weeks, in the last years. And you can break it down by different parts of the world. So we were working with an economist. As you probably know, this data has been used to build a pretty controversial but still pretty good indicator of flu spread around the world, because people look for their symptoms. So it can be used in real time to detect the spreading of an influenza pandemic in the world. So we were thinking about trying to do the same thing for economic indicators.
So, matching search queries in Google with economic indicators. And while we were doing some literature analysis, we found this very interesting discussion paper by Askitas and Zimmermann, where they show that there's actually a very, very good correlation between the unemployment rate, which is the black line here, in the US, and the searches for antidepressant side effects, the side effects of medicines. So you take how many people typed, I don't know, "Zoloft side effects", for example, and you match it with the unemployment rate, and it correlates very well. And so they made the hypothesis, which kind of makes sense — this, of course, is the start of the economic crisis in 2008 — that the searches for antidepressant side effects could be used as an indicator of unemployment, because people lose their jobs, they get depressed, they take antidepressants, and so they look for the side effects. But then we tried to reproduce those results, and it works, but we also found out that there were a lot of other words that had the same curve. For example, "templates" — this is actually Google Trends — it also spikes up starting from the end of 2008. And "recipe" as well, a lot of these sort of side words. And in fact what we then discovered is that on August 25, 2008, Google introduced its Suggest feature. So now when you type "Zoloft", Google proposes "side effects"; when you type "apple pie", it proposes "recipe", right? And so all these sort of side words spike up at more or less the same time as the economic crisis. So the correlation is very good, but if you forget that those queries are made through Google, and so that the conditions of production of those data depend on the technical infrastructure that collects those traces, you can easily miss this little but important distinction.
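The trap here is statistical as much as sociological: any two series that jump at the same moment will correlate strongly, whatever caused each jump. A minimal sketch with made-up numbers (not the data from the paper), where an "unemployment" series and two unrelated query series all step up at the same date, as the Suggest launch and the crisis did:

```python
# Toy illustration (hypothetical numbers): series that share a level
# shift at the same date correlate almost perfectly even when the
# shifts have completely different causes.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 24 months: flat before the break, higher after it.
unemployment = [5.0] * 12 + [9.0] * 12          # hypothetical rate (%)
queries_side_effects = [10] * 12 + [80] * 12    # hypothetical search volume
queries_recipe = [12] * 12 + [85] * 12          # unrelated word, same jump

r1 = pearson(unemployment, queries_side_effects)
r2 = pearson(unemployment, queries_recipe)
```

Both `r1` and `r2` come out at 1.0: the correlation tells you nothing about whether the jump reflects society or the platform.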
And this is another example we are working on — I'll go very fast — a project called Contropedia, where we use data on Wikipedia edits to detect controversial topics. So basically, when we see that there are pages that are very much reverted by different people, we take that as an indicator of controversy or debate. But we are very, very aware that all the time there's a problem: what are we mapping here? What are we seeing? Are we seeing the medium or are we seeing the content? Are we actually seeing some societal debate, or are we seeing something about Wikipedia? Are we learning something about society, or are we learning something about Wikipedia? And often we are learning more about Wikipedia than about society. There are a lot of discussions going on on Wikipedia that are very, very specific to the bureaucratic format of this encyclopedia, but that do not really relate to what happens in society. So here are a few articles that I've written on this, if you're interested.

So the second misunderstanding, which is probably more important, is that quantity is less interesting than variety. Do you know what happened to Google on 25 September 2005? Can you spot the difference? This is 24 September 2005, and this is two days later. Something is different, apart from the cake, which is for Google's birthday, I guess — it means they had been founded for seven years. But apart from that, there's another thing, and it's this one: they removed the number of pages in which they were searching.
I don't know if you remember — everyone has forgotten about it now — but for a long time there was this big competition between Google and Yahoo about which search engine was indexing more pages. Like, every week: we are indexing five trillion pages, we are indexing five... blah, blah, blah. And in September 2005, Google said: in fact, we don't care, right? What is important is not how many pages we're indexing, because we're only indexing a small fraction of the web anyway. And that's fine, because our users are not interested in the whole web anyway. What is important is that the first five results are relevant. Which reminds us that we have to take the metaphor of data mining seriously. When we say data mining, and we refer to mining — if you think of what mining is, it's throwing away 99.9% of dirt to keep the 0.1% of gold, or whatever. That's interesting. And when we talk about digital data, or digital traces to be more precise: it's often said that data is the new oil, and that's exactly a misunderstanding, because it gives you the idea that you just drill somewhere and the data flows up easily, almost for free. This is not the case. Digital data resemble much more unconventional oil, like oil sands and so on. They're dirty. They require huge infrastructure. They're complicated. They have to be cleaned and they have to be extracted. Without this work of extraction, there's actually very little value in digital traces. An example is this map of the web, which is, to my knowledge, the most exhaustive map of the web that exists by now. It contains a few million websites; each point here, each circle, is a website. And it's also almost useless. It's very nice to look at, but there are only two things that you can learn from this map, and they are not very surprising.
The first is that websites that share the same language tend to cite each other. That's why you have the yellow cluster here, which is Chinese, and the light blue cluster here, which is English-speaking websites. Then this one, I think, is Brazilian Portuguese, this is French, et cetera, et cetera. But that has been known for a long time. And the other thing is the power law of the web, the fact that very few websites receive most of the attention. And you see it there: the size is the number of links that each website receives, and there are a few that are very, very big and a lot that are very, very tiny. But that too has been known for a long time. So if you want a good map of the web, here is a good one. It was produced for Le Monde, the French newspaper, by a company called Linkfluence. And it doesn't even try to map the whole web; it focuses on a very specific part of the web, which is the French political blogosphere. So it's a tiny map — only about 1,400 websites, and only political blogs. But since they have done the job of manually selecting those few websites, this map can show us interesting things. And to give you an example of the sort of findings that we can see in this map: the extreme right and the extreme left have very different profiles in the French blogosphere. The extreme right has a lot of blogs, but it's also marginal — see, it's here. They tend to be very connected among themselves, but disconnected from the rest of the blogosphere. The extreme left, on the contrary, is composed of fewer websites, but they are spread all around and they play a very important role, which is in some way unexpected: that of bridging the discussion between left and right.
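Both maps are hyperlink networks in which a node's size is its in-degree, the number of incoming links. A minimal sketch of that computation on a toy link list (hypothetical sites, not either map's actual corpus):

```python
from collections import Counter

def in_degree(edges):
    """Count incoming hyperlinks per site from a list of (source, target) links."""
    return Counter(target for _, target in edges)

# Toy graph: attention concentrates on one hub, a miniature of the
# heavy-tailed ("power law") distribution visible in the big map.
edges = [
    ("blog-a", "hub"), ("blog-b", "hub"), ("blog-c", "hub"),
    ("blog-d", "hub"), ("blog-a", "blog-b"), ("blog-c", "blog-d"),
]
degrees = in_degree(edges)
```

Here `degrees.most_common(1)` returns `[('hub', 4)]`; on the real web, the same computation yields the handful of giant nodes and the long tail of tiny ones.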
So when you think about which type of data you're looking for: sometimes the question of how big the data set is matters less than how interesting it is, how diverse it is, how it opens up observations that could not have been made anywhere else. For example, to talk about climate change and a project that we have done at the Media Lab: we wanted to map the climate negotiations, and we hesitated a lot between going for Twitter or for the Earth Negotiations Bulletin, and eventually we decided on the Earth Negotiations Bulletin because it's much more specific. Twitter, of course, contains much, much, much more data, but the data is less specific, so possibly less interesting. And so this is what we ended up producing, and that is what the UN is producing on Twitter, which is also very interesting, of course. I will come back to that image in a minute. So the third misunderstanding is that digital is not automatic. Doing digital research is not just pushing a button, right? Sometimes we hope so in academia, at least in the social sciences: when people come to the Media Lab and ask us how we can help them, most of the time they expect that we would have some sort of solution that would make their life easier. It is not the case, right? Digital traces will make your life even more complicated, precisely because of this work of extraction that is necessary. To produce the map that I showed you before — this one, this little image, this little page — we had to go through this super complex protocol, which I will explain very quickly. We took all the PDFs of the Earth Negotiations Bulletin; then we selected only the ones that were about COPs, the Conferences of the Parties of the UNFCCC; then we kept only the sections that were about the COPs, because sometimes it's mixed up and you get other things as well.
Then we identified terms — for example, names of countries or negotiation groups, or noun phrases that refer to notions mobilized in the negotiation, for example climate-related disasters, financial resources, et cetera, et cetera. Then we extracted those terms from the parts of the bulletin that we kept, trying to find all the different linguistic variants for them. Then we went back to the bulletins, we read all of them, and we cleaned up the terms that we had extracted: we looked at which terms actually meant something and which were too ambiguous to carry meaning. Then we merged the terms that meant the same thing — for example, technology transfer can be said in many, many different ways, so we merged all of them. And then we built this other map: each little point here is one of the terms that we kept after the cleaning, and they are connected if they appear in the same paragraph of the ENB. And so we could detect the thematic distribution of these terms in the climate negotiations, these different clusters of noun phrases — one about greenhouse gas emissions and the Kyoto Protocol, et cetera, et cetera. And the fact that these words appear often together is not decided by us; we found it in the Earth Negotiations Bulletin. So then we could use each of these clusters of words as a sort of dictionary to identify a specific topic in the negotiations. And then what we did was a sort of temporal analysis of the occurrence of each of these themes in the negotiations, so that we can trace how much each of the themes was discussed at each COP. And this is the map that I showed you before, just to give you an example. The pinkish stream here is the post-Kyoto agreement. And of course, it's almost irrelevant until 2005 or 2006, and then it gets very important.
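The network step of that protocol — linking two kept terms whenever they appear in the same paragraph of the bulletin — can be sketched like this (a toy version with invented sentences, not the actual EMAPS pipeline):

```python
from itertools import combinations
from collections import Counter

def cooccurrence(paragraphs, terms):
    """Weight an edge between two terms each time both occur in one paragraph."""
    edges = Counter()
    for text in paragraphs:
        present = sorted(t for t in terms if t in text.lower())
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

# Invented example paragraphs standing in for ENB text.
paragraphs = [
    "Parties discussed technology transfer and financial resources.",
    "Financial resources for adaptation were debated; technology transfer too.",
    "Greenhouse gas emission targets under the Kyoto Protocol.",
]
terms = {"technology transfer", "financial resources",
         "greenhouse gas emission", "kyoto protocol"}
net = cooccurrence(paragraphs, terms)
```

The edge weights (here, "financial resources" and "technology transfer" co-occur twice) are what the clustering then groups into thematic dictionaries; a real version would also need the variant-matching and manual cleaning described above.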
This one, the light blue here, is climate change impacts: very visible at the end, not discussed at all at the beginning, et cetera, et cetera. But that was not enough; that was not finished. From that, we also had to build a narration. We had to tell a story about this, because otherwise, visually, by itself, it doesn't really say a lot. And so we wrote this whole thing that you can find online at climaps.eu, which is a platform that we produced at the end of this project, and that contains this story about the climate negotiations and five others. And here are a couple of papers that you can read about it. I'm almost done. The last misunderstanding, which is possibly more relevant for those of you who are in the social sciences — it's definitely more relevant for them — is that more quantification demands more qualification. It's rooted in the methods that we use in the social sciences: this distinction between qualitative methods and quantitative methods that you have probably heard of. Qualitative methods — anthropology, direct observation, in-depth interviews — allow you to collect intensive data, meaning rich data, but about very small populations. Quantitative methods — statistics, polls — allow you to collect extensive data, meaning poor data, but on large populations. And for a long time there has been this divide: when you enter the social sciences — even when I entered, when I started my PhD — I sort of had to choose; my supervisors asked, do you want to do qualitative or quantitative? And the interesting thing about digital methods is that they allow you to completely overcome this distinction. I have an example here. Will it go there if I click? Yeah, nice. Okay — oh, but it doesn't show up; maybe I have to move it. Yeah, okay. So this is a project that we did at the Media Lab. Unfortunately it's in French, but I will drive you through it very quickly.
So it's not about climate change or environmental data; it's about law. It's called La fabrique de la loi, the fabric of law, and it tries to answer the question that you see there: do members of parliament make the law or not? Because, as you probably know, in political science there's a big discussion about whether laws are actually made in parliaments, or whether they just pass through parliaments but are in fact imposed by governments. And so we looked at this in the French parliament, and we built this database that contains about 290 law texts. You can see them here; you can filter them or see all of them — this is only 2013. Okay, so here are the roughly 300 laws discussed in the last four years in the French parliament, and you can see them in different ways. For example, this one is the quantitative view, where you see how much time each law spent in each part of the parliament: the Assemblée Nationale, the Senate, and the different commissions. So that's a very aggregated, very quantitative view of these laws, and you can compare, very easily, on one screen of your computer, 300 laws. But then we can start drilling down, and we can click on this one, for example: this one is the same-sex marriage law. Big controversy in France. And here we have, again, very aggregated information on this law. We know that 9,000 amendments have been submitted, and that only 0.47% have been accepted; that the size of the text has decreased — which is strange — by 37%; and other information. But we can click on it and explore the articles. And now we see all the articles of the law at the different stages of the discussion, in the different branches of the parliament. And when you move your mouse over each of them, it tells you how much it has been changed. So this one has been added at this stage.
But if you go to this one, for example, article 1 B: it has been modified by 80% at this stage. And if we click on it, here we see the modifications. So I have the versioning of the article: which part has been added, which part has been removed — a little bit like what you would have in Wikipedia. And you have that for every article of every law at every stage of the discussion. If I click on another one, it shows the same thing here. But we can do better. We can click on "explore the amendments", and here we see all the amendments that have been submitted — well, this is on this specific article; let me click here so we see more. These are all the amendments that have been submitted to the law, but I can do it article by article, at this specific stage. And I can see by which party they have been submitted and what their final destiny was: whether they were accepted or rejected. Most of them are rejected, of course. And if I click on each of these things — for example, this one — I can actually read the amendment, the people who signed it, when it was submitted, et cetera, et cetera. But I can do better. Again, I can click on this little thing here, and now I can see data on the discussion about these amendments. I can see how many words have been spoken by the different parliamentary groups on each amendment, on each article, at each stage of the discussion. So if I click on this one, I see that the UMP, for example, has spoken this many words on this amendment, at this stage, blah, blah, blah. And I see the people, so I can see who actually spoke those words. And I can do better: I can click on this intervention and actually read the verbatim of what they have said. So the point is — I'll come back to the presentation — the point is that in a few clicks, four clicks, five clicks, I can go from a super aggregated view of this data down to the verbatim of the words spoken in parliament.
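A minimal sketch of why those few clicks are cheap: if the data is stored as one nested structure (a hypothetical model, not lafabriquedelaloi.fr's actual one), the aggregate "quantitative" view and the verbatim "qualitative" view are just two traversals of the same object.

```python
# Hypothetical nested data model: law -> articles -> amendments -> speeches.
law = {
    "title": "Hypothetical law",
    "articles": [
        {"id": "1",
         "amendments": [
             {"party": "A", "fate": "rejected",
              "speeches": [{"speaker": "Dupont", "text": "Je m'oppose..."}]},
             {"party": "B", "fate": "accepted",
              "speeches": [{"speaker": "Martin", "text": "Je soutiens..."}]},
         ]},
    ],
}

def amendment_stats(law):
    """Aggregate view: count amendment fates across the whole law."""
    fates = {}
    for article in law["articles"]:
        for amendment in article["amendments"]:
            fates[amendment["fate"]] = fates.get(amendment["fate"], 0) + 1
    return fates

stats = amendment_stats(law)                       # the quantitative screen
verbatim = (law["articles"][0]["amendments"][0]
            ["speeches"][0]["text"])               # the qualitative screen
```

One traversal sums everything up; the other indexes straight down to the words actually spoken. The "few clicks" are just moves along the same tree.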
And it's this possibility to navigate through the scales that is sort of dissolving the distinction between qualitative and quantitative data. In a sense, it has always been possible, right, to disaggregate the data, to come back to the original data, but it took a lot of time. You had to go back to the archives, to find the original collection, blah, blah, blah. Now you can do it quickly enough, and that changes everything. Okay, so this thing you can read about. So why is it important, this overcoming of the qualitative–quantitative distinction? The reason it is important is that this qualitative–quantitative distinction has, for a long time in the social sciences, been connected to another distinction, the one between what's called the micro level and the macro level. The micro level would be the level of individual interactions, or individual choices of consumption, for example. And the macro level would be the level of social structures: big institutions, like the markets, the market economy. And sociologists have, for a long time — and I think most of them still do — thought that these two things exist on two separate levels, that these are two different levels of existence, right? And what we are trying to push at the Media Lab is the idea that it might not be the case, that it might be only a sort of artifact of this divided vision, of this strabismus that we have had so far. And that is particularly important, I think, in the case of environmental debates, because when we talk about, for example, climate change — and we saw it very well working on this EMAPS project — probably some of the most important discussions take place neither at the level of the individual citizens, nor at the level of the big global structures.
But what is happening, what the debate is all about, takes place in between: in all the multiple intermediary bodies and institutions and groups and associations and meetings that exist between the micro and the macro — which were very difficult to observe with the traditional, conventional methods of the social sciences, and which are now becoming open to observation thanks to digital methods. Thank you.