Okay, I'm going to introduce our keynote speaker, Richard Rogers. He's from the USA. I didn't know that he's from Lawrence, Massachusetts, and he told me he has been living in Europe for the last 20 years. So, just a short presentation of Richard Rogers. He's a new media professor at the University of Amsterdam. He's director of the Govcom.org Foundation, a foundation that has received a lot of research grants from the Dutch government, the Soros Foundation, the Open Society Institute, the Ford Foundation, the Open Society Foundations, the MacArthur Foundation, the Gates Foundation, etc. He's visiting professor in science studies at the University of Vienna. He's an Annenberg Fellow at the Annenberg School for Communication at the University of Pennsylvania, one of the most prestigious universities in the USA in media and communication. He's a visiting scholar in Comparative Media Studies at the Massachusetts Institute of Technology, MIT. We also have a couple of colleagues, one of them working there, who go back and forth to MIT Comparative Media Studies. And he's founder of the Digital Methods Initiative.

His academic work focuses on web epistemology, an area of study where the main claim is that the web is a knowledge culture distinct from other media. He's the author of different books. He's the author of Information Politics on the Web, from the MIT Press, from 2004; in 2005 it was selected as the best information science book of the year by the American Society for Information Science and Technology. I have them here, looking for the autograph. Digital Methods, another classic. This is from 2013, also from the MIT Press, a great book; it was awarded by the International Communication Association, ICA, in 2014. And the last book, this one from Sage, Doing Digital Methods, 2018. And he's working on a new book called Critical Analytics for Social Media. So maybe later we can talk about the next book.
Thanks for accepting the invitation. He has a very, very complicated agenda, and we invited him to open the semana. Finally he said, no, Friday. Okay, we close with the keynote. He wanted to come on Saturday and Sunday; we said no, no, we are closed here. Sorry for the windy day. Usually Barcelona has beautiful weather, but sometimes we also have windy weather. We're in the Mediterranean. Well, thanks for accepting the invitation. I'm going to sit in the first row, and he's going to present his research for about one hour, and then we are going to sit here so we can have the conversation, with your questions also. Okay, so thanks again.

Okay, thanks very much. Thanks for the kind invitation. Thanks a lot for coming. What I'm going to do today is three things. The first is I'm going to talk about digital methods, which is the title of those two books. I'm going to talk about it historically: what are digital methods in relation to internet studies research more generally, or the study of the internet, or the study of the web more specifically? Then, secondly, I'm going to talk about digital methods epistemologically. I'm going to situate digital methods within the digital humanities as well as the digital social sciences, so to speak, and try to make some distinctions between those three areas. And then, thirdly, I'm going to talk about digital methods practically. I'm going to go through digital methods for social media research, largely the mainstream social media platforms. So I'll talk about using digital methods to study Twitter, Facebook, Instagram and YouTube, and I'm also going to mention at the end the deep vernacular web, including 4chan and Reddit, as well as Telegram, which we've recently developed a tool to study as well. So that's the agenda.

So this is how I'd like to situate digital methods historically. Generally speaking, the study of the internet started with the study of
something that was referred to as cyberspace. Nowadays the word cyberspace, at least in English-language discourse, only appears in very specific sub-discourses, cybersecurity and info war, but previously cyberspace was thought of as a sort of distinctive realm apart, and we projected a lot of ideas about ourselves and the future onto this realm called cyberspace. We thought of it as a new, potentially neo-pluralistic space. We thought of it as a place for new kinds of identity play. We thought of it as a kind of space to reinvent our bodies. So it was very much an imaginary, and those who studied cyberspace at the time studied it as a technological imaginary.

I think right around the end of the 1990s, early 2000s, the social scientists arrive, and in particular the ethnographers. I'm referring not only to Christine Hine's work on virtual ethnography, but also to Miller and Slater, and what they did was ground cyberspace. They no longer thought of it so much as an imaginary, but rather they went offline in order to study the online. So they went to cyber cafes.
They used over-the-shoulder techniques. They interviewed, they surveyed, and they came up with a lot of interesting notions. So they grounded cyberspace by, for example, coming up with the notion of the digital divide.

Now, some time around 2007, 2008, and this is where I would situate digital methods, we had what is now retrospectively referred to as the computational turn, and this turn sort of inverted the whole idea of what cyberspace, or the online world, was about. Previously it was thought of as a realm apart, something quite separate: people would study online culture. But beginning around 2007, people went online to study culture, or to study the societal. So no longer was it a space apart, and no longer did it have a particular asterisk attached to it. I called this, in a booklet that I published in 2009, the end of the virtual, and the end of the virtual was a sort of declaration of the end of this special, separate space.

And I think that now, in the most current era, which people refer to as the post-digital, it's something quite similar. The post-digital has a couple of meanings. One is that we no longer need to use the word digital as some sort of special adjective. So digital methods?
No, it's methods. So this is one particular idea that comes along with the post-digital. The other, however, is something quite recent. After the Cambridge Analytica scandal, we have what is now increasingly being referred to as platform lockdown. So in order to study the societal, we can't only study it via the digital, because the data is now being cut off from us: certainly on Facebook, which cut off its Pages API on the 4th of September of this year, and certainly on Instagram, which cut off its API in June of 2016. So now we're scraping, and we're doing what is called post-API research. If you want to talk about that, I'll talk more about it during the workshop later this afternoon.

So this is the periodization, and what I'm going to focus on largely is this third period. This is also trans-historical: I'm not saying that one period has definitively ended and another one begun. No, they're all layered and stacked. But I want to concentrate mostly on this third one, and I want to start with an example of the web-as-data turn.

What you see before you is a map of the US, with some states colored purple and others a different color. These are searches for recipes on a website called Allrecipes.com the day before the American feast, which is Thanksgiving, and if there's a disproportionately high number of queries for a particular recipe in a particular region, it becomes darker. So sweet potato pie is being searched for in far greater quantities in the South. Macaroni and cheese, similarly. Sweet potato has a different geographical distribution. Corn casserole.
That's the corn belt, so to speak. Green beans in the west, turkey brine to the north, yams to the west. So what you see here is a kind of geography of taste, a geography of taste that is mapped via web data. And I would challenge you to think of ways to do this without web data.

This is where the other notion that I have tried to coin comes into play, and this is the idea of online groundedness. Oftentimes people will make findings with web data, and then they think, well, now we really need to go offline, and then we'll know it's true. With online groundedness the question is: when is it appropriate, under which conditions, can you ground your findings in the online? For example, we could use the Instagram pictures that people take of their food to ground these sorts of findings.

So whereas those are trends, this is specificity. This is a graphic that was published in the New York Times, I think in 2015 or 2014. These are the most specific recipes that people queried per state, and then you see things that you've never heard of before, like frog eye salad and funeral potatoes.

So I mentioned the web-as-data turn, and I think you could say that it goes back to a couple of very, very well-known pieces. One was by Duncan Watts, called "A twenty-first century science", published in Nature in 2007. Another one, which is I think even more famous and probably far more cited, is David Lazer and company's piece on the computational turn, on computational social science, which was published in the journal Science in 2009. Both of these basically pointed to a coming period where you would use web data to figure out what's happening in society and culture more generally. And here the emphasis is on traces: the idea that we leave traces behind, footprints in the snow, which scientists can then use in order to find tendencies, trends, indicators, etc. Now, that said, that idea of social physics, as
Sandy Pentland calls it, that idea of people leaving behind traces, has been challenged recently. Not that recently; it's probably been challenged from the beginning, because not only do people leave traces in social media, but social media is prompting you to do certain things. So there are more than mere traces online. Nevertheless, that was the idea with the web-as-data turn.

Now, this is a major sort of turning point. You've heard, I guess, of Google Flu Trends. Google Flu Trends was the flagship big data project that Google.org, the sort of non-governmental side of Google, actually implemented, and with Google Flu Trends you could predict the incidence of flu, and geolocate it, faster and supposedly just as accurately as the traditional techniques. However, in 2012, 2013, something happened. Google Flu Trends was suddenly overestimating the incidence of flu by two times in the US. So why was that happening?

Well, what was happening is a kind of warning to everyone who tries to use web data as societal indicators. When it's flu season, and I guess it's flu season now, you might feel symptoms, you might be coughing and sneezing, and therefore you type into Google "flu", flu-related symptoms, etc. Right? Or you see that it's flu season because you're watching TV and it says it's flu season, and then you type into Google "flu". So the question is, when you're doing these searches, does it reflect something that's happening in the wild, or does it reflect something that's happening in media? And what's the difference, and how can you disentangle the two of them?
So this was the larger question: what is big data actually measuring?

What I would like to do now is talk a little bit more epistemologically, rather than historically, about digital methods as a kind of concept, and I'm going to talk epistemologically in this sense here. I want to make a distinction, and this is not a kind of religious distinction or a fundamentalist distinction, but more of a provocative one. I'd like to make the distinction between the digitized and the natively digital. This is something that does not refer to generations. It's not demographic, so it does not refer to the notion of the digital native at all. Rather, it refers to the idea that there are data that are, quote-unquote, born in the medium, and data that have migrated to the medium, so things that have been scanned or digitized. So there is data that is natively digital, and there is data that is digitized.

I also want to make an additional distinction on the methodological side. There are methods that have migrated to the medium, like the survey, and there are methods that have been in some sense written for the medium. These I dub natively digital methods. And if you think along these lines, if you buy into this distinction, that's the first step, then you can begin to situate various approaches in the digital humanities and the digital social sciences. That's what I'm going to do. I'm going to talk a little bit about just a couple of what I feel are emblematic approaches. In the digital humanities I want to talk about cultural analytics and culturomics, and in the digital social sciences I want to talk about webometrics and altmetrics, and then I'll contrast those approaches with what I've called digital methods, just to nuance epistemologically where that notion is coming from. So I want to start with cultural analytics.
Cultural analytics is quite well known, I think, at least in the digital humanities. It outputs things like this. This is an image wall, and it is an image wall that organizes the images either according to formal properties, like saturation, hue, brightness, etc., or chronologically. By the way, these are the front covers of Time magazine, the sort of tone-setting American magazine, but he's done these for artists like Rothko and van Gogh, and also for video games, comics, etc. So this is a general technique, and you'll notice that all of the things I mentioned are digitized materials.

What Manovich argues is that what you create in that space, or what you're analyzing, is a style space, and with a style space you can see gradual changes in style at particular points in time, or different styles clustered because of their formal properties. Now, Manovich has a sort of art history background, and so when you're grouping these things they're grouped by formal properties, right? So these are materialistic, formalistic groupings.

What I find interesting about Manovich is that he's a big data proponent, and he makes what I find to be quite nuanced arguments about why big data is interesting, arguments that are quite different from the social physics definition. What he argues is that big data allows you to no longer periodize. So you no longer have to make these periodizations, like I just did previously; rather, you can see continuous change. And you no longer have to categorize. You don't have to make categories, because you can make continuous descriptions. So this is what he argues is the interesting part of big data. Just to give you another sense of cultural analytics: this is slightly different.
This is from his Selfiecity project. He queried, in the Instagram API at the time, hashtag selfie, along with the geo-coordinates of five cities, Rio, Moscow, Tokyo, Los Angeles and one other, and then he analyzed the formal properties of the selfies. You can see what the properties are there. Ultimately what he was doing was a sort of city mood or city sentiment analysis. What he found was that Rio was quite jolly and Moscow was quite grim.

Okay, so that was the first one. The second one that I want to mention is culturomics. Culturomics makes use of the Google Ngram Viewer, like this. So this is Google Books, the Google scanned books project, and there's a piece of software called the Ngram Viewer where you can type in keywords and query them, and then you get their incidence over the last two centuries or more in something like, at least in the English language, what they said was five percent of all books ever printed. It's not only in English; it's also in a number of other languages as well.
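The keyword query described here, the incidence of a term per year relative to everything printed that year, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Google Books Ngram Viewer's own implementation: the function name `relative_frequency_by_year`, the naive whitespace tokenization and the three-entry corpus are all hypothetical.

```python
from collections import Counter


def relative_frequency_by_year(corpus, term):
    """Return {year: share of that year's tokens equal to `term`}.

    `corpus` is an iterable of (year, text) pairs. Tokenization is a
    naive lowercase whitespace split; a real study would use something
    more careful.
    """
    term = term.lower()
    hits = Counter()    # occurrences of the term per year
    totals = Counter()  # all tokens per year
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    # Skip years with no tokens to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical miniature corpus, invented for illustration only.
toy_corpus = [
    (1900, "algebra and geometry and algebra"),
    (1900, "geometry for schools"),
    (1950, "calculus replaces algebra in the syllabus"),
]
trend = relative_frequency_by_year(toy_corpus, "algebra")
```

Normalizing by the yearly token total is what makes years comparable when far more text survives from some periods than from others; raw counts alone would mostly track the size of the scanned collection.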
And here you study different kinds of trends, at least as you can see them through the printed word. So you can see here the ups and downs of the interest in different kinds of math, and here the ups and downs in the interest in different kinds of characters, whether literary or scientific. One of the things that I found interesting about this work, and this is on this slide here: if you look this up, in the published studies there are all these little miniature projects that are described. This is in a way the same way that digital methods works, with small, self-contained projects with a story to them. One of these that I really liked was that they found that a celebrity is well known, historically, for shorter and shorter periods of time. So the idea of a celebrity, what a celebrity is and how long a celebrity is a celebrity, has changed. This is one of the projects.

Okay, so in those cases what we were looking at were quite flagship digital humanities approaches, using largely digitized materials, so scanned things: scanned books on the one hand, with the Ngram Viewer and culturomics, and scanned artworks or magazine covers with cultural analytics. They also used what you could call digitized methods, a set of methods that were drawn from art history or other digitized traditions.

I want to move now to the digital social sciences and talk about only two approaches. There are many, many more, but I think, for me at least, these are quite emblematic. What these approaches do is take natively digital objects, like hyperlinks or likes, and then apply digitized methods to them: scientometrics, bibliometrics, quite standardized methods. So they take the traditional social scientific instrumentarium and apply it to the new objects of study.
This is quite a typical thing to do, and there's nothing wrong with it. The two things that I want to talk about are webometrics and altmetrics. Both of these are ways of measuring societal, or in fact scientific, reputation. So these are ways of deriving reputation ultimately, but initially they're ways of deriving impact.

So for example, here, this is a kind of webometric approach. What you see here is an output of a piece of software that I made. It's the Issue Crawler. So it's not from Thelwall and the specific webometrics community, but nevertheless this is one example. And this is a kind of typical graph. People nowadays are beginning to critique the hegemony of the graph, especially in this kind of work, big data work, or even medium-sized or smaller data work. You know, why do we always need to see the graph? However, here's a graph.

This is a hyperlink network. The nodes are websites and the edges, or connections, are links, and it's a directed network map, so there are arrows. I just want to mention what this is. There are basically two clusters. One is the blue dots, and these are Armenian NGOs, NGOs from Armenia, and they're linking to one another quite massively. Then the other, the yellow ones, those are UN agencies, largely intergovernmental, and they're interlinking too. Okay, so here's the story: the Armenian NGOs link to the UN, and the UN does not link back. This is a very, very typical politics of association, where you can see reputations, or reputation making, or the reputational in action, through hyperlink analysis. So you see here how that is measured, or an example of it.

Okay, altmetrics. Now, you probably have seen it. Have you seen this doughnut?
This colorful doughnut published next to an article, a scientific article, when you've looked something up? Well, this doughnut, there's a variety of ways you can get to it. You can actually also install a kind of bookmarklet or extension, so you can generate doughnuts when you are looking at a scientific article. These are altmetrics, and altmetrics are a form of scientometrics, but instead of studying links, hyperlinks, in the webometric sense, or citations in the classic scientometric or bibliometric sense, they're studying likes, basically. But it's not just likes. It's retweets, so it's mentions, references or mentions in social media. And the social media here is also academic social media, which is kind of interesting. So it's also Zotero and Mendeley, the kind of academic social bookmarking software; you could also call them online bibliographies. So it's measuring mentions of articles in social media. These are altmetrics. In this case, as in the case of webometrics, we have the study of natively digital objects using digitized methods, so digitized scientometrics, so to speak.

Okay, so some of my colleagues say, well, Richard, what about that blank box over there? What's in there? One colleague of mine has actually come up with an example of an approach over there. I'm not quite sure yet. But anyway, I'm just going to move to the bottom right, and this is digital methods.

I want to start by describing in a little more depth what I mean by the notion of the natively digital. Some people might think, oh, that's kind of an anthropological idea. It's not. It actually comes from computational culture, and it's the idea of the native. So you can run native Facebook ads.
I have a native Apple adapter here. Native in a computing sense is that which is written for the processor. It's not emulated, right? So it's native in that sense. And when I talk about the natively digital, and I don't know how elegant that notion is in any language, English or other languages, I mean that which has been written for the medium, to work and to survive, and to be adapted so that it continues to work and survive in the medium. So this is then anything from crowdsourcing and folksonomies and these sorts of things to PageRank and all the kinds of analytics, etc., that then end up in feeds or filters or recommendations. So these are the natively digital methods, and I contrast them with the digitized methods: the idea that there are online surveys, that there are all sorts of methods that have migrated and been adapted to the medium but are not written specifically for the medium. Or you could say that there's a gray area there. You could also argue more radically that all methods come from elsewhere, that there is no such thing as the natively digital, that everything is adaptation. Well, okay, I would agree with you. However, some are adapted more radically, or optimized for the medium, and others are more clunky.

So what are digital methods, then? To me, digital methods are kind of like a software project. It's a very different research philosophy, and it's not for everybody, but it's interesting as an approach. The gambit is that you look at a platform or a medium, Wikipedia for example, and then you say, well, what kinds of digital objects are available? Okay, well, you have different language versions of the same Wikipedia article.
You have edit histories, you have timestamps, you have all sorts of different digital objects. And then you say, well, how do the dominant engines or devices or platforms of the medium handle those objects? What does Google do with hyperlinks? What does Facebook do with likes? And then you say, well, how can we repurpose those methods of the medium, or how can we repurpose those feeds or those filters, in order to do social and cultural research? So digital methods are a kind of webby approach to doing research, in the sense that they remix or repurpose the data and the methods of the medium for social and cultural research.

Oh yeah, and then the last part: as I mentioned before, there's also a harder epistemological problem, and that is, when you begin to make findings in the online, can they be grounded there, or do you have to go offline? Do you always have to do, quote-unquote, mixed methods in a sort of online-offline way? There are a number of mixed methods types. There's quali-quanti, but this is online-offline mixed methods. Do you have to, or can you ground your findings in the online?

Okay, so what I'm going to do with the remaining time is talk about digital methods practically. There have been a lot of different things developed over the past ten years, so these are some of them. All of these particular methods are written up in the new book, the Doing Digital Methods book. So how do you study hyperlinks?
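The hyperlink reading shown earlier, NGOs linking to UN agencies that do not link back, comes down to looking for unreciprocated links in a directed network. What follows is a minimal sketch of that check, not the Issue Crawler's implementation; the function names, site names and cluster labels are all hypothetical.

```python
from collections import defaultdict


def unreciprocated_links(links):
    """Return the sorted directed links (source, target) with no link back."""
    link_set = set(links)
    return sorted((s, t) for s, t in link_set if (t, s) not in link_set)


def cluster_link_counts(links, cluster_of):
    """Count links per ordered pair of clusters (within-cluster pairs included)."""
    counts = defaultdict(int)
    for source, target in set(links):
        counts[(cluster_of[source], cluster_of[target])] += 1
    return dict(counts)


# Hypothetical sites and cluster labels, invented for illustration only.
cluster_of = {
    "ngo-a.example": "ngo",
    "ngo-b.example": "ngo",
    "agency-x.example": "un",
    "agency-y.example": "un",
}
links = [
    ("ngo-a.example", "ngo-b.example"),
    ("ngo-b.example", "ngo-a.example"),
    ("agency-x.example", "agency-y.example"),
    ("agency-y.example", "agency-x.example"),
    ("ngo-a.example", "agency-x.example"),
    ("ngo-b.example", "agency-x.example"),
]
one_way = unreciprocated_links(links)
flows = cluster_link_counts(links, cluster_of)
```

In this toy network both clusters interlink internally, while every link between the two runs from the "ngo" cluster to the "un" cluster, which is the asymmetry the talk reads as a politics of association.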
I gave you one of the pictures. Another one is the study of internet censorship, in-depth study of internet censorship, and another one is archived websites, making use of the Internet Archive and the Wayback Machine in order to make screencast documentaries, like time-lapse photography of the evolution of a website over time, and then narrate the changes thereof. For example, how whitehouse.gov, the sort of tone-setting American presidential website, changes quite radically when a new president transitions in. There's a discussion of how to repurpose search engines, and how to study Wikipedia. So I'm not going to talk about those; that would take far too much time. What I'm going to do is talk mainly about social media today. I think a lot of people are really interested in researching social media. I think it's also quite poignant these days, so I'm going to spend some time on it.

So this is the way the book's laid out. I have laid it out slightly differently. One of the things that I want to mention, and I'm not going to go into this in depth today, but we can also talk about it in the workshop, is that a lot of this work suffers from what I call single-platform studies. So, oh, another Twitter study. Oh, another Facebook study.
Oh, another Instagram study. Whereas when you're doing societal and cultural research, you probably want to do cross-platform analysis, and that's not that straightforward, because what is it that you compare across platforms? If you want to compare a hashtag, for example, you'll immediately realize that, well, actually there are really no hashtags on Facebook, and they're used very, very differently on Instagram than they are on Twitter. On Twitter it's considered poor practice if you use more than three hashtags; on Instagram it's considered poor practice if you use fewer than three. So they're very, very different. So the cross in cross-platform analysis requires an understanding of web vernaculars, or platform vernaculars, or the cultures of use of those platforms. I'm not going to go into that any further, but I want to mention it now.

So I'll talk about Twitter, Facebook, Instagram and YouTube. I'm not going to talk about trackers, cookies and third-party elements, but we can talk about that in the workshop if you want to. So I'll just do four platforms, or I might mention Telegram as well.

Okay, so Twitter. We've developed a number of approaches to study Twitter. I want to get into them, but first I want to give you a kind of small periodization of Twitter. I think Twitter has changed quite dramatically over the years. Twitter started off as something quite banal. It was oftentimes referred to as the what-I-had-for-lunch medium, and people were using it to tweet their favorite flavor of burrito, as was critiqued. That coincided with Jack Dorsey's original Twitter, so that's the "What are you doing?" Twitter. That was the motto, and that kind of changed. But before we get to the change, I just want to show you the original Twitter sketch. This is kind of a fascinating document in and of itself. So this is what Dorsey sketched out as the original Twitter.
You can see that he used a domain hack, so stat.us, status, was going to be the URL. And you can see who it was directed to: it was directed to sort of young San Francisco urbanites who, as default settings, were either in bed or going to the park. Nice life.

So this changed quite dramatically somewhere around 2009. Not only did it coincide, and there is a longer sub-story to this, with the change of the motto to "What's happening?", but it was also interesting how Twitter almost overnight went from the what-I-had-for-lunch medium to a revolutionary medium, where you were following events as they unfolded, remotely. So this is a particular approach that I want to talk a little bit about, and that is remote event analysis. This is the Je suis Charlie, of course, with the Charlie Hebdo attack in Paris, and those tweeting about it. So it's not localized. When you're studying events remotely, you can also study their reception globally.

Generally speaking, we set up a very, very simple technique to try to capture an event on Twitter and on the ground at the same time: to create a tweet collection that enables you to turn Twitter into a sort of event storytelling machine. So this is more of an artwork. It's a media artwork.
This was in fact displayed here in Barcelona; 2012, I think it was. Nevertheless, what we did here is take the top three retweets per day for the hashtag #iranelection and put them in chronological order, as opposed to reverse chronological order, and we were then able to capture in essence what happened in those 20 days during the Iran election crisis of 2009, which we ultimately boiled down to the size of a single tweet. So this particular technique is a technique for, quote-unquote, remote event analysis.

However, there are a number of other techniques that I want to briefly talk about. One is to treat Twitter as an issue space. Twitter is used professionally, of course, by issue professionals. This was a piece of work that we did for the Gates Foundation. We made a piece of Twitter analysis software. It's called TCAT, or DMI-TCAT, which you can download and install on a server and use to capture tweets using the streaming API. When you see this kind of software, you normally see a lot of modules for all sorts of things you can do. So what I've done is create a number of simple recipes in order to study issue spaces. For example, this is a hashtag analysis. This is just a frequency analysis. You can also do co-hashtag analysis. Hashtags are oftentimes embedded social issues, and so when you do a frequency analysis, you can see the sort of issue agenda of a particular issue space.
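The hashtag recipe mentioned here, a frequency count plus an optional co-hashtag count, can be sketched with the standard library alone. This is an illustrative sketch, not DMI-TCAT's own code: the function names, the simple `#(\w+)` pattern and the three sample tweets are assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Simplified hashtag pattern (an assumption; platforms apply stricter rules).
HASHTAG = re.compile(r"#(\w+)")


def hashtag_frequency(tweets):
    """Count how many tweets use each hashtag, ignoring case."""
    counts = Counter()
    for text in tweets:
        # A set, so a hashtag repeated within one tweet counts once.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return counts


def co_hashtags(tweets):
    """Count how often two hashtags appear together in the same tweet."""
    pairs = Counter()
    for text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        pairs.update(combinations(tags, 2))
    return pairs


# Hypothetical tweets, invented for illustration only.
tweets = [
    "New funding for #malaria research #globalhealth",
    "#GlobalHealth needs #vaccines",
    "Progress on #malaria and #vaccines #globalhealth",
]
agenda = hashtag_frequency(tweets)
pairs = co_hashtags(tweets)
```

Ranking `agenda` with `agenda.most_common()` gives the kind of issue agenda the talk describes, while `pairs` shows which issues tend to travel together in the same tweet.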
So this is global health and development. The other one is the study of dominant voice. If you look at an issue space, you can ask, well, who's tweeting the most, but you can also ask who's being mentioned the most. And so this is, again, the global health space, and you see the Gates Foundation, which is the major donor in that space, and Bill himself being the ones who are engaged with, or mentioned, the most. And then the third one is a URL analysis. This is a kind of content analysis: it extracts the URLs from the tweets. So what you see here are Hillary supporters, this is the 2016 US elections, and the media sources that they tweet, and then Trump supporters and the media sources that they tweet. This is a classic polarization graph, but you can see there are only a couple of sources in the middle that they both mention. What's also interesting about this is that if you look at the Hillary media mentions, you see that they're quite mainstream, whereas the Trump ones, you see, are quite extremist. So there was already a kind of indication of this turn to the right of the web. This is segmented audience. Twitter does not want you to do this. It's kind of against their terms of service, but I think it's a legitimate thing to do for research. Twitter doesn't want you to segment an audience, or segment a social group or social movement, and then spy on them, basically. In the developer terms of service, that particular precept is largely for governments. I don't think it applies so much to academic researchers. There are reasons that one would want to do this.
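The dominant voice and URL recipes mentioned above might look something like this in code. It is a sketch under stated assumptions: each tweet is a dictionary with a `text` field and a hypothetical `group` label (for instance, which candidate's supporters the author was assigned to), and the URLs in the text are already expanded rather than shortened links.

```python
import re
from collections import Counter
from urllib.parse import urlparse

MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def most_mentioned(tweets):
    """Dominant voice: count how often each account is @mentioned."""
    counts = Counter()
    for tweet in tweets:
        counts.update(name.lower() for name in MENTION.findall(tweet["text"]))
    return counts


def hosts_by_group(tweets):
    """URL analysis: count linked hosts separately for each group label."""
    hosts = {}
    for tweet in tweets:
        group_counts = hosts.setdefault(tweet["group"], Counter())
        for url in URL.findall(tweet["text"]):
            url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            group_counts[host] += 1
    return hosts


def shared_hosts(hosts, group_a, group_b):
    """Sources linked by both groups, i.e. the middle of the polarization graph."""
    return set(hosts.get(group_a, {})) & set(hosts.get(group_b, {}))
```

A small result from `shared_hosts` relative to the per-group host lists is what the polarization graph shows visually: few sources cited by both sides.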
This is before they were deplatformed, or most of them, by Twitter. These were the alt-right core Twitter accounts, and what we did in order to segment the alt-right and its audience was to take those users that were mentioned by the core, and these are mentioned by all of the core, or seven members of the core, or by six, five, four. So basically you get this kind of alt-right supporter network, so to speak. The last one I want to show you is public figure analysis. I don't know if you've seen this before, this is one of the funniest: the 567 people, places and things Donald Trump has insulted on Twitter. Okay, so this is more data journalism. This isn't so academic, but basically you make a tweet collection of a particular politician. I know that our Italian friends are doing this for Salvini and others. So this example is a little bit more academic. This is the populist politician Geert Wilders in the Netherlands, and the question here is to what extent he should be considered to be a part of the new right, where the new right is defined, according to the London-based think tank Demos, as having particular characteristics: as being anti-establishment, anti-globalization, anti-immigrant, etc. And then you characterize the tweets qualitatively. You can also use some quantitative techniques, some natural language processing. You can also try to do some machine learning here if you want to. But nevertheless, this is an example of public figure analysis. So that's Twitter. I want to talk about Facebook. Facebook is something that has also evolved over the years quite radically, I would say, or at least its study has. So it started as something where people would basically study profiles and friends, and there was always this quotation marks "friends", right?
And then we thought oh, that's interesting So and this was oftentimes a social network analysis of tastes and ties Now that became and there were some some ethical issues there and that that became sort of I mean I just want to show you this. I don't know if you've seen the movie the social network Do you recognize this guy? Yeah, maybe So you see this is this is so he's the guy in the movie that told that tells Mark Zuckerberg So so it says it's like the Facebook. He said no no. He said drop the the it's really lame It should just be Facebook. That's him That's Sean Parker. He was also the developer of Napster the Great Disruptor you can see in the early Facebook The profile was sort of the most important, right? It was like what are your tastes? What are your interests and this one is also kind of quasi dating idea So that that was that's what often people studied and when we when we studied profiles And and interests in particular we came up with this term as an approach. This is of an approach to Facebook First-era studies we call it post-demographics and I just want to show you one of the outcomes This was again more of an art piece. Although we did publish academically on this as well so this was a look at the Top thousand friends of Obama and the top thousand friends of McCain This was before the twenty two thousand eight elections I think and then their interests and then we're studying the extent to which they were compatible So we're studying here culture wars. We're studying here the politics of media So I'll just give you an example. So for example the Obama friends their favorite TV shows are the Daily Show lost and The Republican McCain the favorite TV shows are Family Guy project runway America's next top model CSI desperate housewives So you see something quite different. You see a very very different Politics in in in media You can also do this for an interest So what so so like if you have an interest chess, so what are the most related interests? 
So in this case, Smallville and Batman. I mean, this wasn't the turning point. It may have been the turning point in academia. I think we're now really experiencing a kind of ethics turn. You could say that it started here. This is 2010. This is the tastes and ties research that was done by some researchers at Harvard, and Michael Zimmer of the University of Wisconsin at Milwaukee was able to de-anonymize the data that was used. He looked at some of the statements that were made in the posts, searched Facebook for texts in the posts, and then found the individuals. And so this de-anonymization prompted a lot of ethical questions in doing social media research. It also coincided with the changes in Facebook's API from 1.0 to 2.0 and the shift from studying profiles and interests to studying pages. And this is probably the most famous Facebook page, I think, or at least it is to me. This is We Are All Khaled Said, the Egyptian Revolution of 2011. So its study summarized, in some sense, the change in what people were interested in. So we developed a couple of techniques. One was interliked page analysis, the other one networked content analysis. So here you see pages can like other pages, and what you see here is the kind of extremist right wing in Europe.
This was in 2014, I think, and these are the pages that are liking other pages, and you get a network there at the time. I mean, when there was the Cambridge Analytica scandal, it was found out, or it came into the public imagination, what you used to be able to get, the names of the administrators of... yes, well. And I put this up here not for you to tweet it, because normally I wouldn't publish this, but I want to show it to you. These are memberships of groups by right-wing extremists in Europe, and these are the administrators, and the one in the middle is the one that is the member of the most extremist groups on Facebook, at least in 2014. So you could get these kinds of pictures. The other one, this is networked content analysis. I should warn you, this is not that nice. It's very offensive. So this is the Islamophobic Stop Islamization movement, and what you do here is engagement analysis. So which posts of a collection of pages have the most likes, shares and comments. This is standard engagement, or Facebook calls it interactions. So you can do engagement analysis, and when you do that, what you find is that the most engaged-with content is oftentimes, or universally in fact, memes. So when you study engagement on Facebook, you end up studying memes. And what's interesting about memes is that they're different from virals. A viral is a single piece of content and a meme is a collection of content, and what you do when you meme is that you contribute to a collection of content. So memes are additive, and they're additive in two senses, both in a contribution sense but also in a cognitive sense: there's an additive cognition. Okay, that's Facebook. I just have this, this is fake news.
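The engagement analysis described above, ranking the posts of a set of pages by likes, shares and comments, can be sketched as below. The post dictionaries and their keys (`likes`, `shares`, `comments`, `format`) are assumptions for illustration; they are not the fields of any particular Facebook export.

```python
from collections import Counter


def engagement(post):
    """Sum of likes, shares and comments (what Facebook calls interactions)."""
    return post.get("likes", 0) + post.get("shares", 0) + post.get("comments", 0)


def most_engaged(posts, top=10):
    """Return the `top` posts with the highest engagement, highest first."""
    return sorted(posts, key=engagement, reverse=True)[:top]


def formats_in_top(posts, top=10):
    """Count post formats (e.g. 'image', 'link', 'video') among the top posts.

    Assumes a hypothetical 'format' key, coded by hand or taken from the data.
    """
    return Counter(post.get("format", "unknown") for post in most_engaged(posts, top))
```

If image posts dominate `formats_in_top`, that is the pattern noted in the talk: studying engagement ends up being a study of memes.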
I'm not going to go into that. Okay, I want to talk just about Instagram and then YouTube, and then we're done. Instagram. The two techniques that we've developed are to study antagonistic hashtag publics as well as artificial amplification, and I'll mention those, but first I want to do what I did with the other ones here, also for Instagram. Instagram, of course, is very well known as being a site for the study of selfies and selfie culture. I think that shifted a little bit. I mean, it's still going on, but I think that it shifted a little bit with the rise of antagonistic hashtags. So first I just want to show you that this is the first Instagram photo ever. This is Kevin Systrom, the founder of Instagram. That's his girlfriend's foot and that's their dog. So this is a kind of family selfie. So this is the opening of Instagram, right? So it's quite selfie-ish. As I showed you before, this is the academic study thereof, right, so this is Selfiecity, studying selfies around the world for their formal properties. I think that shifted a little bit with the rise of using Instagram also for social causes and issue work. So this is Justin Bieber, the celebrity, the influencer, posting about Black Lives Matter, and it is an antagonistic space, because it's also a space where you have things like All Lives Matter as a kind of counterpoint to Black Lives Matter. So here, to study antagonistic hashtag publics, we adapted the model from Bruno Latour, studying programs and anti-programs, and this is from the 2015 US Supreme Court same-sex marriage decision, where you had the proliferation of the hashtag #lovewins followed by the proliferation of the hashtags #loveloses and #jesuswins. So when studying antagonistic hashtag publics, you can study filter activism, the extent of it. Of course Instagram is well known for its filters. But you can also study the location.
So this is like geolocating hate, so to speak, or the geography of hate. And then you can see the program as being far more widespread and the anti-program as being quite specific and geolocatable. Yeah, this is follower ecologies. I'm not going to go into that. Okay, and then the second one I want to talk about is artificial amplification. So Instagram, as you know, has the biggest market for fake stuff, fake followers in this particular case, followed, I think, in second place by YouTube and then in third place by Twitter. The markets are international. You can buy fake followers in Germany, from German companies, and they're quality fake followers, and you can buy them in Indonesia, where they're cheaper, and you can buy them in Brazil, where they're the cheapest. There's a huge market for them, and it's very interesting to study this market. It's also very interesting to study how people define fake followers. This is one technique. This is using the tool HypeAuditor, which recently went offline. I don't know whether they're going to relaunch it. But there are a number of tools, and it's interesting to look at all the different techniques for deriving fakeness. It's the same as looking at all the different techniques for deriving botness on Twitter, for example. It's a very interesting area of study. And so here this was applied to the Netherlands elections recently, in 2019. Our group did the fake news study for the Dutch Ministry of Internal Affairs. It's going to come out as a book quite soon. It's called The Politics of Social Media Manipulation, and this is one of the products thereof. And then you can see, what's also interesting, who has the most fake followers. That's Geert Wilders, the populist politician, by far. But it's also interesting to note that there's a normal percentage of fake followers: something like 18% is normally fake. And you can talk about that.
Okay, the last one, YouTube. These are four different techniques that we've developed to study YouTube. All of these techniques are platform-centric, huh? They follow the platform and the API, they make use of what's available, and they repurpose it for societal research or cultural research. So these are platform-centric techniques. There are many, many other techniques that one could use in order to study these platforms. So YouTube is the final example. YouTube has oftentimes been described as an excitable algorithm. You can study its excitability, for example, and I'll get to those, by looking at the carousel, so what's up next. The second one is the extent to which it's a rabbit hole, so whether or not what's up next pushes you to more and more extremist videos. So these are a couple of claims that are often made. You can also map channel networks. Channels can subscribe to other channels, so you can also do a network analysis on YouTube. And finally, YouTube has been deleting a lot of stuff lately, a lot, and it is difficult to actually determine what has been deleted and why, or actually how. And there's the question of the extent to which YouTube is deleting things automatically without humans in the loop, and the politics thereof, and how YouTube's politics are also being challenged, especially by the right, and especially by those who have been deplatformed or removed from social media.
So deep platforming is also an up-and-coming topic I Think YouTube can be easily periodized I've also done this in the doing digital methods book I think YouTube started as an amateur production space I think that is in evidence if you look at the top videos by view count and you do time slices So if you do top videos by view count 2006 2010 2010 to 2014 what you'll notice in the first period is that they're amateur videos Charlie bit my finger you know this one or The evolution of dance is this one guy doing 20 dances. It's really right. So those are the top videos Around not around 2011 was the first year. I think it was 2011 where all the top videos the top 50 were Commercial productions, you know, so music videos. So it changed quite dramatically Also, what you saw in the second period was the rise of the monetization of YouTube So I'll just wait. I'll just let me show you first the three So this is the this is from the internet archive from the wayback machine. This is the very first screenshot of YouTube that's still available at least publicly. I mean It was originally a kind of dating site. I mean no one seems to know this So you would broadcast yourself with short clips of yourself. That was the idea broadcast, you know, we we later considered broadcasting yourself to mean You know, it's sort of amateur amateur productions Yeah, the rise of the youtuber I think is is fascinating And the monetization and then and then also the the contraversiality around that nowadays So, you know, you need to have a hundred thousand Subscribers in order to use the content matching tools to see whether someone else is using your content and earning money off of it Why a hundred thousand followers, you know, our subscribers why not why not a lower figure? 
It's not very sort of democratizing So so and then YouTube really heavily leaning on it's it's it's it's it's youtubers Also for its own up all on profit making So that's that and then we get to the current period where there's a lot of stuff that is That is being deleted. So I want to Just show you the techniques in brief and then wrap up so This is so when you use YouTube API, which is very generous by the way One of the most generous social media platforms and it works, you know, they're not endlessly changing things I mean they did change something recently, but okay So this is search and the query here is Syrian war And then you see this is a rank flow diagram And so it shows sort of the Where in a result count? Where we're in a rank a video is over time and so videos can go up and down and ranking over time And so it with the Syrian war where we saw here was was when there was a gas attack or something like this YouTube became the algorithm became excitable and more extremist videos would be towards the top Right around those periods. So this is this is one way to study So this is a what I call a source distance approach. So how far from the top are particular sources a second one, this is Another network, but this is this is a channel. 
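The rank flow and source distance ideas described above can be illustrated with a small sketch. It assumes you have already saved the ordered search results for one query on several dates; the `snapshots` structure and the `video_to_channel` mapping are hypothetical, and no API call is shown.

```python
def rank_flow(snapshots):
    """Trace each video's rank over time.

    snapshots: dict mapping an ISO date string to the list of video ids
    returned for the query on that date, in result order.
    Returns {video_id: [(date, rank), ...]} with rank 1 as the top result.
    """
    flow = {}
    for date in sorted(snapshots):
        for rank, video_id in enumerate(snapshots[date], start=1):
            flow.setdefault(video_id, []).append((date, rank))
    return flow


def source_distance(snapshots, video_to_channel, channel):
    """For each date, how far from the top the given channel first appears.

    Returns {date: best_rank or None}; None means the channel is absent
    from that day's results.
    """
    distance = {}
    for date in sorted(snapshots):
        ranks = [
            rank
            for rank, video_id in enumerate(snapshots[date], start=1)
            if video_to_channel.get(video_id) == channel
        ]
        distance[date] = min(ranks) if ranks else None
    return distance
```

Plotting the `rank_flow` trajectories gives the diagram shown in the talk; a drop in `source_distance` for a fringe channel around an event would be one way to see the ranking becoming "excitable".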
This is who subscribes to whom and who features whom. This is the alt-right again, and what we noticed is that you could distill business relationships between alt-right extremist internet celebrities by seeing who featured whom. This is actually a related channel network, but you can also make a subscription network, so which channels subscribe to which channels; that looks like this. So these are basically the four. There's also this one, which is not a technique that you can use with the API. You would have to do this a bit differently. What we did here, and this allows me to segue into the deep vernacular web: we took the most popular, and popular is a weird term to use for this, 4chan board, which is Politically Incorrect, so /pol/, and we looked at the YouTube videos. We basically took out all the YouTube videos that were referenced in this particular 4chan board, which is quite extremist, and then six months later looked to see whether the videos were still available on YouTube. Half of them are gone. And we do have traces of what the videos were. I didn't work on this project, it was my colleagues, and a lot of the researchers said, well, you know, I mean, maybe they've become a bit immune to studying extremist stuff, but they were like, a lot of this stuff wasn't that bad. The other thing that was interesting was that when they looked at the traces of when they were deleted, they were all deleted at basically around the same time. So it was as if they were deleted automatically, which raises questions about the extent to which humans are in the loop, who's looking at this stuff, etc. So I want to just conclude by reiterating that the methods that I presented to you are quite specific. There's a specific history to them, there's a specific epistemology to them, and also a specific platform-centricity to them, or medium specificity to them.
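The collect-then-check-back approach to deleted videos described above has two mechanical steps that are easy to sketch: pulling YouTube video ids out of collected post text, and comparing that collection with whatever is still available later. The availability check itself is left out here, since it depends on how you query YouTube; `available_later` is simply assumed to be the set of ids that were still found at the second visit.

```python
import re

# Matches youtube.com/watch?...v=<id> and youtu.be/<id>; video ids are
# assumed to be 11 characters from the set A-Z, a-z, 0-9, '_' and '-'.
YOUTUBE_ID = re.compile(
    r"(?:youtube\.com/watch\?(?:\S*?&)?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)


def extract_video_ids(posts):
    """Collect the distinct YouTube video ids referenced in a list of post texts."""
    ids = set()
    for text in posts:
        ids.update(YOUTUBE_ID.findall(text))
    return ids


def removal_share(collected, available_later):
    """Return (removed_ids, share_removed) for a collection revisited later."""
    collected = set(collected)
    removed = collected - set(available_later)
    share = len(removed) / len(collected) if collected else 0.0
    return removed, share
```

Keeping the original collection, with timestamps, is what makes the later comparison possible at all; without it there is no record of what was removed.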
So they're not for everyone, but they may be for some of you Thanks so much. Thank you, Richard. You're coming here. So we can talk a little bit Lorena you have the other Okay, okay. Well, thanks for the presentation. We have a lot of food for thought Any questions wants to start it's our official Brazilian question maker Next year we are going to have a contest Was more question we receive a discount on Okay, I know would be great On the ethics like how Are you imagining like as you said like a Lot of people are concerned about the privacy and And the that is starting to be one of the main questions you At least the project that is show it work it as collecting data without questioning for the users just as as I can see at least and It Gives the impression that you didn't ask for everybody that that you are showing that you're using the data a Consent so how are dealing that that would be a first question and Second question would be a How you deal because a lot of A lot of data analysis that you're showing a Can shows a contradictory Effect what I'm meaning by that for example you show it that at least 20% of Instagrams Followers are fake, but at the same times a you Do you have a tool for when you are trying to prove a point? About how much interactions Which ones are fake or which one aren't fake? because if not the the fake ones are gonna Blue your data in some ways that would be the the two questions Thanks so much So I think I think that the just to take the first one first so The the the question of of ethics and the and the ethical turn is a as a much longer subject Huh, so it's it's something that is hard to just sort of give a few lines of but Generally speaking I think that that there I just want to make two three points. So the first point is the question of Sort of personal privacy versus public interest, right? 
So when you're when you're studying extremists so, you know, you have to be careful because one person's extremist is another person's opponent, right? and on the other hand you could argue and also the other thing is is you also might not want to give Extremists much publicity or much oxygen as as Whitney Phillips calls it But then on the other hand the public interest of course journalists, you know, I mean academics Yeah, it's interesting like the extent to which academics and journalists sort of share the same kind of ethos Sometimes but not all the time but from a journalistic standpoint, of course or from data journalism I mean some of these techniques are for data journalism Be in the public interest for these things to be be known. So this is the first point the the second one Has to do with what constitutes a public figure And it's interesting question because you think you that you think it's kind of straightforward more or less However, what is it for Twitter? What is it for Weibo? So for Twitter? It's a public figure as someone that has 5,000 followers So so when Twitter releases datasets those that have under 5,000 followers, they're hashed, right? They're they're anonymized and those above they're they're open And Weibo it's 10,000 and but not they don't release public but 10,000 for different reasons For for for persecution. I don't want to smile But I mean if you have if you say something bad on Weibo and you don't have that many followers They don't care the state, but if you have a lot of followers they care But anyway, so so the quite the public figure question is something of interest and the third point of the final one is is I think it's important to follow The precept of sort of contextual privacy and that is the idea This is Helen Nissenbaum the idea that people in social media They don't expect that when they tweet or when they post that these tweets are post even if they Agreed to terms of service are going to be analyzed by academics etc. 
etc Now Twitter is is like it says it about five times in the first in the terms of service on the firsts of three Articles like your data will be you know, we will be analyzed by it is public It is open etc. etc Even though it says all those things you cannot use the term to service as an academic for cover and Say but the data are are public. No You have to do more than that So I think you have to think about the first two things the consent is another issue GDPR is not Is people are kind of researchers kind of afraid of GDPR GDPR is is really actually open It's really pro-research. So it says for example that if consent is is Improbable you can do other things right you can you can publish you need to publish your Research and give people the opportunity to opt out or give the people the opportunity to know about it So it even has that much lower threshold than consent So so anyway, I mean I didn't want to go on and on about that But those are at least four points and then the second one I find the study of fake followers to be interesting right I said that repeatedly and So the hype auditor technique uses as fake Also the signal of inactive. So if you haven't posted or you haven't tweeted for more than I don't know what it is 12 months, I think then you're considered to be fake. Now, I wouldn't agree with that But it is a it is an indication for some techniques other techniques There are different indications that there's no more normally like a list of about 12 signals And then people use or people like analysts use anywhere between You know for and all of these different signals and we could talk about these this is this is also the critical analytics project of mine to study Exactly the determination of fakeness, you know like and why why would that be fake? Etc. So it's it's it's it's something worthy of study and Yes, we do have techniques not tools but techniques to to for the determination I call these credibility metrics. 
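As an illustration of just one of the signals mentioned above, inactivity, here is a minimal sketch. It is not HypeAuditor's method: the 365-day threshold follows the speaker's recollection of "12 months", the `last_post` field is a hypothetical name, and, as noted in the talk, inactivity alone is a contestable indicator of fakeness.

```python
from datetime import date, timedelta


def inactive_followers(followers, today, max_idle_days=365):
    """Return handles of followers with no post within `max_idle_days`.

    followers: iterable of dicts with hypothetical keys 'handle' and
    'last_post' (a datetime.date, or None if the account never posted).
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [
        f["handle"]
        for f in followers
        if f["last_post"] is None or f["last_post"] < cutoff
    ]


def inactive_share(followers, today, max_idle_days=365):
    """Share of followers flagged by the inactivity signal (0.0 if none given)."""
    followers = list(followers)
    if not followers:
        return 0.0
    return len(inactive_followers(followers, today, max_idle_days)) / len(followers)
```

Changing `max_idle_days`, or combining this with other signals, changes who counts as "fake", which is exactly why the determination of fakeness is worth studying in its own right.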
So I have this whole set of metrics that I use to think about how people appear to be credible online. But that's for another... Yeah, okay. Thanks for your conference. It was very nice. I wanted to know, because I'm very curious about YouTube. I consume a lot of YouTube content and I follow different YouTubers, and with the issue of, how to say, banning hate speech, etc., I've seen a lot of YouTubers that complain and that have had cases of, let's say, it's a humor channel and they make a joke or whatever, really not hate speech, something very, let's say, not innocent, but of course not intended to offend anyone. Or, let's say, they play video games and they have to role-play as if they were a dictator or whatever, you know, in this humor frame, and many of them get demonetized, given strikes, or banned. And I'm really curious to know whether YouTube's algorithms are able to distinguish between humor and hate speech, and whether you think this can change, or if there's the technological support to actually be able to distinguish these two different kinds of speech, let's say. Well, I mean, I don't know if people in this room are working on that subject matter, but I've started a project on this, so I would encourage you to work on this. The project that I started is called deplatforming, and the very simple question, which is deceptively simple, is the extent to which deplatforming works, and then works for whom. Right, so, Facebook... and I've done one study, and this will be published in the European Journal of Communication in a couple of months. It's called Deplatforming.
So that's the key word. And what we found in that study: we studied extremist internet celebrities, largely in the US and the UK, who have been deplatformed, especially in the May 2019 purge. Well, there's a bunch of them. We're talking about Milo, Alex Jones, Laura Loomer, these sorts of people. And what we found is that they migrated to Telegram and to an alternative social media ecosystem, and it's very interesting. I made a map of this; I'll show it in the workshop for those of you... It's an alternative social media ecosystem with Minds, Parler, BitChute, Gab, etc. And we also studied discursively what they say about the new platforms and the old platforms. What's happened is that the extremists now really dislike Facebook and Instagram, and they don't link to them and they don't refer to them. So Facebook and Instagram are kind of benefiting in some ways from this deplatforming. However, the extremists still think YouTube and Twitter are extremely relevant, even though they were thrown off. So they're like, can I get on your YouTube show, kind of like cross-hosting. Or, you know, if you see this message, could you retweet it? Can you send it around Twitter? So Twitter and YouTube have not benefited from this. They're still considered highly relevant. That's just one point, and I just find it interesting. The second one is that there are two ways that content is flagged, right: there's user flagging and there's automated flagging. And user flagging has a human in the loop, and then automated flagging...
We don't know. Right, and that's why I showed that last thing. So we don't actually know, so we need to research this, and in order to research it you need to make collections of YouTube URLs at least, but also videos if you can, and then check back, you know, six months later, whatever. This is how you need to do that kind of longitudinal research. Otherwise we don't know what's been deleted. A colleague of mine, Marco Bastos, I don't know if you know him, he's a Twitter expert. He will be publishing a paper soon on the Brexit tweets from 2016. So he had a collection of, I don't know what it is, three million or 30 million, I don't know what the order of magnitude is, but anyway, what he found when he went back three years later was that something like 35 percent of the Leave tweets and the accounts are gone. 35 percent. What does that mean? Okay, it could mean two things, and both of the things could lead to the same conclusion. Thing one is the users deleted them; they left. Thing two, Twitter deleted them. Okay, so why would the users leave and why would Twitter delete them? Because there's some sort of influence campaign or some sort of... Right, possibly. I mean, this is not what is being concluded. This is what is being surmised. And we have the same for the 2016 US presidential elections. We can now follow his method: rehydrate. So when you share a tweet collection, you probably know this, you have to send it to Twitter first, and then it comes back down, and those tweets that have been deleted are gone. That's to follow the terms of service. You have to be a good partner. So then you can see what's been removed. So you need to create collections in order to study this. Thanks so much for your very interesting speech. I'm actually working on a diaspora project, and it was a bit difficult for me, because I got my paper retracted twice. What project are you working on? The diaspora project. Diaspora.
Yeah, simply because I have been working with biography and discount stuff. And so the question, okay, I've been answered something like, okay, with this method we're not sure whether it really represents the people that are behind these swabs. So that's the question. So how did you deal with that, and what would you answer? Disparations, because for me it's quite interesting and it's quite evident, but Potemite it isn't. So that's the question. Okay, so I think you should approach that critically. So I'll just tell you a brief story. I've published two papers on mapping diasporas on social media, one on the Somali diaspora and one on the Rwandan, and I'll just tell you the story of the Rwandan diaspora, because I think it's the most interesting. The Rwandan diaspora on Facebook is quite large, and I can talk to you about the techniques of locating it. I mean, it's basically that you query Rwandan diaspora, not in Facebook Graph Search but in Google, oddly enough, like site:facebook.com and then Rwandan diaspora. Then you get this list of Facebook pages. So then we did an interliked page analysis, and this was when the API was working, as well as a networked content analysis, so which posts were the most engaged with. What we found was that those people that were critical of the Rwandan regime, and there's much to be critical of in the Rwandan regime, the size of them on Facebook in the diaspora was tiny, and the size of the pro-President, what's the name, Kagame, the pro-Kagame presence in the Rwandan diaspora on Facebook was massive. So it gives you the picture that the Rwandan diaspora in general is pro-Kagame, when that cannot be the case, right? So on Facebook at least there's this sort of governmental boosterism of showing the diaspora to be pro-regime. That's a finding, huh?
You don't then say, oh, Facebook isn't accurate, you know what I mean? You don't say, oh, it doesn't represent the truth. No. What it does is show you who's dominating the discourse in the Rwandan diaspora on Facebook, and then you can ask questions about the extent to which that's a government-organized diaspora. And then you get into these more critical, more interesting research questions. So that's how I'd answer it.
Thank you for the presentation. Going further with this example on Facebook and also the last one on Twitter: would you recommend going back to qualitative analysis with people, for instance in this Twitter example, to try to find the reasons why they left the platform, or whether it was Twitter that deleted the accounts in the database you had in the beginning?
Yeah, I think that's an interesting question. I would encourage one to do that. However, we're talking about 35 percent of three million, so we're talking about a million accounts. So this is always this kind of big data moment, and you're like, hmm, okay, should we be social scientists and do a sample? Well, yeah, okay. How do we do that? What's the next step? And of course they're deleted, so there's a whole ethics in that, right? So if you delete your Instagram account, I don't know, for some reason, not you but one, right? So for example, we recently published a study on fake news in the Netherlands for the Ministry of Internal Affairs. If you ever write a governmental study, you find yourself in this world that you've never been in before, where every citizen feels that they can now attack you on Twitter. So we've been massively trolled, right?
Like massively: within 48 hours, a thousand hateful tweets from the right. And so we're like, hmm, what do we do? Okay, well, let's just turn it off for now, but collect it and see what we can make of it later. But I could imagine, if I weren't a new media researcher, just shutting that account down, right? Okay, three years later someone comes knocking on my door and says, can I interview you about why you... You know, maybe I would be okay with that, or maybe I wouldn't. There are some issues there. Maybe it's fine. But you can think of a bunch of scenarios where, for example, it's not proper to make it known that you know who they are, et cetera. We had this project where we studied the climate change skeptics on Twitter, and we made a map. It was a bipartite graph, users and hashtags. And then it was very interesting. We actually got in touch with the skeptics and said, is it okay to publish this? And a few of them were like, no way, I don't want to be known as that, as a skeptic, or I don't agree with your terminology, or whatever it was, I forget. But at the same time, they were boasting on Twitter that they got on the map. It was very weird. So they were happy on the one hand, but they didn't want to be made public on the other. It's all of these kind of complex things. But in any case, I'm just giving you a couple of little anecdotes, but I would encourage a well thought through approach to qualitatively figuring out where these people went to.
I also have a lot of questions.
And this question is very connected to the last one. You remember, 10 years ago Wired magazine published these famous articles about the end of science, the end of theory, because they said, okay, big data is here, forget linguistics, forget the traditional interpretative sciences and disciplines and so on. Well, you know, Wired magazine, every year they kill something: the blogs, the web, science. So they're a killer magazine. So now, to go deeper into this collaboration, this cooperation between quantitative and qualitative research: as I told you before, most of the people in Latin universities, I mean specifically Italian, French, Spanish universities, Latin America, we have a more qualitative approach to media and communication. So how can we integrate these different approaches? Because I think we need to integrate them.
Yeah, so, I mean, digital methods is quite social science-esque, but it's also in the humanities. So that's why I kind of situate it in both traditions. And so, well, with social science there's a question of what... Hey, let's start with the humanities.
The goals are, you know, criticism and interpretation, and both of those are fully qualitative. In the social sciences, I don't know if we want to boil it down to something, inference and so on. I mean, there are a couple of ultimate goals there as well. So in both of those cases, the qualitative is actually more important than the quantitative, if you think about what the ultimate goals are. The point being that what digital methods tries to do, and I wrote one article about this, is in the quali-quanti tradition, and normally I reverse it and say it's quanti-quali. So it's mixed in that sense in itself. It uses quantitative techniques for either, in the humanities, making a collection, or, in the social sciences, making a sample, or, more generally, creating a data set through careful consideration, which is quite qualitative, of query design. And I'll go into this in detail in the workshop: how do you create a query in order to make a data set that will then answer research questions? A lot of the approach more generally is what I call search as research. It's quantitative, but it's qualitatively informed. It's quantitative in the sense that you're grabbing data and making a collection, but you're grabbing the data in ways where you seek to answer a particular research question through a particular kind of query design. So how do you query a database? All of these platforms are databases. Or maybe the more contemporary way to talk about this is, instead of saying I'm querying a database, saying I'm curating a feed, right? When you're curating, that's already a very qualitative thing to say, and then the feed is the quantitative side. So you're always doing both.
I like to do technical fieldwork. So first figure out the research affordances of these APIs. As I said before, think about what kinds of objects are there, what kinds of fields, and how they're normally used by the platform, and then think about how to remix, to repurpose them for research, and then derive your research questions that way, instead of coming to it with an analytical framework that's already in place. So that's more agile, it's more platform-specific, it's more medium-centric, I realize that. But it's also very qualitative at its core.
You know that Chinese platforms are arriving, TikTok. Is it complicated, or are they open? I don't know if you have worked with TikTok. Is it the same as working with a Western platform, let's say, or do they have different protocols, or maybe the data is not so available?
Yeah, so I don't know about TikTok. I haven't looked into that. But what I did do quite extensively recently, not on the Chinese yet, I'll talk about the Chinese in a second, is study the migration, as I mentioned to you, of the extremists who have been de-platformed to this alternative social media ecology. And what's interesting about that is a number of things, but one of them is that they're very open. So if you follow the medium, and this was one of the slogans of digital methods, there are some research opportunities available. That is not to say that we should stop the critical study of Facebook and Instagram because they cut off their API. No, that's not right. So it's not just blindly follow the medium.
We also have to push back, because researchers are also being de-platformed in some ways. In a very different way, but in a way. Okay, and then the Chinese. So, I mean, there are the smaller ones, well, now TikTok has become larger, but the large ones are WeChat and Weibo, and both of those have also over the years become more and more difficult to study. And there are a couple of techniques that colleagues at the Chinese University of Hong Kong in particular have developed, a series of techniques which you can use. They're open, in a piece of software, to scrape. And then the questions you ask: here we could ask far more critical questions than they're asking. It's very complicated, you know. I would ask, to what extent does the regime's message resonate, or to what extent are there alternative voices? They can't ask and answer those questions, because there's a split also in Hong Kong. But they do have the software, and it is available. The second thing is, I don't know what the situation is in Spain, but in a lot of European countries, certainly the UK, there's a huge influx of Chinese students into digital media courses. It's not so much in Amsterdam, we have, you know, five percent, but at King's College London in the digital humanities program it's like 80 percent. And so then you're kind of, you know, what would I call this?
I guess I would call it Western digital methods, right? Because it has nothing to do... and because they're always like, yeah, but... Because we do comparative work, like the same query in different Google regions, to see where the sources come from. So for example, if you query Amazonia in all the South American Google regions, in Google's advanced search, you get some sources from South America, but you mainly get sources from Spain. That's very interesting. So it's like this kind of neocolonial search engine suddenly, right? And they're like, yeah, but we can't, you know, for China it's Baidu. And we have a Baidu scraper, but it's a very different kind of thing. So the studies are not symmetrical. You need to pose different questions in some sense. You need to redo the curriculum. So if I had 50 percent or more, I would have a dedicated course on that stuff, and then probably another course that compares the two. And then ultimately TikTok: if we take seriously the periodization that I put forward for the mainstream social media platforms, going from the frivolous and the playful, to the issue-related, to the study of fake, basically to the dark side.
I mean, if it holds for those four, it might hold, I'm not speculating or predicting, but it could hold for TikTok. So it probably makes sense to start building our scrapers now.
First, I would ask about something you were talking about, how Google has a kind of colonization of how you search and how you find your data. So why don't we start looking to DuckDuckGo? Because I encountered problems: I was trying to search for something in Brazil and all the results were related to here, and this really bothered me. That is one question, but I have another one: how do we look at temporalities in big data? Because for instance, we were mining data with Netvizz before they shut down the API, and now it's a problem between me and Pedro, because we are doing this research by ourselves, it's something completely independent of our current work. What if the data gets old, because we don't have time? So, temporalities in big data.
Yeah. So in terms of alternatives to Google, what's interesting is that there's not a general alternative. There are only alternatives on specific features. So DuckDuckGo advertises itself as an alternative for privacy. And, what was it called, Framasoft, there are a couple of European projects that advertise themselves as holding data on European servers, right?
So these are the two features that folks are currently using to try to compete with Google. But Google has become a mass medium now, so the barriers to entry to that market are so high that you can't get in. It's like this idea that someone's developing a search engine in their garage or something that's going to beat Google. Don't think so. But on specific features, right, European servers only, or privacy only, those are reasons to switch. But the second question, about longitudinal studies: digital methods was devised as a solution to the ephemerality problem, to the instability. This is part of the follow the medium, the agility, right? The idea that you look for research opportunities rather than coming at it with a framework already in place. So the whole philosophy was built around this. Now it's challenged, right, because we have this Facebook moment. What do we do about that? So we're doing three things, or four things. One is we're building a scraper. We have to. And we should have that up, I don't know, in January or something. But you don't get the same stuff from a scraper as you get from the API, because the API is the developer's mode, the back end, and the scraper is the front end, right? When you scrape, you're screen scraping. So you're seeing what the user sees, not what the developer sees. So it's different, right? So do you then say, oh, the data aren't commensurable, let's stop? Or do you say, hey, what's the difference? I think I would do the latter. So that's one thing. And then you can develop a critique of relying only on back ends and APIs, right? Because you could say that what the user sees is far more significant than what the developer sees, because what the user sees is an actual feed.
It's an actually existing user. You have to be logged in, so you get the personalized stuff, whereas with the back end it's not personalized, right? So you get these engagement counts where you say, this is the most engaged-with post, and it is the most engaged-with post, but 20 percent of the users didn't see it, even though it's the most engaged with, for various reasons. Or we don't even know what percentage. Okay, so this is one thing which is interesting. So make it into an interesting problem, as opposed to something that's so problematic you can't continue. The other one is manual, small data approaches. We can talk about that. One example is this counter-archiving Facebook project that's being done at the Open University in Jerusalem, in Tel Aviv, sorry, the Open University of Israel. I forget how that ends; Open University is how it begins. And these are very time-specific. These are event-based, elections-based, right? So you're bounded temporally. The ephemerality of the medium means bounded temporality for your research. That's doing business with the medium. And then the last one is, of course, what I guess is something that just always abides, and that's the ethnographic. People will say, oh, you know, we're doing a digital ethnography, and then they don't really explain what that means. So what's a digital ethnography with an API and without an API, with a scraper and without a scraper, and what are the advantages and disadvantages? So I would then dig into the ethnography, or whatever the terms are, as an alternative these days. There's also the data journalism side to it.
There's also media monitoring. And if you still want to get to the data side, you can use marketing tools. I mean, I use BuzzSumo all the time. I also use CrowdTangle. That gives you which URLs, not Facebook pages but web URLs, are most engaged with on Facebook. You get that data through CrowdTangle, and now some sort of premium CrowdTangle service is available through the Facebook Social Science One project. I'm not a fan of Social Science One; I can go on and on with critical remarks. But if they did indeed, as my colleagues told me, very recently roll out a kind of advanced version of CrowdTangle, use it.
Thank you for your presentation. I'm sorry for my voice. I'm going to ask a question about my thesis. I'm working on activists, environmental activists, and I am studying them online and offline. Online, I'm working with a data set that comes from Twitter, from a hashtag that activists use, and I'm dealing with this worry: what about bots? And the second question would be: some people are raising questions about technological determinism, and maybe that could be one of my worries too. So how do you deal with that? I come from social science too, so it's a question that I have in my thesis.
Thank you.
I don't really know what to say about the technological determinism point, maybe you can say a couple more words about what you mean by that issue. But on the bots side, on Twitter, there have been a couple of interesting studies, on the Brazilian elections for example, on the presence of bots, and there was a higher incidence there than normal, and normal is like 10 percent. But in the Brazilian elections, around one candidate in particular, I guess you can guess which one, there was a question of artificial amplification. So I guess I would say two things. First of all, if you dig into the study of bots, you'll find this range of approaches from fully quantitative to fully qualitative, and it's a very interesting literature. And I think the most significant bot detection moments were for accounts that were really well done. They didn't seem like bots, and they evaded the automated detection. So I think that there's a major role there for qualitative research. But you need to first, well, you don't have to, but it's probably a good idea, because you're dealing with large data sets, to first use a quantitative method whereby you say, if a tweet or an account has at least four of these signals out of 12, then it might be a bot, then it's suspicious, it's flagged. And then you go into it qualitatively. So I think that's a way to study bots. It's a two-step process, instead of trying to have these pure automated fantasies. But I think what you mean by the techno-determinism point is... So normally that's a critique, right? But you mean that maybe the technology is determining the outcomes of the elections, no, or something?
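The two-step bot check described in the answer above (a quantitative flag first, a qualitative reading second) could be sketched roughly as follows. This is only an illustration under stated assumptions: the talk mentions "at least four of these signals out of 12" but does not list the signals, so the four signals, the field names and the example accounts here are all hypothetical, and the threshold is left as a parameter.

```python
from typing import Callable, Dict, List

Account = Dict[str, float]
Signal = Callable[[Account], bool]

# Hypothetical signals for illustration only; the talk does not name its 12.
SIGNALS: List[Signal] = [
    lambda a: a["tweets_per_day"] > 100,    # unusually high posting volume
    lambda a: a["followers"] < 10,          # almost no audience
    lambda a: a["account_age_days"] < 30,   # very recently created
    lambda a: a["share_retweets"] > 0.9,    # almost only retweets
]


def count_signals(account: Account, signals: List[Signal] = SIGNALS) -> int:
    """Count how many of the signals an account triggers."""
    return sum(1 for signal in signals if signal(account))


def flag_for_review(
    accounts: Dict[str, Account],
    min_signals: int,
    signals: List[Signal] = SIGNALS,
) -> List[str]:
    """Step one: return the accounts that trigger at least `min_signals` signals.

    A flag only marks an account as suspicious. Step two, the qualitative
    reading of each flagged account, is done by the researcher.
    """
    return sorted(
        name
        for name, account in accounts.items()
        if count_signals(account, signals) >= min_signals
    )


# Made-up example accounts.
accounts = {
    "user_a": {"tweets_per_day": 250, "followers": 3,
               "account_age_days": 12, "share_retweets": 0.97},
    "user_b": {"tweets_per_day": 4, "followers": 310,
               "account_age_days": 2900, "share_retweets": 0.20},
    "user_c": {"tweets_per_day": 140, "followers": 5,
               "account_age_days": 400, "share_retweets": 0.95},
}

flagged = flag_for_review(accounts, min_signals=3)
print(flagged)
```

With the full set of 12 signals one would call `flag_for_review(accounts, min_signals=4)`; the flagged list is then the starting point for close reading, not a verdict.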
I don't know what you mean by that, but what I find interesting about this debate is, and it's also the same, in a different way, with the personalization debate on Google: the number of results that are personalized, the number of significant tweets that come from bots, are actually quite a bit lower than people imagine. With personalization it's like 10 percent, and with bots it's like 10 percent. So it's a lot lower. But then when it goes above that level, that's when you say aha, that's when you've made a significant finding. So when, among the most significant retweeted tweets, the ones with the highest engagement, you get more than 10 percent coming from bots, then you have your techno-determinism, possibly, because then there's quite major amplification coming from bots.
We have the Brazilian bots, the Bolsonaro bots. More questions here? Another question from Brazil. The Brazilian bloc is asking many questions today.
Hello, from Portugal, but same language. Yes, I have a question, and one is something like a commentary, I would like to hear your thoughts about it. The other may be harder, so I apologize if you think it's unfair. My first question is: now we are seeing that those responsible for social media are being held accountable. There are talks about whether or not the platforms are responsible for the content, and there are different positions. First, as a commentator and an observer, I would like to hear your thoughts on the issue: should social media platforms be responsible or not for what users post there, and how does that relate to moderation? And second, whether you envision a way of researching this, a kind of empirical way to approach this question, because I've been thinking about it. This is the hard question.
So please don't feel obliged to find a very definitive answer, but I'm curious about this. Is there a way that as researchers we can help this debate go forward?
Yeah, that's an interesting question. Thank you for that. So generally speaking, and I'll also talk about this briefly in the workshop if you're coming, if you look at the history of the study of the web as different sorts of spaces, and I mentioned cyberspace previously, I gave you four periods, but we could also compress that and talk about, I don't know, cloud space, social space, locative or geo space, the blogosphere, all these different ideas. Most recently, I think we're in the comment space period. And along with the comment space period has been this lowering of the threshold of inhibition and the rise of toxicity, and interesting concepts that have been developed, right? So people talk about, well, there's hate speech, but online, well, actually, because it's ironic and it's sincere and it's jokey, we might need a different term. So some people are using the term extreme speech, some are using the term toxic speech, et cetera. Anyway, in this larger context there is the question of the study of moderation and content review. All of the platforms have their own terms for this, right? And none of the terms is editor, right? So exactly what you're saying. That's not there, so that they can continue to try to be intermediaries, in a Latourian sense and also in a media regulatory sense. So like channels, like a telephone conversation, where it's unfiltered going through, right? What you say on one end is heard on the other end. That's what the platforms want you to believe that they are. But in fact they are mediators in the Latourian sense.
So the content is transformed, and it's transformed in a number of ways. So to begin to answer your question of the way forward, I think what we need to do, and if it were me doing this I would do it, again because I'm really platform-centric, so like Twitter, Facebook, YouTube. YouTube just came out with, they had this big announcement two days ago, now I've forgotten the term, so they have another term. They all have different terms and they all have different specific guidelines for what constitutes whatever kind of content, right? Some of it's called offensive, some of it's called organized hate. Facebook and Instagram de-platform dangerous individuals. So all of these cusp terms, right? It's no longer violence, pornography, hate. No, the slider has moved over here. So it's this portion of the spectrum of content that should be studied, I would argue. And I would do it in at least three ways. Number one, the technicity of the determination, so the combination of the fully automated, semi-automated and non-automated. Number two, I would study the guides, the actual manuals that are given to these content reviewers, and they're so interesting. I don't know if you've seen, whatever this movie is called, The Cleaners, where they have to decide in milliseconds, and it's an A/B, right? So A/B testing, accept or don't accept. But the guidelines, and the conflicts in those guidelines, are so interesting. Facebook's, it's so detailed. And the learning thereof. Most of the commentary focuses on the psychological damage of looking at all this content.
Yeah, I get that. But there's also the creation of all these distinctions. I think that's fascinating. I would study that. And then thirdly, I would make collections, right? I've talked about this before. I would make them systematically. And for these collections you don't have to become a web archivist overnight or something, you know, develop crawlers and all this jazz. But I would definitely make at least URL lists, and maybe URL lists with screenshots. I mean, we can climb the ladder of how far we want to go. We just built a screenshot generator, for example: you put in a list of URLs and you set how long you want it to wait for the page to load, and then you can run this and create collections automatically. And then, you know, the PhD or the postdoc, your researchers, will come up to you and say, oh, did you get the metadata with that? No, sorry. So there's more you can do than that screenshot. You probably want more than that. Did you get the teasers, the description text in Google, when you collected these URLs, did you grab that? No, sorry, I forgot to grab that. So what you grab maybe needs a lot of careful consideration. But you need to make collections of some form, so that you can then check what's gone. So those are the three projects.
Hi Richard, and thank you for your presentation, which unfortunately I was not here for, so I missed it.
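As a rough sketch of the collection-making described in the answer above (keep at least a dated URL list, then check later what is gone), something like the following could serve as a starting point. It is an assumption-laden illustration, not the tooling mentioned in the talk: the function names, the CSV layout and the example.org URLs are hypothetical, and how the later availability check is done (re-fetching pages, rehydrating tweet IDs) is deliberately left outside the code.

```python
import csv
import io
from datetime import date
from typing import Iterable, List, Set, TextIO, Tuple


def write_collection(urls: Iterable[str], collected_on: date, out: TextIO) -> None:
    """Save a URL list together with the date on which it was collected."""
    writer = csv.writer(out)
    writer.writerow(["url", "collected_on"])
    for url in urls:
        writer.writerow([url, collected_on.isoformat()])


def read_collection(src: TextIO) -> List[str]:
    """Read the URLs back from a saved collection."""
    return [row["url"] for row in csv.DictReader(src)]


def whats_gone(collected: List[str], still_available: Set[str]) -> Tuple[List[str], float]:
    """Compare the original collection with what a later check still finds.

    Returns the URLs that are gone and their share of the collection.
    """
    gone = [url for url in collected if url not in still_available]
    share = len(gone) / len(collected) if collected else 0.0
    return gone, share


# Hypothetical collection, written to an in-memory file for the example.
buffer = io.StringIO()
write_collection(
    [
        "https://example.org/video/1",
        "https://example.org/video/2",
        "https://example.org/video/3",
        "https://example.org/video/4",
    ],
    date(2019, 11, 1),
    buffer,
)
buffer.seek(0)
collected = read_collection(buffer)

# Result of a later check, however it was obtained (assumed here).
still_available = {"https://example.org/video/1", "https://example.org/video/4"}

gone, share = whats_gone(collected, still_available)
print(gone, share)
```

The same comparison applies to the tweet collections discussed earlier: the share of IDs that do not come back after rehydration is the share that is gone. Metadata or screenshots would be further columns or files alongside the URL list.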
So first, sorry for not being here. I would like to take advantage of the fact that you are here along with Carlos Scolari, because, as you were probably saying in the presentation, one of the trends of the last 10 years in analyzing social media is the use of concepts or metaphors like echo chambers, filter bubbles and so on, which were used at the very beginning as a good way of describing what was happening in social media, in particular for political communication, but which have been criticized in the last two or three years, in particular by Axel Bruns. Axel Bruns just wrote a book a year ago about that. And my question is: taking into account that Carlos Scolari, for instance, tends to use a really powerful metaphor for the media system, the ecological metaphor, ecology, which gives a lot of information about what is happening in media, the species and all the things that can be related to this metaphor, we are using what I would consider, and I would like to know your opinion, really poor metaphors to explain really complex processes that are happening in the social media ecosystem, echo chambers and filter bubbles in particular. Do you think that we have to develop or use more powerful metaphors to explain what's happening?
That's great, because I was thinking about the same thing. Your next article will be about deplatforming, and I have also worked on mediatization theories, and now we are talking about deep mediatization, Andreas Hepp. So we should also reflect on that, no? Because deep means it's a 3D environment; we are going deeper and deeper. So there is this use in two different, very close fields, mediatization theories and digital methods.
We are talking about this going deep. We are in the same dimension, and the language and the metaphors we use are very important, because the metaphor models the way you do the research.
Yeah, well, I mean, yes, thank you. So we've developed two terms, which we think haven't really been taken up so much, but we still like them. One is device culture and the other one's ranking culture. They're both the study of the outputs of social media, right? So it's in some ways a study of the feeds. And the feeds can be studied from an echo chamber or filter bubble point of view, or you can begin to develop other terminology. So the terminology that we've developed is device culture and ranking culture. Ranking culture was developed by Bernhard Rieder, and device culture I developed with a couple of colleagues, and they're very similar. What they try to capture is this idea that the outputs are iterative and recursive, right? That's the first point: the outputs of social media, in feeds, recommendations, et cetera, take into account what you're doing as well as what your environment is doing, and output accordingly. The second thing, and this is where the deep point comes in, is that you might want to add, or we should add, I don't know if deep's the right word, probably, the word deep to it, so deep ranking culture, deep device culture, because there's a deeper back end in play, and that's the commercial side and the advertising side, which the filter bubble doesn't take into account. Or it does, but it could take it into account more explicitly. The filter bubble, as you know, was developed by Eli Pariser to describe Google, right? So this is 2009. December 2009, Google flipped the switch. No longer did we get universal results.
But we got personalized results. And then in his famous TED talk he showed the query for Egypt, and one friend got the Nile River and the other friend got the Egyptian revolution in the images, right? And he's like, see, filter bubble. And so then the dangers of the filter bubble, et cetera. You know this story. The way it's been applied to the study of Facebook is not bad, I don't think. The famous piece was in the Wall Street Journal around the 2016 US elections, it ran until August of this year, and it was called Red Feed, Blue Feed, or Blue Feed, Red Feed. They emulated the feeds of those who would be on the left of the political spectrum and on the right, according to the kinds of media that they were likely to consume, on the basis of an independent study of the types of media consumed by the left and the right in the US. And with the emulated feeds they thereby showed bubbles, right, or echo chambers. And I think that was well done. But the thing is that it was not a real case, right? So this is like, again, relying only on the back end, the API. I mean, they weren't doing that. But I think to update this, one needs to grab the front end, actually scrape the outputs. There are a couple of researchers at Northeastern who are particularly good at this, studying the Facebook news feed at the moment. I was just with Christian Sandvig last week in Copenhagen, who's a big proponent of algorithmic auditing, who developed that term, and who's actually suing the US government.
I don't know if you've heard about this, it's very interesting. Preemptively suing, so that when you break the terms of service of all these mainstream platforms and save the outputs of algorithms, not in order to study trade secrets but to study algorithmic bias and the power of algorithms, machine bias, discrimination, this sort of thing, you can do that legally as researchers. But anyway, I'm one of these people who's pro-scraping, so I break the terms of service all the time. I think that's the way to go in order to study it. But at the same time, I think one does also need to study the, quote-unquote, back end. Not the secret sauce, we don't need to read the code, but we do need to understand the logics of advertising behind the recommendation. So you get the front-end filter bubble plus the back end. That's how I would go about it, and we could call it, to use Bernhard Rieder's term, deep ranking cultures, or whatever.
This man has to eat something before the workshop, so we have to stop. And the workshop is, for the people who are attending, in the Sala de Graus, on the third floor, in Tànger. Well, thank you very much, and thanks for sharing your research and knowledge with all these people, young people. Many of them are starting their PhD in these weeks, so they have a lot of input for the future.
If you were starting your PhD, would you learn a programming language? And if so, which would you choose?
Oh, if you're technically inclined, I would dabble in Python and R, yeah, definitely. And if not, then don't worry. That's the other part of it. Don't worry about that.