This talk will be given by Damon McCoy. He will be explaining online US political advertising. He has been researching how different online communities behave around many different topics, but this is what he's going to talk about today. So please give him a great round of applause.
Thank you everyone for coming. I'm up here speaking because I'm the only one who wanted to fly to Germany over Christmas and New Year's. However, there were three other people who were really key to this research, and before I get started I want to credit them. One is my grad student, Laura Edelson. She did a lot of the analysis you're going to see and generated all the graphs. Another is Shikhar, an undergraduate student from our NYU Shanghai campus, who did a lot of the work to collect the data you're going to see here. And Ratan Dey, a professor at NYU on the Shanghai campus, helped out with our initial efforts to collect some of this data. Before we get started, I'll give a little introduction about myself. I'm a professor at the NYU Tandon School of Engineering. As was mentioned, I do a lot of work looking at how technology impacts the security and privacy of society and of groups of people. This was really an opportunistic project that captured the impact of online advertising on U.S. political campaigns. And a quick plug: most of the data and scripts I'm going to show you are in a GitHub repository that anyone can use to analyze the data, or to look at our scripts and improve them. This is the first time I've given this talk outside of the U.S., so let me start with a quick explanation of how U.S. elections work for those of you who might not know.
Every two years in the U.S., we hold federal elections, which cover all of the states. Every four years, one of those is also an election for president. 2018 was our most recent election, and it was not a presidential election year, so the federal races were for seats in the Senate and the House. There were also elections for state and local positions that year. Some of those are captured in our data, especially our Facebook data, but not so much in the Twitter and Google ad transparency data. So this talk will focus on the 2018 elections, which were held on November 6th. Election Day is always the first Tuesday after the first Monday in November, every two years.
To begin with the background, which some of you may already know: in the 2016 elections, which were a presidential election year, there was election interference. Facebook has released these ads, which were paid for by a Russian company, the Internet Research Agency. Facebook released them to the Senate, and the Senate then released them publicly. This ad is basically trying to discourage people from voting in the elections. You can see it's targeted at people in the U.S. in a certain age range with interests like Martin Luther King, African-American culture, and African-American civil rights. Facebook doesn't actually let you target people directly by ethnicity, but targeting these interests is a pretty good proxy, so this ad would probably be fairly effective at reaching African-American people in the U.S. to discourage them from voting. There were other ads that ran misinformation and disinformation campaigns.
This ad, again paid for by the same Russian agency, was trying to spread the unsubstantiated rumor that Bill Clinton has an illegitimate child. Again, the targeting is aimed at African-Americans in the U.S., who are often a key voting bloc for the more liberal, Democratic side. That's the other thing I should have explained about the U.S. election system: especially at the federal level, we effectively have only two parties that win any meaningful number of elections. The Democratic Party tends to skew more liberal, favoring bigger government and more social services. The Republicans skew more conservative, favoring smaller government, fewer services, and less regulation. These are two examples, but a whole bunch of these ads were shown on Facebook, and pretty much all of them either tried to discourage people from voting or tried to create chaos and polarize people around the election, often with disinformation. In 2017, our Office of the Director of National Intelligence put out a report. Sorry for the big block of text; it will be the only one in this talk, but I thought it was important to show because it states pretty much unequivocally that Russia tried to interfere in the U.S. elections and that Vladimir Putin was involved in this interference. So as far as the intelligence agencies, the CIA, the NSA, and others, are concerned, there is solid evidence that this occurred.
The other thing that broke was the Cambridge Analytica scandal, in which a third-party political consulting firm collected data on about 80 million Facebook profiles and then tried to build psychological profiles for targeting and messaging. The first result of these two scandals was Mark Zuckerberg in a suit, a real suit, not a hoodie, testifying before House and Senate committees about the abuses occurring on Facebook. He did this on April 10th and 11th, 2018, and he admitted that Facebook had made mistakes and needed to improve its platform going forward. The most tangible outcome of these testimonies was the transparency archives that began to appear. Here's a view of what Facebook's ad transparency archive looks like. When it was originally deployed, you needed a Facebook account to use it. Facebook has since dropped that requirement, so anyone with Internet access, unless they're censored somehow, can go to this archive and view these ads. In the user-facing portal, you type in keywords, it does pattern matching on the ad text and other parts of the ad, and it returns the matching ads, so you can see all the political ads that match a particular term. Facebook began archiving these ads at large scale on May 7th, 2018. By Election Day, November 6th, 2018, there were 1.6 million ads paid for by over 85,000 advertisers on Facebook's platform. Facebook was actually fairly broad in what it included in its political archive.
They included any ads related to U.S. elections, whether federal, state, or local. They also included the very important issue ads. As we saw with the Russian interference, the ads often didn't mention actual political candidates; they just mentioned polarizing issues in the U.S. So Facebook also included ads on issues of national importance. They had a list of, I think, about 13 different criteria, the last of them being values, so it was a fairly encompassing set of ads. Along with the text, images, or videos of each ad, they included ranges of impressions broken down by geography and by demographics: state-level impression information in ranges, and demographics by gender and by bucketed age. They also included spend information, again in ranges, such as $0 to $99, $100 to roughly $500, and so forth. One key piece of information they did not release was the targeting information. For the Russian ads I showed you before, that targeting information is available, but Facebook does not release it in its transparency archive. At first, all they had was the user portal with its keyword search. However, I like to do large-scale data analysis, so I wanted to collect all the ads in this web portal. We compiled a long list of what we thought were reasonable keywords, such as names of prominent politicians, names of states, and issues, and we began scraping their portal with it.
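A minimal sketch of keyword-driven collection, assuming a hypothetical `search(keyword)` function that returns a list of ad records, each with an `ad_id` field. It is not our actual scraper and does not talk to Facebook's portal; it only illustrates why results from many overlapping keyword searches have to be de-duplicated by ad ID.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record shape: each search hit is a dict with at least an "ad_id" key.
Ad = Dict[str, str]


def collect_by_keywords(
    keywords: Iterable[str],
    search: Callable[[str], List[Ad]],
) -> Dict[str, Ad]:
    """Run every keyword through `search` and keep one copy of each ad.

    The same ad is usually returned by several keywords (a candidate's name
    and their state, for example), so results are de-duplicated on ad_id.
    """
    collected: Dict[str, Ad] = {}
    for keyword in keywords:
        for ad in search(keyword):
            # setdefault keeps the first copy seen and ignores later duplicates.
            collected.setdefault(ad["ad_id"], ad)
    return collected


# Usage with a toy in-memory "archive" standing in for the real portal.
toy_archive = [
    {"ad_id": "1", "text": "Vote for Smith in Ohio"},
    {"ad_id": "2", "text": "Protect Ohio jobs"},
    {"ad_id": "3", "text": "Smith on healthcare"},
]


def toy_search(term: str) -> List[Ad]:
    return [ad for ad in toy_archive if term.lower() in ad["text"].lower()]


collected_ads = collect_by_keywords(["Smith", "Ohio", "healthcare"], toy_search)
print(sorted(collected_ads))  # ['1', '2', '3']
```

Six keyword hits collapse to three unique ads here; the quality of the keyword list, not the code, is what limits coverage.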
I'll tell you the story of how our scraping efforts went. Today they also offer an API, but it's still keyword-based and it's restricted by an NDA. Toward the end of May, they released the user archive. I played with it and realized that it didn't lend itself well to large-scale analysis of these ads. So I went to my students, Shikhar and Laura, and Shikhar worked furiously night and day; within three days he had a working scraper that could run our keywords and collect all of the results. We ran the scraper for about two months, and then we released a very general statistical report along with the data in our GitHub repository. About two weeks later, Facebook began deploying anti-scraping measures, which hampered our efforts to scrape the archive. I don't want to attribute any malice. I don't believe Facebook was targeting just our scraping efforts; they were targeting everyone's scraping of the transparency archive. Whether it's right or wrong to block people from collecting data from a transparency archive, I might quibble with them and say they should provide better access to the data, but this was the choice Facebook made. We fought with them a little in a cat-and-mouse game: we would make some changes to our scraper to avoid their anti-scraping measures, and they would do something on their end to block our scraper, and probably other people's scrapers doing similar things. This persisted for about two weeks, and then Facebook deployed its API.
However, as I said, their API is very limited and still in beta. These were some of the terms and conditions. The one I found most onerous is that it limited access to U.S. persons only, so we could only work closely with people in the U.S., which limited who we could collaborate with. Unfortunately, this ruled out working closely with journalists from really good news organizations like The Guardian, which had the misfortune of being located outside the U.S. (Or perhaps the good fortune.) The list of restrictions continues. They also imposed a data retention limit, so we can only retain the data for one year. Facebook itself retains its archive for seven years, but it imposes a one-year retention limit on the data we collect under the NDA. I'd like to say that I got this NDA, lit it on fire, tore it up, and we kept scraping their archive. Unfortunately, it was a hard call: there are basically two students, and we had to decide whether we wanted the data to analyze or whether we wanted to spend all our time fighting Facebook's anti-scraping measures. In the end, I did agree to their NDA. We have released the initial data we scraped, and we're still scraping a small amount of data that we also release, but we cannot release any of the data we collected under the NDA. If anyone does want to fight with Facebook and resurrect the crawlers, I would be more than happy to see that happen. Given our engineering constraints, it just wasn't feasible for us. The story is a little different with Google.
Google began archiving ads on May 31st, 2018, and by Election Day it had 45,000 ads from 600 advertisers. Its criteria for including ads were much narrower than Facebook's: it only released ads related to U.S. federal candidates and federal officeholders, so it's a much more limited set of data, with none of the issue ads that Facebook released. Google didn't release any geographic or demographic breakdown of impressions. It did release ranges of impressions and ranges of spend, and it released some limited targeting data: geographic and demographic targeting information, which Facebook hadn't released for its ads. The data is available through a similar keyword-based portal, but Google also makes it available as a database. This is what the portal looks like, and this is the BigQuery database they released. They update it every week, and you can download and analyze the data relatively easily. The last one to implement an archive was Twitter, which began archiving ads on June 27th, 2018. The number of ads on Twitter is very small compared to the others; its ad network in general is much smaller than Google's and Facebook's. What Twitter included was similar to what Google included: only federal candidates. Closer to the election, Twitter also said it was going to release political issue ads. However, there doesn't appear to be any enforcement mechanism in Twitter's system; it doesn't appear to be anyone's job to actually enforce ad transparency.
So we've been manually finding accounts and reporting them to Twitter. When we report them, Twitter includes them in its transparency archive going forward, but it has basically become our job to monitor the Twitter accounts and notify Twitter, and then they deal with it manually. Unfortunately, there still doesn't appear to be anyone inside Twitter who actually manages this process. Twitter does, however, release the most information. It releases exact numbers, not ranges, for impressions and spend, also broken down by geography and demographics, and as far as we can tell it includes all of the targeting information. The data is available through its portal without an account. We've been scraping it and have had no problems; they haven't blocked us, so we simply scrape their data and republish it to our GitHub. The scale of their data is so small that it's been relatively easy to keep pace with. Here's a picture of the Twitter transparency archive. Again, they have a list of all the Twitter accounts included in the archive, so we can monitor it along with other accounts we know are politically active. When we see one of those accounts doing paid advertising, we notify Twitter, and Twitter normally includes it in the transparency archive within a week or so. That's the background you need to understand these transparency archives. Now we have a data set we can begin to analyze. For Facebook, since collection was keyword-driven at the beginning, and still is, we were able to collect about 80% of the ads in Facebook's archive.
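The Twitter monitoring described above boils down to a set comparison. A minimal sketch, assuming three hypothetical inputs: handles we know are politically active, handles we have seen running paid ads, and handles listed in Twitter's transparency archive. Accounts in the first two sets but not the third are candidates to report.

```python
from typing import Iterable, Set


def normalize(handles: Iterable[str]) -> Set[str]:
    """Twitter handles are case-insensitive; drop a leading '@' and lowercase."""
    return {h.lstrip("@").lower() for h in handles}


def accounts_to_report(
    known_political: Iterable[str],   # hypothetical input: politically active accounts
    seen_advertising: Iterable[str],  # hypothetical input: accounts seen running paid ads
    in_archive: Iterable[str],        # hypothetical input: accounts already in the archive
) -> Set[str]:
    """Politically active accounts seen running paid ads but absent from the archive."""
    return (normalize(known_political) & normalize(seen_advertising)) - normalize(in_archive)


print(sorted(accounts_to_report(["@CandA", "CandB", "PacC"], ["canda", "pacc"], ["PacC"])))
# ['canda']
```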
The other problem with the API is that it's severely rate-limited: we can get through only about three to four queries per minute. So we did our best to collect as much data as we could. About two weeks before the election, Facebook began releasing a transparency report that included an aggregated list of all the advertisers, how many ads each had, and how much each had spent. That's how we can tell that we got about 80% of the ads in Facebook's archive. The nice thing about the transparency report is that once we knew what we were missing, we could go back and readjust our use of the API, so we now have virtually 100% coverage of Facebook going forward. For Twitter, we could collect 100% of the data, and again we've republished it all in an easier-to-process form. For Google, we also have 100% of the ads because they're all in the BigQuery database. However, when we started analyzing the data, we noticed that for a lot of the ads we were missing the actual content, the images and text. It turns out that on Google's ad network, if an ad was originally purchased through a third-party advertiser and then run on one of Google's properties, the content of the ad isn't archived. Unfortunately, this is a big loophole: if you're running a malicious misinformation campaign, you can easily keep your content out of Google's archive simply by paying for the ads through a third party. It's unclear whether this is a policy limitation or a technical limitation on Google's part, but the outcome is that we only have the content for about 70% of Google's ads, the ones that were paid for directly on Google's platform.
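One simple way to readjust API usage with an aggregate report is to compare per-advertiser counts. A minimal sketch, assuming we have two hypothetical mappings: the number of ads per advertiser from the platform's report, and the number of ads per advertiser we have already collected. It reports overall coverage and ranks advertisers by shortfall so the limited query budget goes to the biggest gaps first.

```python
from typing import Dict, List, Tuple


def coverage_gaps(
    reported: Dict[str, int],   # hypothetical: advertiser -> ad count in the platform's report
    collected: Dict[str, int],  # hypothetical: advertiser -> ad count we have collected
) -> Tuple[float, List[Tuple[str, int]]]:
    """Return overall coverage and advertisers sorted by how many ads we are missing."""
    total_reported = sum(reported.values())
    # Cap each advertiser at its reported count so over-collection can't inflate coverage.
    total_have = sum(min(collected.get(adv, 0), n) for adv, n in reported.items())
    coverage = total_have / total_reported if total_reported else 1.0
    gaps = [
        (adv, n - collected.get(adv, 0))
        for adv, n in reported.items()
        if n > collected.get(adv, 0)
    ]
    gaps.sort(key=lambda item: item[1], reverse=True)  # biggest shortfall first
    return coverage, gaps


coverage, gaps = coverage_gaps({"A": 100, "B": 50, "C": 10}, {"A": 90, "B": 10, "C": 10})
print(coverage, gaps)  # 0.6875 [('B', 40), ('A', 10)]
```

At three to four queries per minute, the budget is only about 4,300 to 5,800 queries a day, which is why ordering the remaining work matters.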
One of the first things we wanted to do was add some semantic meaning to these ads at large scale. We played around with a few techniques, including some fancy natural language processing, but we found a really simple and effective way of categorizing the intent of an ad: most of these ads have a URL of some kind, and a lot of those URLs point to third-party services. If you're holding an event, you'll coordinate it with something like Eventbrite. If you're seeking donations as a Democrat, you'll use a third-party payment processor called ActBlue; if you're a Republican, there are two or three payment processors you'd use. So we could simply take the really prominent URLs that occur many times and manually tag the purpose of each one. By doing this, we can tag ads as inform ads, which just want to get some message about a candidate, positive or negative, out there; connect ads, which seek contact information like people's email addresses, phone numbers, and names, presumably so that the campaign can get them to volunteer or donate money in the future; move ads, which try to get people to vote, attend a rally, or volunteer; donate ads; and finally commercial ads. Commercial ads either sell products that are directly political in nature, like a bobblehead of some candidate, or sell something like solar panels, which come with tax credits in the U.S. In either case, some commercial good is linked to some political messaging.
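A minimal sketch of URL-based tagging, assuming a hand-built lookup table from landing-page domain to ad purpose. ActBlue and Eventbrite are the services mentioned above, but the purposes assigned here and the `example.org` entry are illustrative assumptions, not the study's actual table. Ads whose domain isn't in the table stay untagged.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative lookup table (assumption): domain -> purpose tag.
DOMAIN_PURPOSE = {
    "actblue.com": "donate",
    "eventbrite.com": "move",
    "example.org": "connect",  # hypothetical sign-up site
}


def registered_domain(url: str) -> str:
    """Naively keep the last two host labels (secure.actblue.com -> actblue.com).

    This is wrong for suffixes such as .co.uk; a public-suffix list would fix it.
    """
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def ad_purpose(landing_url: str) -> Optional[str]:
    """Purpose tag for an ad's landing URL, or None if the domain is untagged."""
    return DOMAIN_PURPOSE.get(registered_domain(landing_url))


for url in [
    "https://secure.actblue.com/donate/x",
    "https://www.eventbrite.com/e/rally",
    "https://news.example.net/",
]:
    print(url, "->", ad_purpose(url))
```

The appeal of this approach is that a few dozen manually tagged domains cover a large share of ads, because many campaigns route through the same handful of services.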
Using this method, we were able to categorize about 70% of the ads. We took a random sample, manually checked the results, and found the method was pretty accurate: about 96% accuracy. The other thing we did was manually categorize the type of organization behind the top advertisers by money spent: the top 75% of advertisers on Facebook and the top 80% on Google. Was it a political candidate? Was it what's called a political action committee, or PAC? Was it a union? A for-profit operation? A non-profit? And so on. We wrote some regular expressions that got us most of the way there, since most of these organizations follow fairly uniform naming conventions, and we classified the ones we couldn't handle automatically by hand. Since Twitter has so few advertisers, we did all of those manually. Now we can start to do some analysis. The first and easiest analysis we did was to look at the size of the ads. What pops out is that the majority of ads on all the platforms cost between $0 and $100. These are what are normally called micro-targeted ads, typically seen by fewer than 1,000 people: very short-lived, narrowly targeted ads that home in on a specific demographic. The majority of ads are of this micro-targeted kind, especially on Facebook's platform, where 82% of them are. This confirms what people had reported about the trend toward micro-targeting in political advertising. Based on our categorization, we can also look at how the different platforms were used.
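The regular-expression step can be sketched as an ordered list of name patterns. The patterns below are hypothetical illustrations of common naming conventions, not the expressions used in the study; anything that matches nothing is returned as `None` and left for manual review.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only. Order matters: the first match wins.
PATTERNS = [
    ("pac", re.compile(r"\bPAC\b|political action committee", re.IGNORECASE)),
    ("union", re.compile(r"\bunion\b|\bAFL-CIO\b|\blocal \d+", re.IGNORECASE)),
    ("candidate", re.compile(
        r"\b(for|4) (congress|senate|governor)\b|\bcommittee to elect\b", re.IGNORECASE)),
    ("for_profit", re.compile(r"\b(inc|llc|corp|corporation)\b", re.IGNORECASE)),
]


def classify_advertiser(name: str) -> Optional[str]:
    """Return the first matching advertiser type, or None to flag for manual review."""
    for label, pattern in PATTERNS:
        if pattern.search(name):
            return label
    return None


for name in ["Jane Smith for Congress", "Example Values PAC", "Acme Solar, Inc.", "Friends of the Park"]:
    print(name, "->", classify_advertiser(name))
```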
The problem with these numbers is that each of these databases had different inclusion criteria. Finally, we can look at the different types of advertisers on these platforms. Again, it's hard to read too much into these numbers because Facebook included much more of the commercial material, so we see a lot more commercial ads there. The final analysis we did on the entire data set looked at the ramp-up to the election. We cut this off in late October, because this analysis was done for a paper whose due date was, ironically, November 6th, Election Day. So we cut it off a few weeks earlier and haven't regenerated it since. What you can see at the top is that green spike, which is the move ads. Closer to the election, the campaigns were running sophisticated get-out-the-vote ads. Some of these micro-targeted get-out-the-vote ads were almost spooky: they knew where the person they were targeting lived, and they gave that person directions from home to the nearest polling place. So there were really sophisticated get-out-the-vote efforts being run online toward the end of the campaign. To give you more of an apples-to-apples comparison of these ad platforms, we also narrowed each platform's data to the advertiser type made transparent by all three platforms, which was federal candidates only. This gives you some idea of the scale. Even when we narrow it this way, Facebook still has many more advertisers and many more ads than Google.
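A ramp-up chart like this is essentially a count of ads per purpose per week. A minimal sketch, assuming each ad has been reduced to a hypothetical `(start_date, purpose)` pair, where the purpose comes from the URL tagging; the dates below are illustrative, not real data.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def weekly_counts(ads: Iterable[Tuple[date, str]]) -> Dict[Tuple[int, int], Counter]:
    """Count ads per purpose for each ISO week, keyed by (ISO year, week number)."""
    weeks: Dict[Tuple[int, int], Counter] = {}
    for start, purpose in ads:
        iso = start.isocalendar()
        weeks.setdefault((iso[0], iso[1]), Counter())[purpose] += 1
    return dict(sorted(weeks.items()))  # chronological order


sample = [
    (date(2018, 10, 22), "inform"),
    (date(2018, 10, 29), "move"),
    (date(2018, 11, 2), "move"),
]
for week, counts in weekly_counts(sample).items():
    print(week, dict(counts))
```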
However, the spending numbers are fairly comparable. For Facebook, impressions and spend are ranges, because that's all Facebook releases. For Google, the impression data is in ranges, but we can get exact spend, because Google releases a weekly report of exact spend aggregated by advertiser. Again, Twitter's numbers are much smaller across the board. We also redid some of our analysis to check whether our findings were simply distortions caused by what was included in each archive. When we redid the ad size analysis limited to federal candidates, the finding still holds: a lot of the ads on Facebook are micro-targeted, and there are still micro-targeted ads on the other platforms as well. The amount of micro-targeting varies by advertiser. Take someone like President Trump: he does a lot of micro-targeting, with probably about 90 to 95% of his ads micro-targeted. Other candidates do much less. So different advertisers are definitely using different strategies, but in aggregate micro-targeting still appears to be a very popular strategy. We can also look at spend by ad type, which shows a little of how the different platforms are used. Facebook's platform looks like it's used a bit more for informational ads, though still a lot for donations, whereas Google's platform is used much more for donations and much less for informational and connect ads. It's really hard to read anything into Twitter's data because it's such a small set.
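When spend is released only as ranges, totals can only be bounded, and "micro-targeted" can be defined by whether an ad's whole range sits below a cutoff. A minimal sketch, assuming each ad's spend is a hypothetical `(low, high)` pair in dollars with `high = None` for an open-ended top bucket; the buckets in the usage example are illustrative, not real data.

```python
from typing import Iterable, List, Optional, Tuple

# A spend range in dollars as (low, high); high is None for an open-ended top bucket.
SpendRange = Tuple[int, Optional[int]]


def spend_bounds(ranges: Iterable[SpendRange]) -> Tuple[int, Optional[int]]:
    """Lower and upper bounds on total spend; the upper bound is None if any range is open-ended."""
    low_total, high_total, unbounded = 0, 0, False
    for low, high in ranges:
        low_total += low
        if high is None:
            unbounded = True
        else:
            high_total += high
    return low_total, (None if unbounded else high_total)


def micro_share(ranges: List[SpendRange], cutoff: int = 100) -> float:
    """Fraction of ads whose whole spend range is below `cutoff` dollars."""
    if not ranges:
        return 0.0
    small = sum(1 for _, high in ranges if high is not None and high < cutoff)
    return small / len(ranges)


example_ranges = [(0, 99), (0, 99), (100, 499), (0, 99)]  # illustrative buckets
print(spend_bounds(example_ranges), micro_share(example_ranges))  # (100, 796) 0.75
```

The gap between the two bounds is why range-only spend figures are hard to compare with Google's exact weekly spend numbers.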
But from the data we do have, it looks like more of it is aimed at collecting emails and similar contact information. The other analysis we did on the federal candidate ads used Facebook's geographic impression data, which lets us count how many states each Facebook ad was shown in. The interesting thing here is that, since there was no presidential election, basically all of these campaigns were running in a single state: the constituents for each of these elections were essentially in one state. If you look at the inform ads, most of them were shown in a very small number of states, so they were mostly being shown to the constituents who actually vote for that candidate. However, the bottom line, the gold line, is the donation ads, and we can see that candidates were fundraising in many more states outside their constituency. FiveThirtyEight did an interesting analysis of one particular candidate, Beto O'Rourke, who ran for Senate in Texas. Texas is a very conservative state in the U.S., and he did surprisingly well. He embraced online advertising, and online donation seeking was a cornerstone of his campaign. In the U.S., donations to federal candidates have to be reported to the Federal Election Commission, so they are all in the FEC's database. FiveThirtyEight analyzed his donation records and confirmed what we saw in the donation ads: he got about 52% of his donations from Texas and 48% from other states, primarily coastal states that lean more liberal, like New York, California, and Washington. That's where he was seeking donations.
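Counting states per ad is a small aggregation over the geographic impression data. A minimal sketch, assuming each ad record has a hypothetical `regions` mapping from state to its share of impressions and a `purpose` tag from the URL categorization; the records below are made up to show the shape of the computation.

```python
from statistics import median
from typing import Dict, List


def states_reached(region_share: Dict[str, float]) -> int:
    """Number of states with a non-zero share of the ad's impressions."""
    return sum(1 for share in region_share.values() if share > 0)


def median_states_by_purpose(ads: List[dict]) -> Dict[str, float]:
    """Median number of states reached, grouped by the ad's purpose tag."""
    by_purpose: Dict[str, List[int]] = {}
    for ad in ads:
        by_purpose.setdefault(ad["purpose"], []).append(states_reached(ad["regions"]))
    return {purpose: median(counts) for purpose, counts in by_purpose.items()}


example = [  # hypothetical records
    {"purpose": "inform", "regions": {"TX": 1.0}},
    {"purpose": "inform", "regions": {"TX": 0.9, "OK": 0.1}},
    {"purpose": "donate", "regions": {"TX": 0.5, "NY": 0.2, "CA": 0.2, "WA": 0.1}},
]
print(median_states_by_purpose(example))  # {'inform': 1.5, 'donate': 4}
```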
So online advertising appears to be a very effective way of collecting small-dollar donations throughout the U.S. The last thing I'm going to talk about is ad targeting. Facebook didn't directly release targeting information. However, we were lucky: ProPublica made a browser plugin that people could install. The plugin identifies what it thinks are political ads using a machine learning classifier, and for those ads it uploads the ad, along with its targeting information, to ProPublica's server. For those of you with a Facebook account, when you see an ad you can click in the upper corner of the ad and see why the ad is targeting you. Facebook will tell you some, though not all, of why you were targeted: it essentially shows you the two broadest categories of why you were targeted for that particular ad. This is actually interesting, and if you're a Facebook user I highly recommend trying it. When I started doing it, it was eye-opening to see the level of targeting in advertising. One thing we've definitely learned from this is that when you see an ad, there's often a very specific reason why you're seeing it. So we felt it was very important to understand, as much as we could, the targeting that was going on in Facebook's platform. ProPublica had this browser plugin and a data set that anyone can analyze. If you use Facebook and you're located in the U.S., I highly recommend installing this plugin, because it helps us understand the targeting in political advertising.
We took ProPublica's data set and joined it with Facebook's ad transparency archive. This required scraping Facebook's ad archive, because we needed the ad ID, which Facebook doesn't currently expose through its API but does expose through its user portal. So we scraped the user portal to match the specific ads in the ProPublica data set to the archive, and we were able to join about 75% of the ads. A lot of the ads collected by the ProPublica plugin simply weren't in Facebook's transparency archive; the archive is imperfect and misses things. Understanding what Facebook is missing from its ad transparency archive would be another interesting analysis, and the ProPublica data set lets you do this to some extent, subject to the bias of who installs the plugin in the first place. That bias is also the caveat on our join: the people who installed the plugin are probably not a representative sample of Facebook users. Unfortunately, it's the best data set we have that includes targeting information. We collapsed the targeting into three categories. Let me quickly explain Facebook's ad targeting platform for those who don't know it. One way to target ads is through interests or segments: age segments, gender segments, or interests like the ones I showed you before. Another way is by uploading lists of information, such as people's phone numbers, email addresses, or names.
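The join itself is a lookup on ad ID, with the match rate falling out as a by-product. A minimal sketch, assuming hypothetical record shapes: plugin records with `ad_id` and `targeting` fields, and archive records keyed by `ad_id`. The example data is made up.

```python
from typing import Dict, List, Tuple


def join_on_ad_id(
    plugin_ads: List[dict],    # hypothetical: records with "ad_id" and "targeting"
    archive: Dict[str, dict],  # hypothetical: ad_id -> archive record
) -> Tuple[List[dict], float]:
    """Attach plugin targeting data to archive records that share an ad ID.

    Returns the joined records and the fraction of plugin ads that matched.
    """
    joined = []
    for ad in plugin_ads:
        record = archive.get(ad["ad_id"])
        if record is not None:
            joined.append({**record, "targeting": ad["targeting"]})
    rate = len(joined) / len(plugin_ads) if plugin_ads else 0.0
    return joined, rate


plugin = [
    {"ad_id": "1", "targeting": "interest: solar energy"},
    {"ad_id": "2", "targeting": "customer list"},
    {"ad_id": "3", "targeting": "age 25-54"},
    {"ad_id": "4", "targeting": "similar to customers"},
]
archive_by_id = {
    "1": {"ad_id": "1", "spend": "$0-$99"},
    "2": {"ad_id": "2", "spend": "$100-$499"},
    "4": {"ad_id": "4", "spend": "$0-$99"},
}
rows, rate = join_on_ad_id(plugin, archive_by_id)
print(len(rows), rate)  # 3 0.75
```

Plugin ads that fail to match (ad "3" here) are exactly the ones worth studying to see what the archive misses.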
And then when you upload this list, Facebook will find those profiles in their database. They'll basically join those emails with the emails that users entered for their accounts, create what they call an audience of these people from this personally identifiable information, and then target them. The final major form of targeting that Facebook offers is what they call look-alike audiences. This is where you upload PII like email addresses, phone numbers, and names; Facebook links them to accounts, looks at the interests and other attributes of those users, and then finds you other users, not these users but other users, who have a similar profile. So these are the look-alike audiences that Facebook offers on their platform. We categorized the targeting this way and, again, broke it down by advertiser type. What stands out is that the for-profit companies are doing a lot of targeting based on interests and segments. They probably don't know who the people they want to message are, so they're targeting mostly by interest and segment. Whereas when you look at the PACs and the political candidates, they have lists. They have a lot of lists of people's email addresses, phone numbers, names, and things like this, and they're plugging these into Facebook's system, and this is how they're targeting a lot of people. This was suspected, but it's interesting to quantify how much it's happening. And the look-alike audiences are also being used a good deal by everyone, which kind of makes sense, right?
Because if you have a list of people, you advertise to them, but then the look-alike audience gives you people who are similar to them and are perhaps also good people to advertise to. The other thing we can do is break this down by the intent of the ad, and this shows the difference in behavior between the commercial and non-commercial advertisers even more starkly. The commercial advertisers are targeting mostly based on interests, whereas the advertisers whose ads aim to connect with people are the ones using look-alike audiences the most. This makes perfect sense, because the connection ads are there to collect people's email addresses, phone numbers, names, and things like that. So when you use look-alike audiences, you can generate more lists of people who will convert for whatever you want, and you can retarget them later with ads targeted directly at those lists. So this all makes pretty good sense when you look at how it behaves. But again, it's important to make this transparent so that people understand how targeting is happening in the U.S. political advertising sphere. Those were the two major analyses that we did in terms of targeting. The final part, and the part that makes for the juiciest stories, is the more dubious advertisers that are running political ads on these platforms. We more politely call these new types of advertising. The first type is one that you would pretty much expect: corporate astroturfing. We saw these ads from Citizens for Tobacco Rights, and I pretty much expected that if you looked up this group, it would be some quasi-nonprofit supported by industry money from tobacco lobbyists or something like that.
So that's pretty much what I expected to see when I saw these ads. You go to this website and it's actually pretty honest about what it is. This is probably because of all the lawsuits and regulations around tobacco advertising in the U.S., but the website clearly states that it's operated by Philip Morris, the tobacco company. And Citizens for Tobacco Rights isn't actually a legal entity. As far as we can tell, it's simply a website that's been stood up and is owned and operated by Philip Morris. This gets to a big problem with Facebook's transparency archive, which is that they don't actually vet the disclaimer string naming the sponsor. Pretty much anyone can type anything they want into that disclaimer string, and Facebook will allow the ad to run. We've tested it, and as far as we can tell, you can't say that you're from Facebook or Instagram, or that you're Mark Zuckerberg; they'll block that. But pretty much anything else you type in there, they'll allow the ad to run with no vetting. We discovered this and politely, privately mentioned it to Facebook. Then some reporters trolled Facebook: one reporter submitted ads in the name of all the senators, and of course Facebook approved them all. They did some other things to troll Facebook with other advertisements as well, but the point is that the disclaimer string is not vetted. Google actually does vet the disclaimer string. They require either a tax ID number or a Federal Election Commission ID number, they actually verify it, and they publish that tax ID or FEC ID number along with the disclaimer string, which makes it really easy to track down advertisers on Google.
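To make the linking problem concrete, here is a minimal Python sketch, assuming ads arrive as records with hypothetical `ad_id` and `disclaimer` fields. Grouping on the raw disclaimer string splits one sponsor across trivially different spellings. A crude normalization (the rules in `normalize_disclaimer` are illustrative assumptions, not anything Facebook or Google applies) merges those variants, but nothing in the string itself can link a sponsor that simply types in an unrelated name. A vetted identifier, such as Google's tax ID or FEC ID number, sidesteps this because it can serve as the grouping key directly.

```python
import re
from collections import defaultdict

# Illustrative normalization rules (assumptions): drop a leading "Paid for by",
# drop a trailing corporate suffix, lowercase, and collapse punctuation.
_PREFIX = re.compile(r"^\s*paid\s+for\s+by\s+", re.IGNORECASE)
_SUFFIX = re.compile(r"[\s,]+(llc|inc|corp|co)\.?\s*$", re.IGNORECASE)
_NON_ALNUM = re.compile(r"[^a-z0-9]+")


def normalize_disclaimer(text):
    """Reduce a free-text disclaimer string to a crude matching key."""
    key = _PREFIX.sub("", text.strip())
    key = _SUFFIX.sub("", key)
    key = _NON_ALNUM.sub(" ", key.lower())
    return " ".join(key.split())


def group_by_key(ads, key_fn):
    """Map each key produced by key_fn to the list of ad IDs sharing it."""
    groups = defaultdict(list)
    for ad in ads:
        groups[key_fn(ad)].append(ad["ad_id"])
    return dict(groups)


if __name__ == "__main__":
    # Toy records for illustration only: a1-a3 share one hypothetical sponsor.
    ads = [
        {"ad_id": "a1", "disclaimer": "Paid for by Example Media Group, LLC"},
        {"ad_id": "a2", "disclaimer": "EXAMPLE MEDIA GROUP LLC."},
        {"ad_id": "a3", "disclaimer": "Concerned Neighbors of Ohio"},
    ]
    exact = group_by_key(ads, lambda ad: ad["disclaimer"])
    normalized = group_by_key(ads, lambda ad: normalize_disclaimer(ad["disclaimer"]))
    print(len(exact), len(normalized))  # 3 2
```

The third toy record is the case the talk warns about: no amount of string cleanup ties it back to its sponsor, which is why a vetted identifier matters more than cleverer matching.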
On Facebook, because advertisers can basically type in whatever they want as the disclaimer string, it's much more difficult to actually link these advertisements, and sometimes outright impossible if the disclaimer string is made up or too mangled in some way. So this is definitely a problem: lobbying organizations, or in this case not even a lobbying organization but the industry itself, can effectively lie about who's paying for an ad on Facebook's platform. The other thing we found is what's now being called junk media outlets. These are for-profit outlets that claim to be doing news operations, but it's not really traditional journalistic reporting; it's more like propaganda messaging. There was one group called New American Media Group, LLC. They also ran under the name New Democracy, or sorry, Democracy Now was their other name. We tracked down these LLCs, and they were simply shell companies that led nowhere. We worked with a journalist from The Atlantic who did a lot of digging into these shell companies, and through his investigation he was able to link them to the actual entity that created the shell companies and was running these ads. In our analysis, this company, basically a third-party advertising company, was creating what were meant to look like grassroots organizations. A lot of them were targeted at more conservative-leaning groups, but then they would bombard them with liberal messaging. So they would create these fake communities that looked conservative, and once they had attracted an audience, they would bombard it with liberal messaging. This particular company is based in Colorado.
It's called Motive AI. Apparently it's hoping to become the Cambridge Analytica of the liberal side. I don't know if that's something to aspire to or not. Other journalists also did some digging. Some journalists from ProPublica found more of this kind of astroturfing by political lobbying groups and the like: big oil, insurance companies. Again, when they advertised on, say, Google's platform, they would be honest in their disclaimer string, and when they advertised on Facebook's platform, they would often obfuscate their disclaimer string to make it more difficult to link the two together. They unmasked a whole bunch of other junk media operations that were spreading propaganda as well. I'm picking on Facebook a lot. Google does vet the tax ID number of advertisers, but you still see something like this Digico LLC that paid for some ads. So you track it down, and it's, again, one of these third-party advertising agencies. It's easy to track down because you have the tax ID number, but it still doesn't actually tell you who paid for the ad. It just tells you the third party that presumably was paid on behalf of someone else to run these ads. So a big problem with these disclaimer strings is that oftentimes they don't actually identify the person who's paying for the ad. So to wrap up, after our experience with these transparency archives, I would say that they're fairly adequate for understanding good actors. We could understand fairly well how good political advertisers were behaving on Facebook's platform. However, we probably missed a lot of the bad advertisers, because they can simply type in lots of different disclaimer strings and easily avoid our analysis at this point. None of the current archives have it quite right yet.
All of them have issues, right? Facebook isn't providing good access to their data, and they're not releasing targeting information. Google is missing 30% of the content because of third parties using their advertising system, and they're not releasing spend and impression information broken down by demographics. Twitter simply hasn't hired anyone to enforce their transparency policy well. And unfortunately, our experience throughout this process has been that these companies are oftentimes reactive instead of proactive, which means that we have to continuously put pressure on them in order for them to improve these archives. So this is, unfortunately, the state that we're in. One thing I really want to do is give a shout-out to the people at these companies who are actually trying to build these transparency archives. I want to give them a lot of credit for taking on a task that's probably not well rewarded within their companies. My hope is that by applying pressure, we can get them more support and more resources so they can make things more transparent within their companies as well. I hope that this put us in better shape to understand the 2018 elections, but 2020 is another presidential election, and my hope is that we'll continually improve these archives so that by 2020 we'll be in a much better position to understand both the good and the bad advertisers. However, this is probably going to take regulatory pressure, legal pressure, pressure from technologists, and things like this to improve these archives. So with that, again, I want to thank my collaborators, who aren't here on stage but who definitely did a lot of the heavy lifting to make this happen.
And again, all of our tools and most of our data, except for the Facebook data that's under NDA, are available through our GitHub there. And with that, I will open it up to questions. Thank you so much, Damon. I know that there are a few questions in the audience. So, microphone six, please. So chain links on IRC is asking: have you looked at links between the advertisers, for instance whether they use the same images or text? This is a really good question. This is actually one of the analyses we're currently doing. We're starting with the text because that's obviously the easiest, but we're also exploring some image clustering algorithms to cluster advertisers both across platforms and within a platform, because we're finding a lot of cases where they create multiple shell companies and just lie in their disclaimers. So better clustering of advertisers is definitely something we're focusing on. Take that group, Motive AI: even though they created different LLCs, they were running the same images and videos across their different LLC shell companies. Great, thank you. If you have any questions, please queue up by the microphones. Microphone number one, please. Hi, Oliver Moldenhauer. Thanks a lot for the talk; definitely one of the best I've seen here so far. Two questions. A: why do these transparency archives exist? Was there some law or some political process behind them? And B: as we approach the European elections next year, what kind of data is available for Europe? Those are both good questions. Again, I don't work at these companies, so I can only speculate as to why these transparency archives exist, but my guess is that this was reactionary. Mark Zuckerberg and high-ranking officials from Twitter and Google were hauled in to testify before the House and Senate, and this is them trying to self-regulate instead of having regulation imposed on them.
So, again, this goes to the pressure part: there was regulatory pressure put on them, or the threat of it, and that's what made them create these transparency archives. In terms of what's available in Europe, as long as the UK is still in the EU, kind of teetering, Facebook has started to make ads transparent in the UK. They also make them transparent in Brazil, they're going to make them transparent in India, and I think they have plans to make them transparent in other places in the EU as well, though they haven't done that yet. However, again, this goes back to the pressure part. There's no API for the other countries; there's only an API for the U.S., and that might be because we put pressure on them by scraping them and publicly releasing their data. There are also no transparency reports for other countries; there's only a transparency report for the U.S., and again, that might be because we applied pressure by publishing numbers. Some of our spend numbers were very low because they were only giving us ranges, so we might have been making them look bad by taking the bottom of the range for their spend, and they might have wanted to correct that with their own transparency archive. So again, unfortunately, a lot of this requires pressure to get them to improve their transparency efforts. Great, thank you. Microphone number two, please. So you mentioned FiveThirtyEight and their work on donations. Do you think it makes sense to combine the data you gathered with theirs to look at election outcomes, like election results and turnout? Yes. Actually, this is the number one project on our roadmap right now. Google has processed the FEC information and made it available via their BigQuery database, so we've downloaded it.
We've manually linked the Facebook and Google advertisers to the FEC data, and now we're building regression models, focusing first on the donation ads, because donations are what's reported to the FEC. So we're essentially trying to understand how effective these donation ads actually are at driving donations. Thank you. Microphone number four, please. Hi. First of all, thank you, Mr. McCoy, and your team for this very interesting research. I was wondering whether you know of any follow-up research by political scientists, sociologists, etc., analyzing the political repercussions of these ad campaigns? Yeah, we're aware of a few efforts. I don't want to out the teams doing them in case they don't want to be outed. I don't believe anything has been published publicly on this yet, but we're definitely trying... That's one of the main goals of our overarching online political advertising transparency project: to get as much data as we can into the hands of less technical people in a form that's easy for them to analyze. That's basically the primary goal of our project. So we've been working as hard as we can to get political scientists up to speed on the data. And this is why it's really unfortunate that Facebook has this NDA in place for their data, because it makes it very difficult for us to share and collaborate on that data, which unfortunately puts pressure on us as the only ones who can do some of this analysis right now. So this is why I would love to apply enough pressure to Facebook to get better access to their data. Yes. And a question from the internet, please. So Nomad is asking: why are these advertisements considered political or election interference in the USA?
Can't users see that someone paid money to display that content and conclude that the ad's purpose is to promote an agenda or manipulate them? This is a good question. A lot of this goes to the tactics they're using. Again, they're creating these communities and making them look like grassroots communities. And then they're sucking people in with these ads, which until recently had no disclaimer string on them, so you had no idea who paid for them. They appeared to be paid for by grassroots organizations, so you felt like you were part of, you know, a grassroots movement when you joined these communities. I think these are the really scary, subtle things. And you might not have realized why you were being targeted for these particular ads or who was behind them. So I think it was really easy for people to get unwittingly duped into joining what looked like grassroots campaigns. That's why I think improving these disclaimer strings and showing who's really behind these communities and advertisements is really important to dispel the notion of these fake grassroots communities that are luring people in. I think that's one of the big things that can be gained from these transparency archives, but it requires improving the archives to do that. Microphone number three, please. Yes. So I'm curious about the efficacy of some of the advertisements on Facebook and Twitter. Is any group, or, say, the ProPublica web extension, tracking the engagement rate, like the number of comments, views, and shares? That would give you an estimate of, okay, this fake grassroots community is building up a number of followers, how big these follower populations are, and so on. Yeah, this is again a really good question.
This is something that we, or I, would certainly encourage other people to potentially do as well. The problem is that a lot of that information isn't exposed by the transparency archives. This is what they call organic content, the non-paid content, and none of the platforms are releasing it. So it would essentially require a scraping operation to gather and collect this information. It's something we're definitely thinking about: how to efficiently scrape and collect this information, because that's very hard, especially against these companies' anti-scraping teams, which are well resourced, and it requires accounts, which are going to be detected and shut down. So this is something we're trying to pilot to understand. Our other idea for how to potentially do this is to crowdsource the information, similar to how ProPublica crowdsourced it with their browser extension. When people interact with these communities or these ads, the plugin could potentially send that information back to us. Then we would have to figure out some strategy to sanitize that information, because at that point you might be collecting sensitive information. So this is something we're thinking about but are cautious about, I think rightly so, because it can start to touch on more sensitive information. But I think it's definitely key to understanding the effectiveness of these ads. So it's something that we're going to have to do ourselves, or somehow convince Facebook to do on our behalf, in order to really understand the effectiveness of these ads. Thank you. Last question, from microphone number one. All right, at the beginning of your talk you explained how Russia influenced the elections.
I'm curious about the attribution. Is there possibly any doubt, in any instance you presented, that it was Russia and not some other country, like China or Iran? What do you know, and did you check the facts? I mean, that's a good question. Unfortunately, the national security agencies don't release the sources of their information. There was another investigation by the Department of Justice, led by Robert Mueller, that did release more information about this. I've looked at that information, and... you can never state 100% unequivocally that it was Russia. It could have been a false flag operation, but I think the overwhelming evidence that everyone has found when they've investigated this has pointed at Russia and the organizations that were prosecuted by Mueller. Damon McCoy, thank you very much. Please give him a great round of applause.