Hi, I'm Leo, the founder and CEO over at Graphistry, and I'm going to be talking today about Project Domino, which is not even at Graphistry, but a community open citizen data science effort against COVID misinformation. Before we go into it, I do want to say thank you to the DEF CON staff, the AI Village staff, and the Project Domino volunteers, and at AI Village in particular, Sven. This is a crazy time for all of us, so to be able to put this together, a big thank you for that. I'm honored to be giving this as a keynote. I'm hoping folks here will benefit from hearing what we've been learning, and if you are impacted somehow by COVID and want a way to give back, this might actually be an opportunity for you as well.

With that, I want to start with an example. This is about three months ago. It might feel like three years for some of us. Around the time when the Seattle outbreak was happening, our friends were dealing with the hospital situations over there. I think people were actually dying already in Seattle. Governor Kevin Stitt from Oklahoma posted this strange tweet: "Eating with my kids and all my fellow Oklahomans at the Collective OKC. It's packed tonight. Support local, Oklahoma proud." And then he clarified the next day in a press release: we encourage Oklahomans to go out to restaurants. So for those of us with ties to the medical system, our heads were kind of exploding, like, what's up? Fast forward to a couple of weeks ago and, unfortunately for Governor Kevin Stitt and his family, he tested positive for COVID. And unfortunately for his constituents who are relying on him, they have about a thousand infections per day at this point, because what we feared would happen is exactly what happened, and now Oklahomans are dying. I do want to do some disclosures in a bit, but to us it's not about a left or right thing.
It's like, you know, this is a medical health emergency. We need to do stuff. And that's a good note on where this project came from. We were looking at what we can do. And there's this interesting statement from the World Health Organization saying we're not just fighting an epidemic, we're fighting an infodemic. The reasoning that we believe in here is that until there's a medical cure, a pill, a vaccine, and it's been tested and it's safe, the only thing we really have is behavior change. For that, unfortunately, you either need top-down policy enforcement or bottom-up voluntary citizen compliance. And unfortunately, we have folks standing in the way of that. For Seattle, for Oklahoma, for all of us at home, we're seeing the effects of failures to win that fight.

Some quick disclosures here. We are going to talk about real people. We haven't directly published; we publish through partners, so we empower researchers to do this through our tech. They've published on some of this stuff before, and based on some of the results today, I expect to see some accounts going out and hopefully some takedowns. Likewise, this is a volunteer organization. So the views expressed by us, by me, in this talk, are not necessarily representative of the employers of the volunteers. Likewise, you should expect some bias here. Some people on the team were even affected by COVID and had their lives impacted by that. Family members too; their work life has been impacted. Misinformation is a broader topic. I'm personally from a place some folks may remember called the Soviet Union, where propaganda was a thing and surveillance was a thing. So, conflicting thoughts on that. But I do want to point out the team, and I especially want to call out some of the folks on the leadership team.
Sean Griffin's been involved in pandemic response. Actually, he helped both the Obama and the Trump administrations with pandemic planning. Anita was over at the NSF, looking a lot at digital crime, blockchain, and stuff like that. She's at UIUC now. And Cody Webb has been this research powerhouse for our team, and has been researching misinformation for a long while. And I shared the names of some of the engineers helping on our backbone. The first example today actually comes from work that Jeff Goldberg over at Social Forensics and David Nickerbacher, an independent researcher, have been helping us a bunch on. And then tomorrow, if this is interesting to you, it's the same types of stuff we're doing for COVID: Andrea is going to be talking over at the Biohacking Village. Imagine you're a cancer patient: you go to Facebook and you see all the same misinformation, giving you false hope or misguiding you. So it's important stuff.

Okay, so today I'm going to talk about basically three things. First, I just want to take you down the rabbit hole through an example from anti-vaccination misinformation. Second, I want to give you more of a security data science perspective on how we're actually going about it, trying to figure out scale and our response and all that stuff. And then, briefly but incredibly important, why we're doing this as open citizen data science and why we're actually pretty hopeful about where this is going long term. Maybe this could give you ways to help out the community.

Okay, so let's dive in. We were doing an investigation and this name popped up, Sherri Tenpenny, and we decided to see where it goes for the purposes of this talk. For some folks, if you're from Australia, this may be a familiar name. She's Twitter famous, she's Wikipedia famous, or infamous at this point.
So her 2015 lecture tour was canceled in Australia because apparently she is involved in a lot of vaccine misinformation, and private organizations in Australia wanted no part of that. And so they didn't support her. For our intents and purposes, I do want to point out that there is an ecosystem with different incentives here. If we look at somebody like Dr. Sherri, we see that she does teach courses, hence the book tour. She sells books, trains folks. She has about 50,000 or 60,000 followers on Twitter. Not shown: she also has a big presence on Facebook and on newsletters, stuff that you don't even see. And she gets paid to do it now. Her beliefs and her livelihood are the same thing, so she's very invested in this.

But once we start looking, there are a few things that look a little strange. She talks about Nazis in her profile. That's the first thing she does, the first word she shows you. So that's kind of like a whistle to one side of the house. And then she talks about abuses by the state, which is a different kind of thing. So it's natural to ask: who is the community around her? Let's take a look.

What I did here is load in @BusyDrT. We're looking at her, and what we see in colors around her is basically this: she's only following a couple hundred people, but she has 50,000 people following her. When we look at a sample of that, what we see up top is a bunch of folks who are essentially public anti-vax, natural medicine, a lot of medically minded folks that you'd sort of expect to be here. But the surprise is once we start seeing the other colors. The tools here just automatically colored it and laid it out; it's just traditional graph analytics. So we see Donald Trump. We see Michelle Malkin. We see Candace, basically, in the dark green.
We're seeing these public figures, national conservative public figures, on the right. And then especially down here, we're seeing a mixture of QAnon, which is a conspiracy group we're hearing about a lot nowadays, and a lot of MAGA and Trump supporters. So not super surprising given we had Trump. It's interesting that on the top we have who you'd expect with anti-vax, but on the bottom we have the whole MAGA thing. We have conspiracy theories from QAnon. There's actually even UFO stuff going on at the bottom. It's interesting to see that mix.

I want to point out another phenomenon here. When we're looking at each community, the question is: how are they interacting? Is it that they all really love our @BusyDrT, who just wrote this wonderful book? What we actually see is, for example, that from the QAnon zone they're reaching into the medical zone, so they're following folks there. Conversely, if we pick somebody big in this top region, sometimes it's localized, but other times you see it's reaching into the QAnon zone. No matter which cluster I pick, you're going to see those reach-outs. That's actually kind of strange, because there's no reason to believe one conspiracy has anything to do with another. We're almost in this other world, and what we're seeing is that the networks do cross over.

The question is why, and the answer is basically to support influence. Sometimes that could be direct, where you're just retweeting each other. Other times it might be to trick the Twitter algorithm. For example, here is actually a Fox News reporter, so someone on TV. They have 40,000 followers, which is reasonable for a personality. But they're following 30,000 people back. That probably suggests an inauthentic, automated action of following back somehow.
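The follow-back pattern just described can be turned into a rough screening check. The sketch below is only an illustration, not Project Domino's actual detector: the `Account` class, the handles, and the thresholds are all hypothetical, and the only numbers taken from the talk are the 40,000/30,000 and 50,000/couple-hundred follower counts.

```python
from dataclasses import dataclass


@dataclass
class Account:
    handle: str      # hypothetical handle, for illustration only
    followers: int   # accounts following this one
    following: int   # accounts this one follows


def following_to_follower_ratio(account: Account) -> float:
    """How many accounts this one follows per follower it has."""
    if account.followers == 0:
        return 0.0
    return account.following / account.followers


def looks_like_mass_follow_back(account: Account,
                                min_followers: int = 10_000,
                                ratio_threshold: float = 0.5) -> bool:
    """Flag large accounts that follow a big share of their audience back.

    Both thresholds are assumptions chosen for this example; a flag is a
    reason to look closer, not proof of automation.
    """
    return (account.followers >= min_followers
            and following_to_follower_ratio(account) >= ratio_threshold)


# Numbers from the talk: a TV personality vs. the book author's account.
reporter = Account("tv_personality", followers=40_000, following=30_000)
author = Account("book_author", followers=50_000, following=200)

flagged = [a.handle for a in (reporter, author) if looks_like_mass_follow_back(a)]
```

A ratio like this says nothing about the order in which the follows happened, so it is best treated as a first filter ahead of manual review.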
We don't know what order those follows happened in, but in general, that's prohibited by Twitter's terms of service. But they get clicks. And so even though it violates the terms of service, there they are. So that's basically: welcome to the anti-vax misinformation network. And I realize that I glossed over this, but the reason we care about Dr. T talking about this stuff is that she and her community are already pushing messages against a COVID vaccine, even though one hasn't even passed trials yet. She's against it before it's even been tested, just on principle. And Twitter's been supporting her in doing that, essentially. I'll get to that in a second.

I want to go a little bit deeper into our Dr. T. In particular, I want to understand the communities and what they're doing, and I want to see if there's extra coordination structure. So we're getting a bit more into the data science side. What's going on here is we took the accounts. We actually took a look at about 100 million COVID-related tweets to see if there's organized behavior within them. For each account, we looked across known misinformation topics. What's the signature? How many times did it activate on different topics? So for each account, we now have that sort of misinformation fingerprint, essentially. And then we ran something called UMAP. I think it's uniform manifold approximation and projection, or something like that. Basically it tries to smooth things out and assume structure behind the data. And then we took a look. So let's try it out.

What's going on here is we're taking a look at those accounts. From all the accounts we're looking at, we're only showing the 5,000 or so that we actually saw active across multiple misinformation topics.
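As a rough sketch of the "misinformation fingerprint" idea described above: count, for each account, how many of its tweets hit each known topic, then keep only the accounts active on multiple topics. The topic names, keywords, and sample tweets below are made up for illustration, and simple substring matching stands in for whatever topic detection the project actually uses.

```python
from collections import defaultdict

# Hypothetical topics and keywords, for illustration only.
TOPICS = {
    "vaccine": ["vaccine", "vaxx"],
    "5g": ["5g"],
    "bioweapon": ["bioweapon", "lab made"],
}
TOPIC_ORDER = sorted(TOPICS)  # fixed column order: ['5g', 'bioweapon', 'vaccine']


def fingerprints(tweets):
    """Map account -> list of per-topic tweet counts, in TOPIC_ORDER.

    `tweets` is an iterable of (account, text) pairs. Accounts that never
    hit any topic are left out.
    """
    counts = defaultdict(lambda: [0] * len(TOPIC_ORDER))
    for account, text in tweets:
        lowered = text.lower()
        for i, topic in enumerate(TOPIC_ORDER):
            if any(keyword in lowered for keyword in TOPICS[topic]):
                counts[account][i] += 1
    return dict(counts)


def active_on_multiple_topics(fps, min_topics=2):
    """Keep only accounts with nonzero counts on at least `min_topics` topics."""
    return {
        account: vector
        for account, vector in fps.items()
        if sum(1 for count in vector if count > 0) >= min_topics
    }


sample_tweets = [
    ("a", "The vaccine is a bioweapon"),
    ("a", "5G towers spread it"),
    ("b", "No vaccine for me"),
    ("c", "Stay home, wash hands"),
]
fps = fingerprints(sample_tweets)
multi = active_on_multiple_topics(fps)
```

With real data, the surviving count vectors would form the rows of a matrix that could then be handed to a dimensionality-reduction step such as `umap.UMAP(n_components=2).fit_transform(X)` from the umap-learn package; that step is left out here so the sketch runs with the standard library alone.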
So these are really the ones that have that behavior going across topics, where you just ask, what's going on? We ran UMAP. But what we also did, and you're seeing it here, is draw these little edges. We also ran k-nearest neighbors afterwards and put the results on screen, so that an edge shows up whenever we see accounts tweeting on similar topics at similar intensities. For example, imagine two bots programmed to talk about the same thing, or controlled by the same person to retweet someone else. That's when we're going to draw that edge. Again, they're not necessarily following each other. On certain topics, they're pushing those topics in the same way. So it's entirely content based. And what we're seeing here is that some interesting structure pops out across COVID Twitter.

The second thing I did here is coloring. You're seeing some of these are a bit pinkish. Let's take a look at this one. For this group of activity, we're seeing a lot of this QAnon stuff. What I'm doing here is: if, in those tweets talking about COVID, they also talk about QAnon, and we have a couple of words for that, we call that out too. And what we're seeing is that, for this general topic, a bunch of accounts never explicitly mention QAnon, but it's definitely there. Maybe it's like two coordinators, or who knows, something like that. But we're actually seeing a lot of QAnon influence across these different organized groups. In particular, almost all of these have a bit of QAnon. So you're seeing it's not all white; it's all a little off-white. And what's more, you're seeing some of these pockets of heaviness. When we calculated it out, about 15% of these accounts, spread throughout, are QAnon, at least for COVID Twitter. That's actually kind of interesting. For us, that's a surprise.
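To make the "draw an edge between accounts that push the same topics the same way" step concrete, here is a small brute-force k-nearest-neighbors sketch over per-account topic-count vectors. It is an assumption-laden illustration: cosine similarity is this example's choice of metric, the account names and vectors are invented, and the talk runs its neighbor search alongside UMAP rather than on raw counts like this.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_edges(fingerprints, k=1):
    """Return undirected edges linking each account to its k most similar accounts.

    Brute force over all pairs: fine for a sketch, too slow for millions of
    accounts, where an approximate nearest-neighbor index would be used.
    """
    edges = set()
    for account, vector in fingerprints.items():
        scored = sorted(
            ((cosine(vector, other_vector), other)
             for other, other_vector in fingerprints.items()
             if other != account),
            reverse=True,
        )
        for _similarity, other in scored[:k]:
            edges.add(tuple(sorted((account, other))))
    return edges


# Invented fingerprints: two accounts pushing topic 0, two pushing topic 1.
example = {
    "bot1": [5, 0, 1],
    "bot2": [10, 0, 2],
    "user3": [0, 3, 0],
    "user4": [0, 4, 1],
}
edges = knn_edges(example, k=1)
```

Note that no follower relationship is used anywhere: two accounts get linked purely because their topic profiles look alike, which is what makes the resulting structure content based.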
So that put QAnon, in terms of medical misinformation, on our radar. That's not good. Okay, so jumping ahead. We're looking at our @BusyDrT. Clearly there's blatant activity going on here. The last thing I wanted to point out: we look on the right here, and Twitter is saying, hey, now that you're here, you should probably also check out this Judy Mikovits, PhD. Sounds authentic, right? Sounds smart. You may also know this Judy from a video going around called Plandemic, which got pulled down as pretty bad misinformation. That was pretty shocking to see. I was like, oh, that name sounds familiar. So that was a little crazy. The number one recommendation from Twitter as soon as you go to this webpage is to learn about Plandemic. And this is, you know, way after the fact.

So, in summary, what are we seeing in this pretty basic, honestly, analysis? Very blatant anti-vaccine misinformation, a lot of it about COVID in particular. We're seeing it's not just anti-vaxxers; we're actually seeing a lot of cross-community support by misinformers. It's pretty blatant. It's pretty obvious how they abuse the very few terms of service Twitter does have. So it's a little strange. It is a little disappointing to see Fox, the Republican Party, MAGA, and a lot of them involved in this anti-medical misinformation. And again, it's so blatant. And then on the platform side, when we saw that overlap, it's like, oh, wait a minute. Twitter makes money off of some of this stuff. They're clearly looking the other way on the terms of service violations. And then the Plandemic promotion, that just blew my mind. So this is not a healthy situation. From a data perspective, or actually a security analytics perspective, I do want to step back.
If we think about attackers and defenders, ultimately the targets are individuals and organizations going on a radicalization journey. Every time Twitter recommends another crazy thing, that just takes you deeper. An interesting extra level on the threat model here is that folks are essentially influenced based on their life conditions. Maybe they're going through a hard time; maybe they're elderly and don't really know what's going on. And they're also influenced by their community. If a pastor, or your family or friends, tells you something, you're more inclined to believe it. So being able to infect communities with this stuff, having mastery over that, with platform-level support, is not good.

For actors, it's a lot of what you'd expect. I think the surprising thing for threat actors here is that we actually consider, at this point, the platforms to be part of the threat. In particular, we break the platforms down, starting with the social networks. Facebook owns a bunch of social networks, Google owns a bunch of social networks. A few are just big in their own right, like Reddit and Twitter. And, you know, Twitter is a multi-billion dollar profitable company. But here we are. A little less obvious to folks is messaging. And again, a lot of the same companies, like Google, Microsoft, and Yahoo, essentially own the messaging channels. Those are completely unmonitored. A lot of times the radicalization actually hops over to newsletters, so that stuff is basically dark data. And also the local news, I think like Fox and folks like that. A little surprising is that there's a monopoly effect, because you do have different regional ways of having monopolies on people's attention. That's the whole reason these companies got so big. It's the same reason we're basically at the whim of these companies. When you calculate it out, the 80% is essentially just about 13 of these groups.
And so we're stuck if we have a fickle Silicon Valley CEO, if we have employees who just go along for the ride, if we have advertisers who keep feeding them money. So that's the threat model perspective.

Shifting gears a bit to more of the security data engineering and data science: in order to do the data science, we end up having to build a lot to work at the speeds and scales we're targeting, and I did want to talk a little bit about that. If you want to have an instant response pipeline, that means you have to have streaming. Can you take hundreds or thousands of things a second? You're seeing the performance of our V1, and we're architecting toward a V2 so that we can go to arbitrary scale and essentially just be budget limited. You are noticing we're doing a lot of graph technology. That's a nice way to deal with a lot of heterogeneous data and a lot of linkage of data, even in a machine learning world where, instead of just direct links, you might have vectors, feature vectors.

Two other interesting things about the stack I did want to point out here. One is that we do end-to-end GPU computing. If you want to have a billion-node graph in memory, you actually could do that with, for example, the RAPIDS AI stack. Likewise, neural nets in this space, graph neural nets, are getting interesting. We actually do more today with the traditional ones, but we're looking at those. The last part, and I think the most important, is the division of the team. We have basically heroes for data engineers. And a big part of this is figuring out ways to get out of the way, given that it is an open volunteer effort. So we actually do a lot on self-service, and that's a big theme when we look at empowering people. So think notebooks, Streamlit dashboards.
Graphistry visual analytics and automatic ML: just the increasing ability of people to go and handle this stuff themselves.

From the data science side, I wanted a fun example here. We had a news organization ask us about Obamagate. One of our analysts, Cody, did a good job of using hashtag analysis to connect the dots back about a month to some QAnon stuff. It was kind of impressive, because the Trump tweet made no sense at all, even when I read it today. All these code words. But once we move to more modern NLP techniques, we can go away from exact matching of terms. Cody had to look for the exact term "Obamagate". With NLP, we can now do things like: let's just look for the word Obama and the word conspiracy, but actually not the exact words, just the fuzzy topic. When you have those things, we can more loosely walk back. That means we can do traditional analysis more easily. We've been building out a lot around that. To point out a couple of pieces: FAISS indexing from Facebook, and BERT modeling, which is basically what everyone does now.

The other thing I want to point out here is that this is also opening up types of analysis that you straight up couldn't even think about before. One example could be looking at masks in a community, where you might ask: which messages are against mask use? That requires getting a sense of semantics and meaning a bit, and that's hard to do just by keywords. And then could you summarize it? Could we actually have a demographic breakdown? There are just certain things that with traditional NLP you couldn't have done. But over the last five years, and actually even the last two years, that has really been changing. The last piece I'm going to share here is just where we are and where we're going.
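Going back to the Obamagate example, the exact-match hashtag analysis can be sketched as a co-occurrence count: find every tweet carrying a seed hashtag and tally which other hashtags travel with it. This is only a guess at the general shape of that kind of analysis, not Cody's actual method; the sample tweets and the regular expression are assumptions made for the example.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def co_occurring_hashtags(tweets, seed):
    """Count hashtags that appear in the same tweet as `seed` (case-insensitive).

    `tweets` is an iterable of tweet texts; `seed` is given without the '#'.
    Each hashtag is counted at most once per tweet.
    """
    seed = seed.lower()
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if seed in tags:
            counts.update(tags - {seed})
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "Big news #Obamagate #QAnon",
    "#obamagate is trending #WWG1WGA #QAnon",
    "Wear a mask #COVID19",
]
related = co_occurring_hashtags(sample, "obamagate")
top = related.most_common(1)
```

The limitation the talk points to is visible here: a tweet about the same topic that never uses the literal seed hashtag is invisible to this count, which is the gap that embedding-based, fuzzy topic matching is meant to close.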
You're getting a sense that we can now bring in a researcher on a topic and, if we have the data or can help them pull in the data, do this kind of analysis. The next level of empowerment we're looking at as we build this out is: could we actually partner with less technical organizations and help them do this to protect their community? Imagine a local civic leader, or, talking about the cancer community with Andrea, at-risk patients. There we're looking at whether we could start helping them figure out: are there folks in their community who are not even maliciously misinforming, where people are just confused, or are there people targeting them? We're just starting out building tools around that. So that's the phase shift we're starting to go into, and that's pretty exciting. You're seeing that as you build out that state of capability, all this stuff unlocks.

And then finally I just want to close out and tie together some of the stuff here. At the beginning, as I was sharing that investigation, hopefully what got across is: yeah, the analysis is cool, and we're able to do these giant analyses. But ultimately a lot of the important stuff is so in your face, and in such huge numbers, that it's like fishing with dynamite. It's those 13 companies: they shouldn't be gifting essentially millions or billions of dollars in free targeted ads and targeted recommendations to these groups. It's that stuff that I think is more systemic. I am disappointed to see these otherwise legitimate groups appear to be corrupted by misinformation. So to call it out: we're seeing MAGA is just so ever-present here. We're seeing social media companies, regular media companies. So there are things you could do if you're an employee. If you are anywhere near the advertising spend of your company, all that stuff's important.
And then, turning it around: that's all social policy, and a lot of folks tuning in are technologists of different stripes. The bad news is we're seeing that these monopolistic media companies are not able to self-regulate. The left hand and the right hand are always at each other, and it's brought us to where we are. Likewise, investigative journalism teams have been gutted, just because these companies have sucked up all the money. Likewise, we're seeing private companies are sort of compromised as they try to deal with revenue. So that's bad.

What's good is we actually have an open source community. I mean, the technology community on that is huge. And at the same time, in data science, with R and Python, open source is almost built into the DNA. So I see strong potential, and gathering potential, for that. You're seeing some of it happen here with Project Domino. On that point, if you are into this kind of stuff and you could help out with coding, head into our Slack. Likewise, as we launch these more self-serve tiers in the coming months for folks sitting on top, let's chat about partnering. And likewise, while we are looking at sustainable approaches to our growth, especially in the bootstrap phase that we're in for the next year or so, grants and that kind of stuff definitely help the core team do what they need to do. Likewise, serverless compute time, anything like that.

With that, thank you for your time. Hopefully this changed a little bit how you think about how misinformation is working in practice in the COVID era, and also, even more importantly, showed that we actually can do things: we can actually detect it, and then we can actually build tools for it. Thank you.