Thank you, Danny. Good evening, everybody. My name is Mike Innes. I'm director of the Conflict Records Unit. The Conflict Records Unit is part of the Sir Michael Howard Centre for the History of War in the Department of War Studies at King's College London. This is the first of our speaker series. We'll have about one of these a month over the course of the next year. We also have a conference, our first, which we'll be holding in May or June of next year. The theme is documenting war, and the call for papers was just made public yesterday and today. I think the link will be dropped into the chat function. As for the structure of this evening's talk, we'll spend about 45 minutes, and that will include some introduction. I'll introduce our guest, Thomas Hegghammer. He has some very cool slides, and he'll be talking about the Jihadi Document Repository. And then we'll have time for a little bit of Q&A at the end. If you have any questions for Thomas, just drop them into the chat box. I'll moderate those and then we can discuss from there. So with that, Thomas, thank you for joining us. It's a privilege to have Thomas joining us. Thomas is a senior research fellow with the Norwegian Defence Research Establishment. He is a leading scholar of jihadi movements and has published widely on the subject. The title of tonight's talk is "Reflections on the Jihadi Document Repository," which means, for reasons that I think will become pretty self-evident, that we're going to be talking quite a bit about information and technology as at least part of what we'll be discussing. And I think a useful point of entry for that, Thomas, is what I think is your most recent publication, an article in the current edition of Foreign Affairs. Thomas, why don't you walk us through that? That'll set the context, the backdrop for, I think, probably the main thrust of tonight's talk.
Yeah, sure.
So that article is called "Resistance Is Futile" because it argues that in the West, states have sort of solidly won over jihadi groups like al-Qaeda and Islamic State. And the bigger point I'm trying to make is that under the surface of a seemingly constant stream of attacks, which suggests that the jihadi movement has been alive and well, under that surface has occurred a big and deep shift in favor of states and state power. And I'm arguing that this shift has been long in the making and, in fact, starts before 9/11. And it's primarily driven by technology. And so the even bigger point here is that technology probably empowers states more than it empowers non-state actors. And the jihadi attacks that we've seen in the 2000s and the 2010s are quite exceptional in a sense. They haven't really undercut that bigger and deeper trend in favor of state power. And I see that trend continuing in the future, especially with the rise of artificial intelligence and such. And I suppose the link here to what we're discussing today is about propaganda and the role of propaganda in rebel recruitment and the sustaining of rebel groups like jihadi groups. I say at various places in the article that I think propaganda distribution technology has been very, very important for the rise of jihadism over the past few decades. Especially, of course, with the internet and various technical solutions for distribution. And now, of course, states have realized this and are clamping down very hard on propaganda distribution online. And I think, by the way, that's one of the important reasons why we're seeing less activity today. But before that big clampdown on online propaganda around 2017, jihadi groups were quite free to disseminate stuff online. And that made available a tremendous amount of primary sources and material coming from the groups themselves. And this has been a very, very important window into these groups.
And probably the main data source for my own work and that of my colleagues over the years. So what we have been doing for years is to mine this resource, to collect the propaganda that jihadi groups put online and store it and analyze it. And there came a point a few years ago where we decided that some of this material has public interest, can be used for research and should be made available. That's sort of the starting point for the story of the Jihadi Document Repository.
That's a really great setup. The immediate question that comes to mind, and I guess I'll ask it now, but we can park it and come back to it later, is that, as you're going to present, I think the Jihadi Document Repository has its origins back in the mists of time, prior to the advent or the real explosion of what you call propaganda distribution technologies. And so there's a question of volume and the durability of collecting this kind of material as individuals, as researchers. If I had to guess, that's a theme you're going to mention, but we'll revisit it later as well. Yeah, that's excellent. If you want to kick off with your slide presentation, I think that's probably a good setup for that.
Yeah, sure. So is that visible to everyone?
It is.
Okay, yeah. So I suppose a characteristic of this particular collection of documents is that it's not really conflict documents. These are not captured in the field; they're mostly collected online from a safe distance. And they're also voluntarily published by the groups in question, as opposed to unwillingly relinquished under conquest. So there's a dimension there, I suppose, that sets this apart from perhaps some of the other collections that you'll be discussing in the months ahead. Oh, sorry. So basically, I'll talk a little bit about the background, give an overview of the repository, reflect on some lessons and mention some future plans.
And the repository is at jdr.as, if anybody would like to browse it as we speak. It does require registration, but you can get to a certain depth into the database without registration. So the background, as I mentioned earlier, is that we have been collecting jihadi propaganda for a long time and we're sitting on a very large collection. The terrorism research group here at FFI was founded in 1999. I arrived in 2001, by which time my colleague and then boss, Brynjar Lia, was already collecting things. He was very familiar with azzam.com, this old jihadi website. After 9/11, of course, with a general interest in this topic, we beefed up this effort and have basically been at it ever since. And of course, with the advent of internet-based propaganda distribution and the enormous scale that this took, it produced a very large amount of material. And so we're sitting on, I actually don't know how much material we have. But I can say that it's mostly from the web, although a couple of magazines are basically photocopied from libraries in Saudi Arabia and such. The vast majority is from jihadi websites. And it contains all the various forms or product types that jihadi propaganda has taken over the years. So magazines, books, declarations, pamphlets, videos, images, audio recordings, music of the anashid sort. We have copies of entire forums, copies of static web pages, etc. And when we started this work with the JDR in around 2014, this collection was at best semi-organized. Basically, it was sitting on the hard drives of the individual members of our team. We had made some effort to organize and pool our resources. We had sort of a shared drive where there was some material. But we'd never really figured out a way to systematize this collection effort.
There was just so much, there was such a flow of material coming all the time, that we had enough just keeping up with it. And also, as I'll come back to, a lot of this material is quite unwieldy. It's hard to find a good organizing principle for everything. Maybe there is one, but we didn't take the time to sit down and develop a really structured way of doing it. So we had this big collection. It was semi-organized. And I figured this is a shame, because there's no way that we alone will be able to utilize it all. And then we had other motivations for doing this too. We work as academics and are committed to open science, to sharing data, to making research replicable. And we are of the view that too many professors die with unshared material on their shelves. Sharing is the future and vital to scientific progress. And so we felt that we needed to do our share for the cause of open science here. Another motivation was that a lot of this material is transient. It stays up for a while and then it's gone. That's a general problem, I suppose, with internet material, but all the more so in this field, which has been under pressure from authorities. At various times websites and such will be closed down and the material might be gone forever. So we saw a need here to preserve some of this material for posterity. And the final motivation was that we saw a gap in the market, as it were, the market of repositories or data collection efforts for early materials, specifically pre-2010 material. Many of you will be broadly familiar with the landscape of propaganda collections and repositories. You have things like the Open Source Center, formerly FBIS, in the US, which goes quite far back but is closed off to the public. For a while it was open to academics and others, but at some point 10 years ago or so they made it US government employees only.
So we couldn't even access it. And you have some commercial actors like Rita Katz's SITE Intelligence Group, started in the early 2000s. They have a lot of material, but it's subscription based. And it's not cheap. Same with BBC Monitoring: relatively old, tons of material, but subscription based. Same with MEMRI. MEMRI is a sort of propaganda collection outfit. They started, I think, a little bit later, but again it's subscription based. So really the only major open repository for this type of material has been Aaron Zelin's Jihadology, which is obviously a phenomenal effort and a very, very valuable undertaking. However, he started around 2010 and has naturally been busy covering the new material that comes in. So that collection starts around 2010. It doesn't have earlier material. And then of course you have some special collections which have material from the 2000s and earlier. You have the Harmony documents, of course, at West Point. You have another project at Haverford College. And more recently you have things like Aymenn al-Tamimi's Islamic State archives. But these are what we might call special collections: they cover a particular subset of the jihadi propaganda production. And in some cases, of course, as in the Harmony case, it's not just propaganda but confiscated internal documents. So we figured we would try and make something slightly different, basically a repository for texts of reference by jihadi groups going back in time, focusing on magazines and major texts because they're more wieldy. We decided to only include texts and to focus on pre-2010 material, and to aim for backward completeness rather than forward completeness. Basically, we would leave the job of getting the latest material to others and focus on getting perhaps more rare historical material and putting it in our repository.
Another aim was to have this public and free, and to have it there for a long time, to make it as permanent as things can be on the internet. There were multiple challenges. We hadn't done this before. And so we struggled with things like the state of the collection I mentioned earlier, the fact that it wasn't properly organized. There was a lot of it, and it contained formats that can be unwieldy, like videos, for example, which, at least when we started this in the mid-2010s, were slightly challenging from a technical point of view. Then there were things like privacy concerns and copyright. Now, copyright was, I think, less of a concern because this was from websites that were run and published by jihadi groups. I'm not a lawyer, but I think it does make a difference if these actors have volunteered the material in the public sphere as opposed to having it taken from them. As for the privacy issue, there's potentially an element of concern here in that some of these magazines might mention the names of people who maybe later defect, and they would rather not be associated with the movement. Or maybe these magazines mention third parties in some way, who are therefore identified in this collection by us. But we judged the nature of the content to be mostly ideological and to contain relatively little information like that. I think that issue becomes much more acute when you're dealing with internal documents captured in the field. More worrying, perhaps, was the issue of misuse, the possibility that if we put this online, jihadis might use our platform as a source for their own propaganda. I mean, if their website went down they could come to us to find a backup. And finally, of course, there's just the technical side of it. How exactly do you go about building a website of this kind?
So we ended up basically setting up a little project or task force on this. We were lucky to get funding from FFI of around £50,000, which allowed us to hire a part-time research assistant. I should mention here that this research assistant, whose name is Erik Skare and who is now a fully fledged PhD and a great one at that, did the vast majority of the practical work of getting it up and running. He effectively made the Jihadi Document Repository. And so we started this project and we decided to keep it limited in scope, to start small and then perhaps add material later rather than try to have something really complete to begin with. We decided to host this on the University of Oslo's website. This was a natural port of call because Brynjar Lia has an affiliation there. We preferred to have it on an academic site, partly for longevity and partly for the perception of the collection. We genuinely want this to be a contribution to the academic study of this as opposed to some government effort. We figured that the University of Oslo would be a more neutral platform. To protect against misuse, we made it registration-based, which, of course, is a trade-off. It does, I think, really limit the usage of the site, but I don't think we could have risked doing it otherwise. We watermarked all the documents. We launched this in late 2016. It took probably a little over a year, I think, from when we started until it was published. Let me quickly give you a guided tour. On the left is what you see when you go to jdr.as. You'll see it's organized in four main categories. You have journals. You have writings by ideologues and leaders. You have biographies and memoirs by members of these groups. The fourth one cuts into the other three, but it's material in English. Let's say you click on the journals link.
You get to another page where it's sorted into languages: Arabic, English, and Urdu. You click on, say, the Arabic one and you get to another filter organized geographically. These filters are mainly to keep it manageable, because the number of magazine titles is in the several tens, so probably around 50. Clicking on, say, Central Asia and the Caucasus, you get to the page on the right here with various magazine titles going back to the mid-1980s, the first jihadi magazine being Al Jihad magazine, which you see there down on the left. Click on Al Jihad and you get to the magazine presentation page, with a little vignette and links to PDFs of all the issues that we have. In this particular case, we have a near-complete collection. I think we're missing about 10 issues of over 100, I think 115 or 120, from 1984 to 1995, I think. Click on one of those links and you get a PDF in your browser window. Go back, click on the ideologues and leaders. You'll get again to a kind of geography selector, and going deeper in, you'll get to individual names. You can look up, say, Abd al-Qadir bin Abd al-Aziz, Dr. Fadl, and you get a brief biography and links to some of his main works. For most of these individuals, we haven't posted complete collections. We have only the main ones, mainly for manageability, although I think there's an ambition to fill in with more material in the future. Similarly, with biographies and memoirs, you can go in there and choose between leader biographies, foot soldier biographies, or memoirs and first-person accounts. You can click on the leader tab and look at someone like Yusuf al-Uyayri, the leader of al-Qaeda in Saudi Arabia in 2003. You'll find links to biographies about him from the jihadi literature. So that's basically it. It's relatively simple. It's a simple structure and it's not a very large repository, I think, compared to others we find in the world of libraries and document collections.
But it is unique in the sense that, especially on the magazine side, it has much of the material that has been published by this movement over the years, and you won't find it anywhere else. You'll find individual titles or sub-collections elsewhere, but not a broad selection like you have here. There's obviously a lot that we don't have. For example, there's a ton of texts that don't fit the magazine or the book format. There might be little pamphlets or one-page statements, two-page statements, photo montages that are somewhere between images and statements. There's a whole gray area of formats here, with literally tens of thousands of items, that we haven't included. We've chosen to include major titles where we have relatively complete collections. So what have we observed in terms of impact? We see that it's been used by academics and by students and journalists. We have seen some citations in academic publications. Not a tremendous amount: 38 hits on Google Scholar and, I think, five or six books, although two of the books are by members of my team. I have a suspicion that there may be a few more examples of the JDR being used in the student world, in student theses and such, but it's more difficult to get a complete overview of that. And we've seen it occasionally used in other contexts, on social media and in regular media. We've also been lucky to get contributions from other members of the scholarly community, people who reach out to us and say, look, I've looked at your collection, you're missing these and these issues, I happen to have them, I'll be happy to share. So that's been a very welcome development that we're very grateful for. And the fifth is less measurable, but I do think that this and other repositories contribute to improving replicability in our field. For a long time, the quantitative social sciences have been much better at making their studies replicable.
The quals among us have had a tendency to cite stuff and not make it available, on grounds of either the material being sensitive or there not being a platform or a way to make it available. But nowadays with digital tech, there's kind of no excuse for not making it available. So this sort of thing, and the sort of thing that you, Michael, are doing, are important contributions to that. Some lessons. It was a lot more work than we anticipated. The end cost was probably closer to 100,000 pounds, and that may even be on the low side, because we haven't really factored in all the hours that those of us not formally involved in the project have put in, which is not a small number. So the lesson there, I suppose, is that you can't just put stuff online. You can't just create a repository. It doesn't work like that. There are a lot of moving parts and a lot of things that pop up that take time. And one thing we experienced was that a major bottleneck was communication with the people building the platform, the web developer team, who are obviously not specialists in this domain and who also had many other things to do than serve us. That has been, I think, a challenge. And I think it reflects a trade-off that many repository builders will face, which is the trade-off between technical quality and flexibility. By quality, I mean that if you want a repository on a really solid platform built to today's standards, and you want it on a high-profile site like the University of Oslo, which is a big university, you will of course have to deal with the professionals whose responsibility it is to keep the uio.no website good. Of course, it would be easier, we would have more flexibility, and things would move faster if we ourselves could set up the website. But then you trade away the quality.
But there is an inherent problem there, also factoring in that labor in the IT field is even more expensive. You can't just call up your IT guy and say, I'd like you to do this, it's just a couple of days' work. It doesn't work like that. You may have to wait two months and pay for it. So that is something to bear in mind. The third observation is that it hasn't been used very much. I'll be quite honest about that. We have a little over 200 active subscribers. So demand is relatively small, which I think reflects the general focus in our field. People are focused on the latest developments more than they are focused on the history. And also academia is small in general. But I think we shouldn't measure success primarily in this, because the academic work that some of these 220 produce may be of very high value. The fourth observation is that we haven't observed any attempts to infiltrate or misuse the platform, leading me to think that maybe that fear has been a little bit exaggerated. Some limitations in the current platform. There are other things that we're not entirely satisfied with. First of all, of course, it's not a complete corpus. I mentioned some of the things that are not there. Also, there's no within-document search. So if you type in Saudi Arabia in Arabic in the search pane, you'll get nothing, which is unrealistic. I mean, there are bound to be documents in there referencing Saudi Arabia. It has to do with the fact that many of the documents are not OCR'd. I mean, they haven't been processed with optical character recognition. You can tell by trying to mark an area of the text and copy and paste it into another document: it doesn't work, because it's an image. So a lot of the PDFs are images. And that is a problem. It's a phenomenon you see in many repositories and digital archives, I think. And I'll come back to why that is important.
Let me just say for now that the fourth frustration is that I think the access mechanism could be a bit more user friendly. At this point, it can sometimes be difficult to get the registration to work. Also, there's not a central entry point at the beginning. So if you go to the site, you can click your way quite deeply into it, but it's only when you try to access a PDF that the system will tell you that you're not registered. You can get the illusion that you have access, but in fact you don't. Some future plans. We hope to revamp and expand this in a number of ways. We want to expand the collection, and this is probably not that far away. We have already prepared a batch of candidate additions, magazine additions in particular. And I should mention also that some years ago we built a video archive. It was probably our best early effort at organizing things. We actually have a fairly systematic database of videos from the early 2000s to the late 2000s, and we hope to perhaps put that online as well. But the main thing we're going to be working on is to extract the text from the images, to OCR the PDFs and have plain text versions of the documents on the website. Not only to allow searches, but also to allow more computational text analysis to be done on these documents. There's a huge potential for the use of programmatic methods on jihadi text that is barely being used in our field. But to do that, you need plain text versions of the texts. And until recently, the big barrier there has been OCR technology: it's just been very difficult to get reliable OCR on Arabic, especially on noisy documents. But there's been progress there in recent years. And we think it's possible in the coming years to get ground-truth-quality text extraction from these documents.
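To make the point about plain text concrete, here is a minimal sketch, not a description of how the JDR site works or will work, of why within-document search becomes straightforward once the text has been extracted. It builds a small inverted index in Python using only the standard library. The document IDs and sample texts are made up for illustration, and real Arabic text would likely need extra normalization (diacritics, letter variants) that is not shown here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into Unicode-aware word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical plain-text OCR output, keyed by made-up document IDs.
documents = {
    "magazine-001": "Report from the front in Saudi Arabia and Yemen",
    "magazine-002": "A statement on events in Afghanistan",
    "book-001": "Memoirs of a fighter who left Saudi Arabia",
}

index = build_index(documents)
print(sorted(search(index, "Saudi Arabia")))
```

With image-only PDFs there is nothing to feed into `build_index`, which is why the same query returns nothing on a scanned corpus. The same plain-text input is also the starting point for the kind of computational text analysis mentioned above.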
So at the end, I just want to mention the sister project of the JDR, which is the Taliban Sources Repository, which Michael knows even better than me, I suppose. That's a story perhaps for another day. But it's a sibling in the sense that it sits on the University of Oslo website on a similar type of platform and has related material, this time from the Taliban as opposed to from AQ-type groups. It's an extremely valuable collection, for reasons that I think we'll perhaps get back to another time. I think I'm there, Michael.
So that was a nice bit of punctuation at the very end with the TSP. TSP, amongst the few of us who've worked on it, is short for the Taliban Sources Project, which has wisely become, in its current form as part of or alongside the JDR, the Taliban Sources Repository. I managed the Taliban Sources Project, but I was by no means its originator or its architect. That credit goes to some leading, award-winning scholars of Afghanistan: Alex Strick van Linschoten, Felix Kuehn and, to a lesser extent, Anand Gopal, who joined Alex and Felix. And then we all came together and converged in much the same way as you and your colleagues, Thomas, when you recognized you'd been amassing this stuff as individual researchers. They had been doing the same thing while living in Kandahar between 2006 and 2011. They would just amass all of this material that they'd acquired in the local environment, and they were quite reluctant to let it go at the end of that period. The question was how to make it accessible.
That is the major challenge that projects like the JDR, the TSP and others like them face: making it accessible, whether that means taking material that's in siloed individual storage areas or personal drives and putting it into a collective space so it can be accessed, or using the right kind of technology so that information can be extracted from it, or translating it for non-native speakers of Arabic or Pashto or Dari or what have you. There are all manner of approaches to encouraging and enabling wider access to this material. That's why I've always been really fascinated with the JDR, watching it mature. In fact, when we were completing the Taliban Sources Project, which in terms of numbers is about 50,000 pages of scanned material, mostly in Pashto, a little bit of Dari and a tranche of Arabic as well, including some translations, about 2 million words or more translated into English, there was nowhere to put it. We had the same vision in mind, running on parallel tracks. We wanted to make sure that this was placed in a stable institutional repository, which would enable broad, I'd say responsible access is probably the best way to put it, scholarly access, not uncontrolled access. Mind you, the project originators had in mind something like shoveling it all onto Google so that it could really be universally accessible. I think we curtailed that a little bit and shaped that a little bit. But that's certainly informed projects like the Conflict Records Unit. I've looked at how you have done things with the Jihadi Document Repository. I've looked at a lot of other examples of efforts to make this kind of material available and accessible, and to do it in a way that encourages scholarship rather than discourages it.
And that contributes to the wider body of knowledge, to open science, as you were discussing. I've got a ton of questions for you. I think this is really fascinating stuff and it's always really captured my attention. We've got one or two questions in the chat already. I just want to tell attendees, if you have any questions, just drop them into the chat box and I can verbalize them. Thomas, I don't know if you can see them. There is one question from Michael S. Smith II: How can data collectors contribute? Which is of course a great question to be asked. He writes: I get a lot of requests for access to data from my archives. It'd be easier to shovel, for example, all of the MAC reports to a site like yours than to try to share on a per-request basis. Indeed, you made the point, Thomas, that too many professors die with their private papers just never being put anywhere. And so creating something like the Jihadi Document Repository is a great space, but it comes obviously with a burden of cost and then management. And it's actually quite a bit more challenging to set up a stable, credible repository. We started building one for the Conflict Records Unit purely for storage. It's quite a ways off before we can get to the point where we're talking about a public-facing portal where people can access and view material, because that's a whole other layer of cost. So we've created a conflict records repository purely for storage, for individual scholars within the King's College community to be able to park these collections. In our case, we're very interested in really contentious materials, right? There's a contention attached to primary sources being generated by parties to conflict while that conflict is going on. Access can be contentious, the contents can be contentious.
And so if you're a researcher gaining access to this and in possession of this, that can create jeopardy of all kinds depending on what your local context is and what the law is governing access to these kinds of materials. And so our view is to confront this head on and create a space where this is doable. I don't know if that matches your view. I think the practical question from Michael is how contributors can contribute. My first guess would be to get in touch with you, and to get in touch with us as well. But if it's more specialized stuff, definitely get in touch with Thomas. Thomas, do you want to address that?
That's basically it. We don't have a formal mechanism or an upload page or anything like that. And we can't promise speed either, especially now that we're in the process of reorganizing the site. But we are very interested in contributions like that, especially if it's a near-complete collection that forms a unit. So I'm very grateful for Michael's offer, and we're happy to talk more about it. Let me say also that generally the problem in this domain is funding, because this type of work is, as I mentioned, quite labor intensive and it requires maintenance. And as everybody in academia knows, it's really hard to get money for maintenance. Funders want novelty and often they want analysis. They want answers to questions. They don't just want documentation like this. And so it's quite hard to get institutions and funders to see the value of efforts like this. I mean, most people will say, yeah, this is really cool, it's kind of useful. But when push comes to shove, when they have to take their wallets out, the situation often changes. So another observation, I guess, which I probably should have put on the slide, is just that this is technically challenging.
And I've come to discover just how technically challenging it is, and why we have library science, why we have dedicated specialists working on this. And I have the highest respect for that type of work. And I think perhaps we haven't been good enough at learning from them and getting involved with them. So I think going forward, we'll try to leverage that more. Although again, it is difficult, because you often run into institutional and bureaucratic barriers, time constraints, and so on. But the bottom line is that if you're a regular academic working on substantive issues, don't think that you can just build a website. You are going to need the help of specialists. Those are really solid points. I've got two questions that I think flow from that. I'll put them both out there. One is, you mentioned who's working on this or who's accessing it. When I look at what you've collected, I see so many opportunities for research projects. Of course, we're part of the academic community, so I'm thinking in terms of academic projects: master's projects, PhD projects, or what have you. Do you have students, PhD students, actively using this as either the core of or an adjunct to their research? I guess I'll ask that question first, and then I've got a second question about one kind of project that really comes to mind when I look at some of the challenges you mentioned. Yeah, so what I can say is that, anecdotally, from the requests, I can tell that there are students who seem to want to use this in their theses, because when people register, they're encouraged to briefly describe who they are and what they're going to use it for, and they're required to have an institutional email.
And so I can tell, just anecdotally from those sorts of requests, that students are at least considering this. But I haven't been, I guess, very good at keeping in touch with our users, following up and asking what they ended up using this for, and so on. I guess we haven't been very PR conscious. We haven't done nearly as much as we perhaps should to showcase the applications. Maybe we should stay in touch more with users, have them briefly describe their use cases, and then present those use cases as inspiration for others, something like that. But it's not something that we've done until now. Sounds like it could be the basis for a research network, actually. Maybe a point that we can discuss offline at some point. It's something that we've discussed as well. Yeah, great, let's definitely pick that one up later. My second question: you mentioned library science, and that resonates quite strongly, of course. One of the things that we've been looking at is the increasing attention to archival approaches, not just the work that historians do, but proper archival work, proper library science work, and not just in the sense of what high technology enables now, but the fundamentals of library science and archival science, which of course emerged out of the beginnings of open source intelligence work in the Second World War, the Cold War and after. There's a great, deep backstory to this. And to bring that around to my question: when looking at the challenges that you identified and the state of the collection, you can sort of see that one of the projects that you could orient around the Jihadi Document Repository is improvements to the Jihadi Document Repository.
You've got a collection of more or less raw documents that have got a little bit of organization but need a lot more work. That in itself is a research project: if you were to assign somebody a full run of a magazine or two, somebody has to assign the metadata to it. That could be the kind of project that could usefully be structured. But I guess the practical question is what happens when you're looking at scans of Arabic or other non-Latin scripts, where OCR and other kinds of scanning and reading technologies haven't caught up or haven't yet developed. The obvious things to do with that, if you've got the time and the money and the will, are transcription, transliteration and translation. But before that, I'm just wondering about the metadata on each one of those images. Was it part of what you did as individual scholars, before that material went in there, to basically create a card catalog entry for each one of those items? Is there some descriptive data about each file, the kind of thing that will improve the functioning of a repository like this, in the way that archivists and librarians would take this material and make it usable? Yeah, so the answer is no. There's no underlying SQL database or anything like that. It's just links on pages. But it would probably be relatively straightforward to extract metadata. If you have access credentials, nothing actually stops you from scraping the whole thing. And you can extract metadata from the file names, since they're meaningfully named most of the time. But there isn't an underlying infrastructure that lends itself directly to more automated processes.
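As a rough illustration of the point about file names, here is a minimal Python sketch of turning names into catalog-style records. The naming pattern (`group_title_issue_yyyy-mm.ext`), the `Record` fields and the function names are all assumptions made up for this example; the talk only says the files are meaningfully named most of the time, not how.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# HYPOTHETICAL naming convention, invented for illustration only:
#   <group>_<title-with-hyphens>_<issue>_<yyyy-mm>.<ext>
PATTERN = re.compile(
    r"^(?P<group>[a-z0-9]+)_(?P<title>[a-z0-9-]+)_(?P<issue>\d+)"
    r"_(?P<date>\d{4}-\d{2})\.(?P<ext>[a-z0-9]+)$"
)


@dataclass
class Record:
    """One 'card catalog' entry derived purely from a file name."""
    filename: str
    group: str
    title: str
    issue: int
    date: str
    file_type: str


def parse_filename(name: str) -> Optional[Record]:
    """Return a Record if the name fits the assumed pattern, else None."""
    match = PATTERN.match(name.lower())
    if match is None:
        return None
    return Record(
        filename=name,
        group=match["group"],
        title=match["title"].replace("-", " "),
        issue=int(match["issue"]),
        date=match["date"],
        file_type=match["ext"],
    )


def build_catalog(names: List[str]) -> Tuple[List[Record], List[str]]:
    """Split names into parsed records and a list left for manual cataloguing."""
    records: List[Record] = []
    unparsed: List[str] = []
    for name in names:
        record = parse_filename(name)
        if record is None:
            unparsed.append(name)
        else:
            records.append(record)
    return records, unparsed
```

Names that do not fit the pattern are returned separately rather than guessed at, since "most of the time" implies some files would still need a human cataloguer.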
And that's one of the things that we want to address when I say we want to make it more programming friendly. That's one of the other things I have in mind. I think it's obviously necessary as the collection grows. Yeah, indeed. I want to ask you a question about the practical choices you made to make this a more workable project over the long term, choosing certain kinds of material to include and focusing primarily on text. That's a good practical choice. It focuses on what's achievable, what's doable and what's sustainable. But I'm wondering what sort of pain you might feel in acknowledging all the other material that you're not including in there. And by extension, the question is: what do you make of this new landscape? There's not just this extraordinary volume of material being produced across all kinds of formats, but the technology has now begun to develop, as you point out in your Foreign Affairs article, to actually be able to manage that, to make sense of it and to extract meaning. So I'm just wondering what you make of that landscape, particularly from the viewpoint of somebody who's a trained scholar and historian, where you can make those pragmatic choices about the kinds of sources that you're going to privilege because you can work on them. Yeah. So the big thing that happened in the past 20 years in this domain is that at some point we went from a situation of data scarcity to data excess. In the era of data scarcity, by which I mean the time when there was little information to be found about terrorist groups, you had to piece together a picture from really fragmented evidence. And if you were lucky, they had maybe an early website and you could analyze that.
And so I think that's why a lot of prominent scholars in the field until recently have been historians, people who are good at working with scarce evidence or with this type of material. But later on, arguably either in the middle of the Iraq war or certainly ten years later, in 2013-14, we went to a situation where the amount of data just became completely unmanageable. And we noticed, because up until around 2013 we had a sense that we had a bit of an overview. We knew where the main platforms were. We had a sense of the main publications coming out of AQ-type groups in a given month. We didn't have a strong sense that we were missing a lot. That changed with social media and the proliferation of new platforms. And I guess the point I'm trying to make is that there is now data overflow, and so maybe different skill sets are needed. I think if you're planning to work in this domain in the future, you probably want to consider looking into programming and computational methods to handle all of this. So in some sense, the field is becoming more similar to the study of other social phenomena. If you want to study some other phenomenon where data is abundant, you have to have a sense of what to make of the sheer amount of data. And I'm not saying that everyone should be doing that, but at least until now the field of jihadi studies has been virtually empty of that type of work. And I think there's big potential there for exploiting these data. So as you say, with this amount of data and the use of the internet for propaganda distribution come opportunities. For example, nowadays some of the main outlets for IS and AQ propaganda are on platforms that have APIs, application programming interfaces, which makes collection super easy.
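The kind of background collection that an API allows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: no real platform or endpoint is named in the talk, so `fetch_page`, the `items` and `next_cursor` keys, and the `id` field are all hypothetical stand-ins for whatever a given platform's API actually returns.

```python
from typing import Callable, Dict, Iterator, Optional

# A "page" is assumed to look like:
#   {"items": [{"id": ..., ...}, ...], "next_cursor": "abc" or None}
# Both key names are hypothetical; real APIs differ.
Page = Dict[str, object]


def collect_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Walk a cursor-paginated listing and yield each item exactly once.

    fetch_page is supplied by the caller and wraps the actual API request,
    so this loop contains no network code of its own.
    """
    seen_ids = set()
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        for item in page.get("items", []):
            # Skip anything already collected, e.g. items repeated across pages.
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                yield item
        cursor = page.get("next_cursor")
        if cursor is None:
            break  # no further pages


def demo_fetch_page(cursor: Optional[str]) -> Page:
    """Stand-in for a real API call, returning two fixed pages."""
    pages = {
        None: {"items": [{"id": 1}, {"id": 2}], "next_cursor": "p2"},
        "p2": {"items": [{"id": 2}, {"id": 3}], "next_cursor": None},
    }
    return pages[cursor]
```

Passing the fetch function in as a parameter keeps the loop testable offline; a real collector would also need to respect the platform's terms, rate limits and the legal considerations discussed later in the talk.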
You can basically have a script that just runs in the background and lifts every single bit and piece from that website automatically for you. In the past, you'd have to either do it manually or program a fairly sophisticated, say, Selenium script to simulate a human collector. So there are a lot of opportunities here as well. Yeah. Of course, almost all of what we're talking about is the world of online and digital information. There remains an entire other world that is entirely analog, that is entirely paper based. That was the basis for the Taliban Sources Project, where we had to begin with paper material and ask how to make it accessible to a broader community. Of course, we have to convert it, scan it, digitize it, make it available online. So there is still room for the traditional historian who's the gumshoe looking for paper, looking for lost archives or hidden archives or what have you. Last question, I think: advice to students or researchers or investigators of any kind who are working alone, working as individuals, as opposed to somebody working within an institutional context who can leverage the IT section to build them a repository, or who comes with other kinds of institutional resources that allow access that might not be doable for individuals. I'm thinking of the master's student or the PhD student who's starting to acquire materials and build up their own collection. Any kind of advice? With regard to how to handle the material, you mean? Yeah. If they've got a particular interest in a publication or an organization, and they want to start developing their own collection for reference and for research purposes, acquiring the material for analytical purposes.
I suppose the same principle applies. I don't have anything revelatory to suggest, but I guess: try to get an overview of what already exists out there, so that you don't replicate it or do the same work twice over. That would also help you identify gaps in the existing collections, places where perhaps you might contribute. And of course, there are tons of other considerations nowadays. Collecting data on websites today is a different ball game from five or ten years ago, because there's just a lot more surveillance, and the material is a lot harder to get. You get kicked out of the sites by the site owners much more easily, they're much harder to navigate, and you're probably more likely to get on the authorities' radar if you do this. Now, that's generally not a problem, although some countries have become quite heavy-handed, and not just towards Muslim students, who I think have always been more at risk of unjustified police interest because the police suspect that they are sympathizers rather than scholars. That's happened on a number of occasions. But even with non-Muslim, bona fide researchers, I know several instances now where people have run into problems. And let me say this in public, I'm happy to say it: it's particularly in the UK that this has happened. People have had their laptops and their mobile phones confiscated for months by British authorities because of the type of material that they've collected on jihadi websites. So it's an area to be approached with some caution these days. And so if you do it, I would say go about it openly. Make sure that you're signaling your academic objectives and credentials clearly, so that there's no confusion or room for mistakes, because that can cause problems. You raise an interesting point. And I think it's an interesting detail when we talk about the Taliban Sources Project or repository.
One of the reasons why it's now on University of Oslo servers and not somewhere else is precisely the lack of understanding about why these kinds of repositories and collections are created, about their potential uses for research, and about how legal atmospherics can create jeopardy for researchers, regardless of how credible or institutionally based they are. And indeed, transparency and openness about why you're doing the research as an academic may not satisfy others, journalists or other kinds of researchers, but for academics it's certainly a good starting point. A connecting observation from a member of the audience, and I'm going to paraphrase slightly: do you think researchers need to have a certain degree of civic mindedness when they're doing this kind of research? Do you think there's any kind of default forensic setting, given that some of the material they may be looking at, particularly if it's in near real time or connected to ongoing conflict, may have implications for potential prosecutions, not just for terrorism, but for other kinds of criminal acts? Do you think there's a need to "de-conflict", to use an intelligence term, with (I'll read the question here) governments when conducting research targeting jihadist primary source materials online? Do researchers need to register their interests with the government? Do they need to be that transparent, or is it enough just to be open about the scholarship that they're doing? What do you think? And I think we'll probably try to close after that. Yeah, well, no, I don't think any country has a mechanism for that. But that doesn't mean that it's not a good idea.
If you're in a university setting, that's kind of handled for you, in the sense that if you're doing a master's thesis, you will presumably leave a paper trail. There'll be documents saying that you're doing your research on this topic, you'll have a student card, et cetera. If you're more of an independent researcher, you may have to find a way to signal the same thing in other ways. But by and large, I'll say that it's not as dangerous as people sometimes think. I sometimes get questions from laypeople: isn't this super dangerous, don't the intelligence services have a drone outside your house, things like that. So no, I think for the most part government monitoring of traffic to these sites is so sophisticated now that they can tell false positives apart. And, brutally speaking, they are after the genuine bad guys and not others. So it's not a complete minefield. But there are things to bear in mind, and there are also, of course, ethical issues here. You should definitely not, I think, contribute actively to the forums. Especially if you're coming at this from an academic standpoint, you should strive towards passivity and just observe. And also, incidentally, the moment you're active and write things, perhaps to gain access, stuff like that, you can make your life more difficult vis-a-vis precisely this kind of communication or deconflicting with authorities. If you are contributing, you are in fact a part of the phenomenon that you're supposedly just there to observe. You tinker with your lab product, as it were. You're influencing and distorting the scientific process in some sense. So you should avoid that.
Those are some good basic methods in terms of drawing the line, making sure you're not influencing the thing that you're observing, not getting involved and, by extension, not being implicated in whatever might be going on in the thing that you're observing. Just a closing comment from me, and I won't ask any more questions: when we're talking about online, we're also getting into that world of digital open-source intelligence, right? That is the commercial side or aspect of what academics might be doing and just calling research. And of course, there is good practice in terms of taking some basic measures to preserve your personal security online without actually disabling your own work by putting too many obstacles in front of your ability to do that research online. You can see different levels of sensitivity from different researchers in different sectors in terms of the tools they'll use to mask their identities or what have you. And I guess it's an open question how much of that should be practiced on the academic side. I don't have an answer to that. I don't have any personal observations on that. But it's probably a good thing to think about if you're a researcher: how much value do you place on your own privacy, regardless of who you think might be monitoring? And also, when you're engaging with communities who may be in various kinds of legally questionable circumstances, you need to be quite careful in terms of how you access them, including just opening up your browser and accessing a website, because that will leave a trace that will influence your own profile. So those are things for researchers to consider. Thank you for the comments in the chat. Thomas, we went a little bit over time, but this is really great.
I hope it didn't turn into too much of a personal conversation between the two of us. But I'm looking forward to picking it up at some point and carrying on with this. It was a great pleasure. Thanks for contributing to this speaker series. And I'll give you the final word, if you have anything more that you want to add. Thanks very much for having me, Mike. It was a pleasure. Thank you. Okay. All right. I'm going to just end the call entirely, and that will stop the video, I'm told. So thank you and good night, everybody. Thomas, I'll be in touch.