Buenas tardes, so good afternoon. My name's Nokome Bentley and I'm from a small company called Stencila, based in New Zealand, and we make software for dynamic, data-driven documents. This is the original title of my talk. It sort of rhymes and it's got some nice acronyms in there, but it's quite a mouthful and maybe a bit clickbaity. So I came up with another title, which is "Making Stochastic Parrots Announce Their Presence". Hopefully, as the talk goes on, the reason for this will become apparent. As I said, I come from New Zealand, from Christchurch. A lot of people have said, "You've come all the way from New Zealand to Buenos Aires." It's a long way to travel, and if you look at a map of the world the way we're used to seeing it, it looks like somewhere like San Francisco or London is much closer to Buenos Aires. But if you look at the world from a different perspective, centred on the South Pole, you'll see that Christchurch, New Zealand, where I come from, is actually quite close to Buenos Aires, and to get here I just flew across the Pacific and didn't have to go all the way around. I think these two maps illustrate one of the concerns around the use of generative AI in research: if all the training data that AI models are based on is in English and has an Anglo-American-centric bias, then maybe the answers that we're getting from generative AI are going to be similarly biased. So I'm going to be talking about some of the benefits, risks, and roles of generative pre-trained transformers and other generative AI in research. The first thing I'm going to talk about is some of the risks involved, and the alternative title of my talk, "Stochastic Parrots", owes to the paper from Bender et al.
in 2021, where they first coined that phrase. That paper goes into a lot of detail about potential risks. We just saw, in the previous talk in this room from Adrian, some of the risks in terms of the exploitation of the workers labelling the datasets these models are trained on, but in this talk I'm going to focus on some of the risks to research. As many of you will be aware, one of the risks in using GPTs and generative AI in research is that they can be confident bullshit artists. I asked ChatGPT the same question I posed at the start: which city is closest to Buenos Aires, London, San Francisco, Istanbul, or Christchurch, where I come from? It confidently replies that the closest city to Buenos Aires is Istanbul. Actually, it's the furthest of them all. And if you look, I've done some fact-checking here: the numbers in terms of kilometres are wrong, and in fact, according to the numbers that ChatGPT gives us, San Francisco should be the one it says is closest. So there's a whole lot of things wrong here, and obviously for this particular question we shouldn't rely on ChatGPT to do our research. But there are a lot of benefits that people are discovering, and the last five months, not years, have seen a big explosion in the use of GPTs, in particular through the recently released ChatGPT. A lot of you will have tried it and found the benefits it potentially brings to your productivity. These are some of the things I've personally found ChatGPT to be useful for: it's a better, ad-free (for now) search engine for certain topics; it helps with unblocking writer's block and just getting started with writing; it can improve first drafts; and, for researchers whose first language isn't English, it can do translation, for instance translating text into scientific English. So despite the risks and the problems with GPTs, I think the genie is out of the bottle, so
to speak, and we're going to see these sorts of technologies used in research more often. We are seeing that uptake already, along with concerns around ChatGPT being listed as an author on research papers, paper fabrication (people using AI to generate false research papers), and so forth. Some of the responses to this have been what we might call a detect-and-ban approach: there are techniques such as AI detection algorithms and watermarking which can be used to detect text that's been generated by generative AI. You can see in this study, published last year, that for the original abstracts which were written by humans, the AI detection software does quite well, classifying most of them as having a zero percent AI detection score, whereas most of the abstracts that were actually generated by ChatGPT get high scores. But it's not perfect, and these algorithms are still missing some of the AI-generated content. And even with these detection algorithms, we're finding that slight modifications to the wording, where you take your ChatGPT text and make small changes to it, cause the AI detection algorithms to start to fail and perform much worse. A more nuanced approach, rather than just detect and ban, is to say: we'll allow you to use AI for polishing your work. So rather than just copying and pasting from ChatGPT into a scientific article, we'll allow you to use it for editing. And that raises the question of where you draw the line between editing and writing. This is the sort of more nuanced approach we already have in place for the roles of human authors in research. You might be familiar with the Contributor Roles Taxonomy, CRediT, which is fourteen roles that can be used to represent the roles typically played by contributors to research outputs. We have things like conceptualization, funding
acquisition, methodology, visualization, writing the original draft, and so forth. Maybe we need to start thinking about how we can formulate similar roles for generative AI in research: things like editing work that you have already written, translating, summarizing, and generating code (for instance, in dynamic documents such as Jupyter notebooks you might want to use AI there), but making that explicit and crediting the roles of not only the humans but also the AIs in that process. So I've talked about GPTs and generative AI in general, but the other technology I want to talk about today is conflict-free replicated data types, or CRDTs. These are a technology that's emerged just in the last decade or so, and they're really seen as an alternative for decentralized collaboration: similar to using Google Docs, but allowing us not to all rely on Google's servers and on centralized collaboration where Google controls our data. There's a group of researchers, including Martin Kleppmann from the University of Cambridge and others, who have come together under the Ink & Switch organization and have led a lot of the work in developing the algorithms that allow this offline collaboration. One of the important things about CRDTs is that they allow synchronization of highly structured documents. Here's a to-do list: you might have one user editing the to-do list on one machine, and then on another device, completely offline, not requiring a separate server, you've got another person editing the same to-do list. It involves not just text but also boolean values and arrays and so forth. When network communication is available, when someone goes online, the two versions can sync, and you get the same result in the end. Another important aspect of CRDTs for today's talk is the fact that they allow for really fine-grained version control and branching. Some very recent work done by Karissa McKelvey and other people at Ink & Switch looked at how
you could use CRDTs to do Git-style version control, merging, and branching. And this sort of illustrates the way I see that we might be able to combine these two technologies, CRDTs and GPTs: can we use CRDTs to track the provenance of research content, including content that is created by AIs? This could be labelled a "trust but verify" approach, in that we say: let's trust authors to use generative AI responsibly. They're not going to just ask it to write a scientific paper for them; they're going to use it to help them edit their text and make themselves more productive. We want to move away from copying and pasting generated text into a Word document, and instead move towards treating generative AI as a contributor to the paper, where we explicitly acknowledge the role it has had. We do this by automatically recording the prompts we use when we ask the generative AI assistants to help us, and then recording any fine-grained changes the generative AI has made, so we can see where in the document the AI has been involved, and exactly which words can be attributed to the humans and which words can be attributed to the AI. So what would this look like? You might be an author on a paper, and you've written a paragraph about the methods you used, and then you ask the AI to improve the wording of this paragraph and make it more concise. The AI responds, "Sure, here are some suggested changes": it's removed some words and it's added some words. And instead of this just being a blob of text that we copy and paste into the document, what we do is actually review and accept some of those changes. The AI becomes an assistant in the process, and we can accept or reject some or all of those changes. Using the version control aspects of CRDTs, we can then have a history of the document, including where the AI has gone through and edited a particular
paragraph, where we can record the prompt that we used and who reviewed and merged it in. And just like you would use a Git log to get a history of changes to a document, we can use the CRDTs to generate a similar summary. This is an approach we're exploring at the moment at Stencila. All our documents are represented as JSON data, similar to what I showed you before with the to-do list. CRDTs allow us to represent these documents as version-controlled repositories, and we can then incorporate artificial intelligences just like we do normal users: where we would record a human contributor's organization and their given and family names, we can record the name and potentially the version of the AI that was used, and so forth. If anyone's interested, this is a link to the repo; you can see what we're doing there right now. And the advantage of that is that when we publish this document as HTML, all that metadata about the contributors and affiliations, including potentially the AI, can be published to the web. I'll just finish off with a Māori proverb that links back to my alternative title for the talk: as the shearwater announces its presence, as the parrot announces its presence, so too do I. Thank you.

Thank you so much for that, Nokome. We have four minutes for questions. Does anyone have a question they would like to ask? Yes, I will come to you.

Hi, thanks, that was super interesting. I was wondering, if you're trying to encourage scientific authors to record and report the use of AI, do you think there are certain policies or rules that will have to be written and enforced by journals or other organizations to make that happen? On the trust side, how much are we trusting versus requiring?

Yeah, I think it's probably a combination of both. I mean, if you require it, then you've got to put in some sort of way of detecting it, and as I've shown, that's not foolproof. It's like with spam: there's going to be this war against it,
and people are going to find ways around it. So the approach I'm advocating here is: let people use it, but make it easy for them to record how they've used it, so that reviewers can go through the document history and say, "Oh well, they used it just to make their work more efficient," and can see that it was original research, just with the help of an AI assistant.

So I have a question. This reminds me of other types of software that researchers and authors are already using. Can you tell us more about how that is currently being recorded or recognized? Are there other tools, like a Trello or a GitHub or something, that people already use to help them in these processes, and are those acknowledged currently?

Yeah, I think with the emergence of AI in recent years there's an opportunity to build something from the ground up, in a sense, that takes these factors and issues into account. Lots of researchers are using GitHub these days and are used to this sort of history, so this isn't a far jump from that, but it's a way of avoiding that copy-and-paste mechanism where people are using Google Docs or Word and so forth and just copying and pasting generated text into their documents.

Yeah, that makes sense. One last one, okay.

How do I start, right? I know that everybody's been talking about GPT and how you shouldn't use it for generating text, but did we suddenly just forget about tools like Grammarly and spell checkers, where you just click one button and it changes things? What's your take on that? Because I feel like last year everybody was talking about Grammarly, how it shapes your style and changes your whole document, and now nobody's talking about it and everybody's just talking about ChatGPT. What's the difference there?

Yeah, I think it's a really good point. All of
these tools have the potential to influence how we write and what we produce. So I suppose the rationale behind this approach is: let's at least record that, and make explicit what tools we've used and how we've used them, so that reviewers don't need to be second-guessing, wondering, "Has this person just generated this or not?" We can actually say it.

Thank you, Nokome. Can we have another round of applause, please?