I'll paste this into the chat, and I'll paste it again for folks joining after the fact. I just pasted the short link to these slides: it's pos.it/posit-rstudio-copilot, and that will get you to a copy of these slides that will stay up in the future. So yeah, thank you so much for having me. I'm excited to be here and talk about what's new in RStudio, namely the GitHub Copilot integration. I joke that this could really be a one-page presentation: GitHub Copilot is in RStudio now, it's finally here. Mic drop, walk away, that's the end of it. But of course, like any tool, you need to learn a little about how to use it well and be productive with it, as opposed to just picking it up and not knowing the best way to use it. This talk officially closes one of our most highly requested features ever, GitHub Copilot integration with RStudio, which is issue number 10148. It had over 500 upvotes and was the most popular feature request we've ever had for the RStudio IDE. So I'm really excited that we were able to deliver this as a preview feature in the most recent release, RStudio 2023.09, which came out about a week and a half ago. But before we get into Copilot itself, just saying it's available is great, but we need to figure out what Copilot actually is and what it means in the broader context of generative AI and large language models. So let's first talk about generative AI. Generative AI is a category of models and tools designed to create new content, such as text, which is what Copilot generates, but also things like images and code. Generative AI ultimately uses a variety of techniques to identify patterns and then generate novel outputs based on them. While I was preparing for this talk, I used a different AI tool called Midjourney, which creates images and graphics.
So my dog, my old Boston Terrier, Howard, is my little co-pilot. He's always with me as I'm coding, doing development, or managing products here at Posit. So I took a prompt, a little bit of text, sent it over to Midjourney, and said, hey, give me back an image described by this. What we're really asking for is a seated robotic android Boston Terrier wearing pilot goggles. This is my little co-pilot buddy. In the span of 10 or 12 words, I can get a remarkably complex output. And that's the promise of generative AI: taking something small, plus all the context the model was trained on, and creating something new for you to use. Now, while images are really cool, and I love this image and can use it in the future, what we're here to talk about is generating text, or in this case, generating code as a specific type of text. Generative AI for text really just predicts the next word, token, or string. You might have an iPhone, and as you're typing, you see at the top of your keyboard that it's predicting the next word. At a basic level, you can think of generative AI for text as a super-powered version of that. So I might ask something like ChatGPT, hey, complete the sentence "Every good". It works out what "every good" means and decides the next word after "every good" is "thing", and then "must come to an end". As it builds up context, it starts with just "complete the sentence" and has a lot of variation in what the next word might be. But as it gets more context about what it has already predicted, it reaches higher and higher probabilities, predicting down one very specific path.
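To make the next-word idea concrete, here's a toy sketch in R. This is not how a real language model works internally; it's just the "pick the most likely continuation" intuition, with a made-up frequency table standing in for the model:

```r
# Toy next-word prediction: given a phrase, pick the most frequent
# continuation from a hand-made table of (prefix, word, count) rows.
next_word <- function(phrase, counts) {
  candidates <- counts[counts$prefix == phrase, ]
  candidates$word[which.max(candidates$n)]
}

# Hypothetical counts, as if gathered from a large text corpus
counts <- data.frame(
  prefix = c("every good", "every good", "every good thing"),
  word   = c("thing", "deed", "must"),
  n      = c(120, 15, 98)
)

next_word("every good", counts)        # "thing"
next_word("every good thing", counts)  # "must"
```

A real model scores continuations with learned probabilities over a huge vocabulary rather than a lookup table, but the loop is the same: predict, append, predict again.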
This general idea, take something small, keep adding to it, predict the next token or word or snippet of code, and build on the context of what came before, is largely how these tools work. And Copilot uses a similar approach to generate code, not just text. As for what Copilot is, the Copilot developers' own documentation says GitHub Copilot is an AI pair programmer that offers autocomplete-style suggestions as ghost text, based on the context of the surrounding code in your script and the code it was trained on. Now, I want to differentiate a little, because RStudio has had autocomplete forever. It has rich autocomplete that helps you be more productive as you type out your scripts. Ghost text is a new way to present these predictions. Traditional autocomplete parses the code and the environment, takes the code exactly where you're typing, and supplies a list of possible completions: they have to be possible given the characters you've typed so far. It's a static set of completions shown in a little popup, provided by your IDE, literally from the computer you're working on or from disk. Compare and contrast that with Copilot: Copilot still parses the code and the environment, but it also has billions of examples of code it was trained on that can be incorporated into predicting the next token or snippet it's trying to produce. And importantly, it supplies a list of likely completions. They don't even have to be possible. A suggestion could come from completely outside your script but still be helpful for your problem.
And it's not a static set of completions, but rather a dynamic, non-deterministic set, prepared and delivered as ghost text rather than a popup. Ultimately, Copilot is calling out to API endpoints; it's a generative AI tool provided via those endpoints. Now, talking about Copilot and what the differences are is helpful, but it's better to show you what it looks like. Autocomplete might look something like this: I have an R script in RStudio, I start typing mean, so I type "mea" to take the mean of a value, and the autocomplete popup shows me what's possible. If I compare this to Copilot, it completes based not only on what's possible but also on the context within the script. Look here on line one, I have a comment: "Take the mean of the mpg column in the mtcars dataset". Here, after typing only one letter, "m", it automatically completes not only the function I'm trying to call, but also the data I've indicated I want to use. This is a possible, example solution it might give, and it's doing more than just autocompleting the text that's available. And if you look at it, this is ghost text: a light gray differentiation. The first letter is the one I typed, and the remainder is the ghost text that pops up. A longer example of ghost text might look something like this: in the same script, I've decided I don't want to just use mean and a dollar sign to get my column. I want to group by another column and use dplyr to do it. In this case, I'm providing the context of a comment about what I'm trying to accomplish, plus one line of code, which just loads the dplyr package.
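Reconstructed, that comment-plus-library() prompt and the kind of multi-line suggestion it yields might look like this. The exact ghost text varies from run to run, and dplyr is assumed to be installed; the first two lines are the human-written prompt, the pipeline is what Copilot fills in:

```r
# Prompt written by the user:
library(dplyr)

# Take the mean of the mpg column in mtcars, grouped by the cyl column

# Ghost-text suggestion (one plausible completion):
mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
```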
And then Copilot will generate ghost text trying to solve the problem I've prompted it with, given the context I provided. So from line five on, this is all completed as ghost text via Copilot, all at once: a multi-line output. It's not limited to small, short snippets; it can do somewhat more complex completions. And importantly, this ghost text doesn't actually exist in the document until you accept it. If you don't like the predicted output, you can back out and start typing something else to get another prediction, or keep working and write your own script and ignore what Copilot gives you. Now, while this is cool, I do want to show you at least one live example. In this example, I've got a somewhat longer script, and again I give it a little prompt, and it creates the function for me, and I can immediately call it. Again, it's predicting multiple lines, and then I call the function as it comes out. So I provide a scoped and specific prompt here on line six, which is the comment, plus the additional lines of code from one to five, and I receive ghost-text completion for the lines below. Line nine onward is the mode function I've asked it to create. And this is a lot of different things: it's functional code that actually works. Inside RStudio, we also show the status of the request, whether Copilot is thinking or has sent back a response we can work with. Altogether, this ghost text is something new, and I wanted to explain a bit about how it works; we'll use this concept throughout, since it's how Copilot is built into the IDE. As for the scoped, specific prompt, we're trying to answer: what information are we bringing in, and what problem are we trying to solve?
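Here's a reconstruction of the kind of mode function that demo produces. This is not Copilot's verbatim output (suggestions vary), just one idiomatic way the request gets answered:

```r
# Create a function to calculate the mode (most frequent value) of a vector
get_mode <- function(x) {
  ux <- unique(x)
  # tabulate counts how often each unique value occurs; pick the biggest
  ux[which.max(tabulate(match(x, ux)))]
}

get_mode(c(1, 2, 2, 3, 3, 3))  # 3
get_mode(mtcars$cyl)           # 8 (the most common cylinder count)
```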
In this case, a prompt is really just context, and we'll talk more about what that means later on. Ultimately, I want you to get started and try Copilot out. To try it, and we'll go into deeper examples throughout this talk, you can get a subscription to GitHub Copilot for Individuals or for Business. Importantly, Copilot is a paid third-party service, so you do have to have a subscription to use it. Once you acquire that subscription, you can activate it within RStudio by going to Tools > Global Options and then the Copilot tab. It will walk you through downloading and installing the Copilot agent and then signing in with your GitHub account. You go in there, click Sign In in the Copilot tab, and it takes you through an auth flow where you drop in a verification code and authorize Copilot to access your account, and then it signs you in under your username. I'm jthomasmock on GitHub, and I'm now logged in as myself to work with it. At this point, I'm ready to use Copilot, and we can get into a deeper example of how Copilot might be useful, or how we can solve problems with it in addition to writing our own code. Now, with any new tool, I think it's important to play around, but play around with a purpose. I'm not trying to use it for every single thing possible; I'm trying to solve a specific problem, playing with this new tool to see how I can break it and how I can use it well, in a fun environment where I'm not afraid to mess up and I'm not breaking anything. So for this talk, I wanted to play around by solving a game called Keyword. Keyword is a word game, popularized, at least within my family, by The Washington Post. It's like a mini crossword, but you don't have any context clues about what the words are.
You just have vertical words that are each missing a letter, and those missing letters spell a horizontal word across. To solve this one, you might eventually end up typing out "felony" from the vertical words: flimsy, stare, kale, unto, seen, sway. Those are the vertical words, and their missing letters spell "felony" across. We could solve this a couple of different ways: just guessing and being told wrong or right, or writing out all the possible letters and eliminating them before guessing. But because we're trying out Copilot, we're going to solve this puzzle with RStudio and Copilot. That's what we're using today. I'll use this fun, open-ended task to talk about how you can be productive with Copilot and make use of it in your daily work. Ultimately, to be successful with something like GitHub Copilot, we have to get the most out of the generative loop. What I mean is that Copilot, like other generative AI tools, requires a little guidance to be successful. Copilot is a generative AI tool that predicts output text, or in this case, more specifically, code. But importantly, generative AI doesn't understand anything. It's just a prediction engine, trying to get to the next word and finish out a sentence or a script. To get the most out of the generative loop, we can think about it in three parts. There's the context: what prompts, code, or comments have been provided, basically everything inside the script that the model can use. The intent: what I actually have in my head that I want to do, which I have to codify, write into my script, to get the most out of the tool. And the output: what Copilot actually returns. These three components, in my mind, are the generative loop.
I start with the intent of what I'm trying to do, the context of the prompts I then provide to Copilot, and the output of what it actually returns. To get the most out of this, I can supply better context, which gets closer to my intent and leads to a better output, a better prediction, to help solve my problem. This is really what I meant by scoping the problem and the prompt. The prompt is anything in your script: code snippets, names of files, names of data frames, little comments throughout, and other things loaded in the environment. To simplify even further, here's my little buzzword for the day: S2C, making your problems Simple, Specific, and using Comments throughout your document. With this approach, you can generally be productive with tools like Copilot, getting it closer to what you're actually intending and getting better outputs. So again, our toy task today is: how can we solve the Keyword game with R, RStudio, and Copilot? We'll take this complex task and break it down into simple, specific problems. I might provide a high-level description of the project goal at the top, then build off that with more specific tasks. For example, I know how to play Keyword, and I know you can solve similar word games with R; I've seen other people solve games like Wordle with R. So I might write out a long set of comments like this. This is my first prompt, the first context I'm providing to Copilot, and it's also useful for me: it tells me, hey, this is what I'm trying to do. I'm ultimately creating a function to solve the Keyword game: a six-letter horizontal word intersecting the vertical words, where we're trying to find each missing letter from those vertical words to spell the horizontal word.
And then I say how to play: guess six letters across, and the letters must spell real words up and down. This context is still a pretty open-ended problem, but it sets the initial stakes of what I want to do with my overall script. We'll go a little further, but I think it's very useful to supply a lot of detail up front, and then you can break down more of your problems from there. Now, the simplest way to solve this problem is to be clever. We'll cheat really briefly and then get away from it. I know that Keyword runs through a JavaScript front end on The Washington Post's site, and I bet there's some data there I can look at. So I can get the URL for a specific date, grab the JSON data from behind the scenes, print it out, and read the answer. The answer for this day was "staple", so I've solved it immediately. But that's cheating. I don't want to be that clever; I actually want to solve the problem with R. While the answer is available in that JSON data, the data also contains something more useful for actually solving the problem: the hint words with the missing letters. Here an underscore indicates the missing letter, with the rest of each word spelled out. So for "staple", the hint words are words like chant, bear, spur, and scale, each with an underscore for its missing letter. Those are the words I'm trying to solve. If I have two or three or four of the letters, I can probably guess the rest with R and figure out which words are possible. So this is my first step in breaking the larger problem down into simple steps: solve each of the individual words. In this case, simpler is not cheating; it's just getting the hint words. So let's break it into component parts and work with the hint words. Here I have a long set of comments, six lines in this case.
I'm telling myself and Copilot: hey, this is what we're trying to do. I'm creating a function called json_url that takes a date and returns a formatted URL for getting the JSON data. As I start typing json_url, Copilot predicts: here's the function we think you're trying to write. And by providing all of that context, what we'd call the prompt we're supplying to Copilot, I get a very useful output that pastes the date into the URL, so I can solve the puzzle for any day, not just one specific day. So we've solved step one, getting the URL, and we can use it to get the hint words. I can do the next bit manually: get the JSON, get the hints from it, extract the words vector from the JSON blob, and print it out. Great. But I need to actually work with those words, so we'll ask Copilot to help us with another problem. In this case, we'll be very specific. We have "simple", break it down into a smaller problem, and "specific": use expressive names and comments alongside variables, functions, and other objects, essentially as another type of prompt. Here I have a shorter set of comments: given a word like "_ack", return a regex that will match the missing letter, replacing the underscore with any lowercase letter. As I start typing "regex", it gives me regex_from_word. If I supply a word, it replaces the underscore with a pattern for any lowercase letter and returns a regex that can be applied to a database of words. I'm using a Mac, and I know Macs have a database of words built in, so I can supply a whole bunch of words and filter down to the ones that actually match this problem.
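A sketch of what regex_from_word() does (reconstructed; the generated version in the talk may differ in details):

```r
# Given a hint word like "_ack", return a regex where the underscore is
# replaced by a pattern matching any single lowercase letter, anchored so
# the whole word must match.
regex_from_word <- function(word) {
  paste0("^", gsub("_", "[a-z]", word, fixed = TRUE), "$")
}

regex_from_word("_ack")                                     # "^[a-z]ack$"
grepl(regex_from_word("_ack"), c("back", "lack", "whack"))  # TRUE TRUE FALSE
```

Note that "whack" fails because the pattern allows exactly one letter in place of the underscore.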
So now I have not only the hints, but a regex I can use to filter my word list and predict. The idea here is that with shorter comments but more expressive names, and more expressively framed problems, I can get a better response from Copilot. Where I can go further is after defining regex_from_word, I can use it. Here, without any comments, just by typing matched_words, Copilot can take that expressive name and infer: I bet he wants the matched words. It uses the stringr package to subset, uses another function I've defined called limit_words, and applies the regex against that. We can basically read this as: subset the entire database of words to only the words that are four characters long and could possibly match "_ack". So you might think of back, lack, tack, all the different words that fit. That gives me my subset of matched words to keep working with. And importantly, while we've been writing lots of comments above, we've also been building up context; Copilot reads the entire script from top to bottom, not just what you have right here. To use expressive names a bit more, I can be specific in a short comment about what I'm trying to do: get the top 50 most likely words. Then I start typing top_words, and it gives me a function that builds on the matched_words I just defined. It rolls that into a function and then does a whole series of steps: splitting the words into characters, scoring them by how common each letter is in English, naming and sorting them, finding unique words, and returning the head, the top 50, which is the default value for the function it created. Again, these are simpler problems, but they're not trivial; this is a good amount of code that's been predicted.
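In base R, the matched_words idea can be sketched like this. The generated version in the talk used stringr and the talk's own limit_words() helper, which I don't have, so this is an equivalent sketch with a small stand-in word vector rather than the Mac /usr/share/dict/words file:

```r
# Subset a word list to the words that could fill in a hint like "_ack"
matched_words <- function(hint, words) {
  pattern <- paste0("^", gsub("_", "[a-z]", hint, fixed = TRUE), "$")
  words <- tolower(words)
  # keep only words of the right length, then apply the regex
  grep(pattern, words[nchar(words) == nchar(hint)], value = TRUE)
}

dict <- c("back", "lack", "whack", "tack", "acts")
matched_words("_ack", dict)  # "back" "lack" "tack"
```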
And without using any comments, I might type something like get_letters_from_blank and start defining it as a function, and it gives me: for all the possible words, return the letters that would sit in the place of the blank. Then I can use that to ask which horizontal words we're predicting. At this point, I've really solved most of the problem, using Copilot along with comments and a very small amount of my own code. So overall, I wrap this into a function called guess_keyword and use it for a specific date, in this case one month after August 9th, which was the first time I was really playing with Keyword. And it says: hey, the keyword is one of the following words, "recipe" or "repipe". As much as I love pipes, I'm a tidyverse fan and I love that base R has its own pipe now, I don't know if "repipe" is a real word, but it's in my database. So I bet the word is "recipe", and that actually was the word of the day. It took all of those hint words, guessed the possible letters, and limited the guesses to combinations that actually spelled a horizontal word. So this was a fun problem. I played around, it was low stakes, and if I messed up, fine. I was really just trying to see how far I could get with prompting and minimal use of my own code. You can think of these comments as setting the intent of what you're trying to do, providing context, almost like describing the pseudocode you want to write out. I have a link to the full transcript, the full thought process as I went through this in a raw R script, and a link to the full guess_keyword function if you want to try it on other Keyword puzzles. You're welcome to take a look.
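The letters-from-the-blank step can be sketched like this. This is a reconstruction with hypothetical candidate words, not the real guess_keyword() internals, which are linked from the slides:

```r
# For one hint word, find which letters could fill the blank, given the
# candidate real words that match it.
letters_from_blank <- function(hint, candidates) {
  pos <- regexpr("_", hint, fixed = TRUE)  # position of the blank
  unique(substr(candidates, pos, pos))     # letter at that position in each candidate
}

letters_from_blank("_ack", c("back", "lack", "tack"))  # "b" "l" "t"
letters_from_blank("sp_r", c("spar", "spur"))          # "a" "u"
```

Taking the candidate letters per position and keeping only the combinations that spell a real horizontal word is then the final filtering step.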
While Keyword is fun, and solving problems with Copilot is fun, ultimately there were times I got a bit stuck: the problem was too broad and I needed to approach it differently. Again, the best way out, if you're trying to get help from Copilot, is to add a bit more context: more comments, more code, more in your script. So add more context and follow the S2C protocol: Simple, Specific, and use Comments. Break the problem down into simpler problems, solve a very specific task, and use comments to describe what you're trying to do or get. Another approach is to prompt again in a different way. At the top level, if I'm trying to write a function, typing out the name of the function can help beyond just using a comment. Or if I'm trying to add a step to, say, a dplyr pipeline, I might write an inline comment to scope the problem and prompt it in a specific way. Adding these top-level comments, meaning at the farthest left, or inline comments alongside your other code is a great way of codifying what your intent is for your script or the problem you're working on. And ultimately you need to build off your own momentum. You're still going to write some of your own code: Copilot isn't replacing who you are as a coder or developer, it's just helping you write some of your code, helping you be a little faster or solve problems you don't necessarily have all the answers to, guiding you in a direction. And maybe sometimes you turn Copilot off for a bit. Sometimes I'm in a flow state, I know exactly what I'm trying to do, and I don't want Copilot right now. But then when I get stuck, I can turn Copilot back on.
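To make the two prompting styles concrete, here's a small illustration. The function and values are hypothetical, the comments are the prompts, and what Copilot actually fills in will vary:

```r
# Top-level prompt: convert a temperature from Fahrenheit to Celsius
f_to_c <- function(f) {
  (f - 32) * 5 / 9
}

temps_f <- c(32, 212, 98.6)

temps_f |>
  f_to_c() |>        # inline prompt: now round to one decimal place
  round(digits = 1)
```

The top-level comment scopes a whole function; the inline comment scopes the very next step of a pipeline.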
In RStudio there's a command palette, opened with Cmd+Shift+P (or Ctrl+Shift+P on Windows), and from there you can turn Copilot on or off really quickly. And ultimately, don't forget the goal: provide better context, again with Simple, Specific, and use Comments, to get closer to the intent of what we want to do and get a better output. We're not done yet, but we've solved our problem and talked a little about getting stuck, so let's keep going with these ideas. Ultimately, there's more than one way to generate text. There's ghost text, which is really cool: you provide a little comment, start typing the name of a function, and it gives you a nice clean function. "Calculate the circumference of a circle": supply the radius, and it gives you two times pi times the radius, a function for calculating circumference. But maybe you want to ask a question. Copilot is really good at generating code but not great at answering questions, because it isn't really intended to be used that way. So maybe you want to ask a question of something like ChatGPT, or say: I got stuck with an error, explain this error to me. The chattr R package is an interface to a bunch of chat-style APIs and models. ChatGPT from OpenAI is a very common tool for chatbots, chat-style large language models, but there are many different kinds. What chattr provides is a way to call these from R code and interact with them, or to get an actual chat-style interface in RStudio via the Viewer pane. So here I might ask: how do you calculate the circumference of a circle?
And rather than just giving me a two-line function, it says: to calculate the circumference of a circle, you can use this formula, and then it shows me how to do it with R code. So you can approach the problem a couple of different ways, and both types of tools are helpful in their own right. With ChatGPT there are different plans, and you will have to sign up for an account. But there are also open-source models you can run on your own laptop, and chattr can help you run those models locally and interface with them. Again, I like using the chattr package via chattr_app(), which lets me call it and display it in the Viewer pane of RStudio. So not only can you use Copilot for generating code, you can also ask questions and get answers back for the problems you're trying to solve, which helps you be a better coder even if you don't want predictive text inside RStudio. chattr does something really cool in what we call enriched requests. I don't want to go too deep into all of this, but: you can load the chattr library, attach some datasets, maybe mtcars or the iris dataset, and then send a request out through chattr, in this case to ChatGPT with the GPT-3.5 Turbo model. Here's what it actually sends across as the prompt: it says, you're a helpful coding assistant; use these books, Tidy Modeling with R and R for Data Science; use tidyverse packages and the tidymodels package; and a couple of other things, like limiting the length of the response to make it more efficient. Then it injects your actual question, enriched with the rest of this context, which is always making your questions better for R-related tasks.
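Illustratively, "enriching" a request just means wrapping the question with extra context before it goes to the model. This is a base-R sketch of the idea, not chattr's actual internals:

```r
# Build an enriched prompt from a user question plus context: a
# system-style preamble, available data frames, and prior chat history.
enrich_request <- function(question,
                           datasets = character(),
                           history = character()) {
  parts <- c(
    "You are a helpful R coding assistant.",
    "Prefer tidyverse packages; keep responses brief."
  )
  if (length(datasets) > 0) {
    parts <- c(parts, paste("Data frames available:",
                            paste(datasets, collapse = ", ")))
  }
  if (length(history) > 0) {
    parts <- c(parts, paste("Earlier questions:",
                            paste(history, collapse = "; ")))
  }
  paste(c(parts, paste("Question:", question)), collapse = "\n")
}

cat(enrich_request("How do I compute a group mean?",
                   datasets = c("mtcars", "iris")))
```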
So ultimately, when you send a question through chattr to one of these back-end models, you have the question itself, the actual thing you're trying to solve. But in addition, the request is enriched with the paths to data files, the data frames you have loaded in the environment, additional prompts like "use these packages" or "use these books", as well as, if you're using ChatGPT, the chat history of what you've asked before. So the large language model gets all of this context, and you get a nice response back in your IDE that actually solves problems for you. That's a summary of the different tools you can use in RStudio: ChatGPT-style chat interfaces via chattr, and GitHub Copilot available inside RStudio as ghost text. And again, if we want to be successful with these tools, we follow Simple, Specific, and use Comments: S2C. Importantly, GitHub Copilot is an optional integration; if you don't want to use it, you don't have to. It's available as a preview feature, essentially a public beta, in the 2023.09 release of RStudio and Workbench. So this is available, you can enable it if you want to, and you will need a subscription to activate it. If you have feedback or run into bugs using Copilot, you can open a GitHub issue on the RStudio repo. And don't forget about the chattr package if you want that chat-style interface with remote APIs or even locally hosted models such as LLaMA; if you want other back ends, you can always open an issue to ask how to interact with them. So, to close out before we get into some live examples, where I'll show you a little of how this works inside RStudio, I've got a couple of images of different co-pilots you might have.
So maybe you have a cat you're really close with, and that's your little co-pilot, or a dog, or maybe you're a fan of Totoro or other mystical, mythical creatures you want to work with. These were all generated with the AI tool Midjourney. That's the end of the slides; I have some extras we can get into if people have questions. But briefly, I do want to show a couple of examples of actually using Copilot inside RStudio. So importantly, I've got Copilot enabled and RStudio loaded here. I'm going to start a background chattr app, which again is the R package for running a large language model interface inside RStudio. Now, here in the Viewer pane, I have a place to ask questions. chattr_app() will run as a background job if you have that set; I have it set to run in the background, so I can still use my console for other things without interrupting the model, or the model interrupting my console. So I have a couple of open-ended tasks I want to work through, and I'm going to use Copilot to solve them first, then maybe ask a question or two of chattr. Here's a basic example of repeating common tasks with Copilot. Maybe I'm subsetting a bunch of data frames or vectors, grabbing them one by one. I have mtcars_six_cyl, which takes the mtcars data frame and subsets it to only the rows where cyl is equal to six. Then maybe I want the next one, so I type mtcars_eight_cyl and see what it gives me. And it says: okay, you probably want the same thing, but for eight cylinders. Then I type mtcars_four_cyl and it gives me the four-cylinder ones. In this case, this probably isn't best practice, but it's helped me speed up some of this repeated code.
But if you look at this, it's really 99% the same code over and over; the only things changing are the rows I want to look at and the name of the data frame. So I might say, okay, thanks for giving me that, but convert it to a function. So let's take that same thing and start typing mtcars_cyl, and it suggests a function taking a data frame and a cylinder count. Let's see what it does: I'll execute it, then call mtcars_cyl; it autocompletes mtcars, and I probably want six cylinders. If I execute that, it gives me just the rows that match six cylinders; every single one of these rows is a six-cylinder car. So you can imagine that, sure, it can help you rewrite boilerplate code, but more importantly, it can take existing examples and help convert them into a function. Now, the next thing I do whenever I write functions is write some tests. So I'll run library(testthat) to load the testthat package, and this will help me write some of these tests. I'll say, okay, write a test for that function with testthat. And it gives me this output: test_that("mtcars_cyl works", ...), with the function I have. Let's see what it gives me: an expect_equal() saying the number of rows of mtcars_cyl(mtcars, 6) should be 7. And it actually guessed right. Importantly, it's not doing math here, but you can imagine there are lots of examples of mtcars in the training data, so it's actually really good at solving those problems. So let's just execute that, and it gave me a passing test pretty much immediately. Well, this is all working really nicely, and I've chosen simple problems, but I hope you're seeing some of the possibility here. Again: oh, I started with this script, or someone handed me this script.
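The function-plus-test pair the demo arrives at looks roughly like this (my reconstruction of the Copilot completions; mtcars really does have seven six-cylinder cars, so the suggested expectation passes):

```r
library(testthat)

# Roll the repeated subsetting into a single function.
mtcars_cyl <- function(df, cylinders) {
  df[df$cyl == cylinders, ]
}

# mtcars has 7 six-cylinder rows, so this test should pass.
test_that("mtcars_cyl works", {
  expect_equal(nrow(mtcars_cyl(mtcars, 6)), 7)
})
```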
It's like 30 lines of repeated code; let's roll that into a function. I also want to make sure, because I'm following best practices, that I can test it, and testthat helps with that. Even if the number had been wrong here, I could go in and edit it, and it's still given me a useful test, or at least the useful structure of a test. So now I might want to write a more complex function than these little mtcars examples. Here, let's see: create a my_summary function that takes a data frame and column name as arguments, as well as a logical argument for whether to include NA values and a summarizing function as an argument, and use the embracing operator to pass the summarizing column. Now, the embracing operator, these double curly braces {{ }}, is a concept in the tidyverse (rlang) way of referencing bare column names. So I start typing my_summary, and it gives me this. Let's see if this works: my_summary(mtcars, mpg), and it completes the rest of it. And it says, oh, object 'mpg' not found. So it did not give me a working example in this case; we ran into our first problem: here's a response, but it's not actually a valid response. So let's try that again. I can delete it; it doesn't matter; I can try again. So: my_summary as a function... and it's still prompting me with this df$column pattern, still trying to pass the column forward that way. Now, I've got these examples up above that might be dirtying up the environment. Maybe I don't want this bad code up above; maybe I want to delete it and try once again. So first off, let's remove my_summary, because I don't want that function, and let's also remove mtcars_cyl. Now I don't have any functions defined in my environment, and we can try one more time; we'll keep going if we can't get it.
But: my_summary <- ... ah, now this is looking better. Here we're actually using the embracing operator, so I'm going to accept this. It takes a data frame and summarizes it; it's using across(), which I don't think I really need, but it does have the embracing operator, allowing me to pass a column. So: my_summary(mtcars, mpg), and the function we're going to use is mean. And, ah, it would probably be useful if I loaded the dplyr package. Let's see what it says: the ... argument of across() is deprecated. So let's just drop across() and see what it gives us now. So this is the process of where things can go a little awry. What I've done now is delete part of the response and say: most of this is right, but I actually want you to give me the rest a different way. I've tried to prompt it, or guide it in a different direction, by deleting part of the code so it can complete the remainder and I can accept that. Now let's see: I need a closing parenthesis there. Let's try running that again. And now it's working: I've got a tidy-eval function that gives me back the mtcars data set summarized for the mean of miles per gallon. I can try this with median instead of mean and I get a median; or min and I get a min; or max and I get a max. So ultimately, this is part of the live process: it's non-deterministic, and I can't always get it to give me what I want, but it can help guide me down a path, and then I can guide it down the path I wanted to go. We spent a bit more time on that example because it went a little awry, but let's just, for fun, see if we can test it. We've already got testthat loaded, but just in case we'll run library(testthat), and test... let's see if it generates a response. It's thinking, and now it gives me test_that("my_summary works", ...).
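The working version the demo converged on, after dropping across(), is roughly this (my reconstruction; the argument names and the na.rm handling are assumptions, not a transcript of the screen):

```r
library(dplyr)

# Summarize one bare column of a data frame; the embracing
# operator {{ }} forwards the unquoted column into summarize().
my_summary <- function(df, col, na.rm = TRUE, fn = mean) {
  df |>
    summarize(result = fn({{ col }}, na.rm = na.rm))
}

my_summary(mtcars, mpg)              # mean of mpg
my_summary(mtcars, mpg, fn = median) # median of mpg
```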
So here, first off, we'll just run it, and it says: names for target but not for current. Okay, let's just run one example and see what it gives us. The short answer is, if I look at this, it gave me a useful output, but my function is returning a data frame rather than a vector or a single value. So again, this test is actually pretty good, but we would need to pull the data out. Let's try that: a pull(), and one more pull(). Now we're getting a vector back instead of a data frame, and I actually have a passing test that was created very quickly. Overall, you can see I'm injecting myself a little bit: accepting what it gives me, looking at the errors it prompts, and deciding where to go. So Copilot's pretty fun, and it was nice to go through some examples, but I can also ask chattr a question, like: what even is the embracing operator in the tidyverse? And it says: the embracing operator in the tidyverse is used for non-standard evaluation; it allows you to pass a variable expression as an argument without evaluating it immediately. And then it gives me an example of how I could use it. And if we look back at my function, this looks really close: it allows me to do this filter of column equal to value. Now, importantly, its example has multiple uses of the embracing operator as opposed to one, so it's a little different from my example that just repeated the column twice, but it's the same idea: it explains it, gives me an example, and I can work with it. And maybe I could also ask it what these errors mean. Like, it's saying Modes: list, numeric; what does that actually mean? I can ask it to explain this error. So: explain this error... let's ask it and see what it does. It gives me the two parts of what the error actually is: A, that the test is failing, and B, what "Modes: list, numeric" means.
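The pull() fix for the test looks roughly like this (a sketch: the minimal my_summary definition here is my reconstruction of the one built in the demo):

```r
library(dplyr)
library(testthat)

# Minimal version of the function from the demo (reconstruction).
my_summary <- function(df, col, fn = mean) {
  df |> summarize(result = fn({{ col }}))
}

test_that("my_summary works", {
  # pull() extracts the result column as a plain vector, so we
  # compare a number to a number instead of a data frame to a number.
  expect_equal(pull(my_summary(mtcars, mpg)), mean(mtcars$mpg))
})
```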
It's saying that the results have different modes: the test is expecting a numeric value, but the function is returning a list. So it can help you interpret some of these errors you might be seeing. So yeah, these are the different ways you might use it, and again, this was very quick, just going through some examples. But you can see how Copilot is useful for quickly iterating on your code; if you hit a problem, you might need to re-engineer it or re-ask the question you're trying to solve. And with something like a chat interface, you can use basic syntax and real questions and get responses back. With that, we're at about the 50-minute mark, so I want to pause and see if there are any questions. I see a couple of questions, I think, here from Peter and then one question from Lubbock that I'll answer. If you have other questions, you can ask them in the Q&A section here on Zoom or in the chat. So there's a question from earlier about what an effectively scoped prompt is. If I look back here in my slides, there's this idea of providing a scoped and specific prompt; in this case, that's simple, specific, scoped, and comments. Scoped, or specific, means: this is what I'm trying to solve right here. Sure, there's other context: I have a probabilities vector, I have a vector of numbers that I'm eventually going to use. But what I'm trying to do on line nine is create a function to calculate the mode of a vector. And so this scoped and specific prompt, this very specific task, was solved on line nine, even though the additional context is helpful; we're just trying to do this one thing right here. That's where I really like using comments to help break down the problem, so it doesn't go in a weird direction.
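The "mode of a vector" example from the slide works roughly like this: a one-line comment scoped to exactly the task at hand, followed by the completion. This is my sketch of that slide, not its actual code, and the function is named calc_mode to avoid shadowing base R's mode():

```r
# Create a function called calc_mode that returns the mode of a vector
calc_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

calc_mode(c(1, 2, 2, 3))  # 2
```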
There's another question: is it helpful to think of working with Copilot as a bit like writing pseudocode? Absolutely. If you look at some of the prompts I gave, or the context I provided, I did provide pseudocode. I said here on line five: using glue, create a URL that matches this string. And because I provided the string, that's what was used in the actual code it spat out. And again, I don't have to put this straight into my script; I can put it into a comment, and that pseudocode, along with the description of the problem, can be built out into the function I'm writing. It's also helpful for me, even if I didn't use Copilot at all, just to write the problem down and get it out of my head: if I pause, go away, and come back to solve the problem, I can keep working with it. Then there's a question: if I get stuck, how does Copilot do as a debugger? I hope you saw a couple of ways I did get stuck and could change a few things and restart the task. In some cases, I had context up above that was really bad, code I didn't want to write like that, so once I'd converted it over, I could just delete that section; I don't have to keep bad code. And you can also use a chat-style interface to ask a question like, explain this error to me. The question in the chat was: what training data, or type of training data, is Copilot trained on? That's up to the Copilot team. Again, Copilot is a third-party integration with RStudio; we're not doing anything to train it or fine-tune it. It's whatever you're getting back from Copilot, but it was trained on billions of examples of code and some text. So the primary purpose of Copilot is to generate code, but it will also generate text or comments or other things, and you can even use it inside something like a Quarto document to generate code or text.
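The glue comment-as-pseudocode prompt works roughly like this (the URL pattern and variable names here are my illustrative assumptions, not the slide's actual string):

```r
library(glue)

# Using glue, create a URL that matches this string:
# https://example.com/api?user=<user>&page=<page>
user <- "hadley"
page <- 2
url <- glue("https://example.com/api?user={user}&page={page}")
url
#> https://example.com/api?user=hadley&page=2
```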
So I might type: "Today we are talking about Copilot. Copilot is..." and I wait a second. I wait another second. And of course, now that I'm stressing, it takes a moment to respond. There we go: "Copilot is a tool that uses machine learning to help you write code." It gave me back a lot of text, all on one line, which is why it took a little while: it's generating dozens and dozens of words. It explains what Copilot is, how it was trained, and what the OpenAI Codex is. But just as easily, I can say: use ggplot to create a scatterplot. I run that and see what it gives me: a ggplot call on mtcars, and now I have a graphic. But maybe in the same document I'm thinking, well, I'm learning Python too, so I want to use the plotnine library in Python to create a similar scatterplot. So I type something like "from plotnine import ggplot". Plotnine is an implementation of the grammar of graphics in Python that adheres to ggplot conventions, and I can execute this inside RStudio; it will use reticulate to run the Python code from R. It then gives me syntax similar to the ggplot you might be used to. In this case, though, it's pulling from the R code above, so let's try deleting that real quick and see what it does. It's still generating R code, because it saw it before and I've loaded the ggplot2 library, so let's delete this, and that, and keep going. There we go: code that looks like it did before, but this time it's Python, so I have to import pandas and read mtcars in first. Let's see if this works... it says horsepower is not found. So we're a little off the rails at this point, just looking at some Python code; let's use cyl instead, execute that, run it... and it's still trying to find that column.
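The ggplot side of that Quarto demo is roughly this (my sketch; which aesthetics Copilot actually chose in the live session is an assumption):

```r
library(ggplot2)

# "Use ggplot to create a scatterplot" -- the kind of one-line
# prompt Copilot completed in the Quarto document.
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```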
I've got a data set that's not quite mtcars at this point, but now I have a giant graphic in RStudio: a Python library actually rendering a graphic from Python code. So altogether, we played around with that for a bit, and that's an example of using it inside a Quarto document: it can do more than generate code; it can also generate inline text as well as outputs. I wonder if there are any other questions. I don't see any in the chat, but I'm happy to answer them; we've got about five minutes left for today's session. While I'm waiting, I'll just call out that if you go to docs.posit.co, there is an RStudio User Guide; I'll drop the link into the chat, or you can just search for "RStudio IDE User Guide". If you want to learn more about how to use tools like this, go to the Tools section of the guide and click on GitHub Copilot. It talks about the process of using GitHub Copilot in RStudio, as well as interfaces like chattr, walks through examples of how to be productive, and covers a lot of different things you can do with Copilot in RStudio. Can these tools be used in Posit Cloud? They can in the future. Again, because GitHub Copilot is available as a preview, we expect it will reach general availability in December when we do another release, and at that point we'll probably enable it in Posit Cloud as well, because it will be globally enabled for users. They'll still have to use their GitHub credentials and have a Copilot subscription. But generally, it's not available right now in Posit Cloud. And I'm speculating here, Thomas, but in some ways I'm imagining Microsoft and OpenAI bought GitHub for the purpose of obtaining all those fabulous repos. So, in a way, we may have all contributed to the training.
Yeah, I believe a large amount of the corpus of text used to train it came from GitHub. OpenAI is a partner of Microsoft, so they're very tightly intertwined in how they work together. GitHub Copilot is actually powered by OpenAI's Codex model, and some of Microsoft's other Azure-based large language models are also powered by OpenAI. Well, Thomas, thank you very much. This has been awesome; I think we've all learned a ton, and even the folks who couldn't make it today will learn a lot from the recording, which we will share. Well, thanks again so much for having me. Go have fun with it, and if you run into any problems, please open up a GitHub issue. Will do. Thanks a lot. Bye, folks.