Even if you don't use drake, this sort of function-oriented paradigm, where you do most of your setup up front, test the individual functions modularly, and only then assemble them into a very small top-level script, is a useful strategy for large projects, from dissertations to actionable industry data analysis workflows. But for drake specifically, there's not much extra work to take this and put it into drake, and there is a lot of payoff if you do. I'm going to walk through that. So the final step for taking something like this and putting it into drake is to define something called a plan, and usually I'll have a plan.R script to do that. So what is a plan? The plan is just an outline of the steps of your workflow. We outline our workflow in steps so we can skip them later: drake saves time by skipping computations it doesn't need to run, and I'll walk through how that works. In order to do that, we have to define what those steps actually are so we can skip them. So we have a step to read in our data and a step to preprocess it, and that's all we're going to start with, because we're starting small here. That's the code we write to define our plan. Our actual plan is just a data frame: a data frame of the steps we're going to run. By creating the plan, we haven't actually run anything yet. This is still setup, and the setup is absolutely worthwhile. What we have here are arbitrary pieces of R code. These commands are going to run, and the resulting data objects are going to be labeled with the informative names we defined previously. Each of these steps is called a target, and the target names here are on the left.
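As a minimal sketch, the plan described above might look like this in a plan.R script. The function names split_data() and prepare_recipe() and the file path come from this example's code base, so treat them as assumptions:

```r
library(drake)

# A plan is just a data frame of target names and commands.
# Nothing runs yet: drake_plan() only records the outline of the workflow.
plan <- drake_plan(
  churn_data = split_data(file_in("data/customer_churn.csv")),
  churn_recipe = prepare_recipe(churn_data)
)

plan  # a data frame with columns `target` and `command`
```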
When we go to run these setup steps, drake can analyze your workflow and tell you what's going on. If we load our packages, functions, and plan, we can call the vis_drake_graph() function to ask drake what the moving parts of the workflow are and how they fit together. We defined churn_data and churn_recipe as targets, and churn_data comes before churn_recipe: the data needs to be run before the recipe is. The data depends on the data file and the split_data() function, and churn_recipe, although the arrows overlap a bit, depends on the prepare_recipe() function here. drake notices this automatically because it looks at the plan and runs something called static code analysis on these commands: it analyzes the symbols in the commands without actually running the code, and it uses this to understand the code before it runs it. So it notices that the churn_recipe command contains the symbol churn_data, and it knows the churn_data target is a dependency of churn_recipe, which is why in the graph it draws an arrow from churn_data to churn_recipe. Likewise with the prepare_recipe() function. You can define the targets in the plan in any order you want; it doesn't matter to drake. This allows you to think about the individual targets, the individual components of your research, and drake takes them and fits it all together automatically. Leave it to drake to understand how your entire workflow should proceed. When you actually go to run the project with the make() function, drake runs the correct targets in the correct order, data before recipe, and it stores those values in a cache on disk for later use.
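In recent versions of drake, the inspect-then-run cycle above is roughly two calls, assuming the packages, functions, and plan are already loaded into the session:

```r
# Ask drake to draw the dependency graph it infers
# from static code analysis of the plan's commands.
vis_drake_graph(plan)

# Run the targets in the correct order and cache the results on disk.
make(plan)
```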
And because those targets are stored on disk, you can restart your R session and your work is still saved; you can get those data objects back. The readd() and loadd() functions are the ways to get targets back from the cache. Under the hood this reads data files, but you don't need to worry about file names at all; all you need is the target name. One of the great things drake does is abstract files as targets, so you don't have to micromanage files: you don't need to decide where these data files go or what they should be called, because drake's abstraction takes care of that for you. It's extremely helpful for organizing projects. At this point, we have formally defined our initial targets in a plan, and we've run them. We've done a little exploratory data analysis; this is the exploratory phase of the project, with loadd() and readd(). After we've decided that our current set of targets is pretty good, we're going to add a couple more targets. We build up this plan incrementally, and then we repeat the workflow with the make() function. But we're not actually repeating ourselves. People talk about DRY workflows, where DRY stands for "don't repeat yourself": when you call the make() function repeatedly, you're not necessarily repeating all of your work, so you can afford to do it pretty frequently. That's what allows you to build up this plan gradually and think about the pieces. So we're going to add some deep neural nets, some models using different hyperparameters, using the functions we've already defined. And we're going to see that everything prior is up to date, because we didn't change the underlying code or data yet.
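The readd() and loadd() functions I mentioned work roughly like this (churn_recipe and churn_data are the targets from this example):

```r
# readd() returns a target's value from the on-disk cache.
recipe <- readd(churn_recipe)

# loadd() assigns targets into the environment under their own names.
loadd(churn_data)

# Both work in a fresh R session, because the cache persists on disk.
```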
And the outdated pieces are the models. It's good practice to always look at the graph at times like this, and when we do, we see that our models are properly connected to our functions and our previous targets: the data targets are up to date, and the models need to run. That's exactly what the make() function does, and it skips everything else. We can likewise read and inspect our targets. Instead of saving the actual model objects, we save one-row data frames of the accuracy and the model hyperparameters, to keep this whole process as tidy and light on storage as possible. Then we can go forward and add more targets: we're going to go through our previous model runs, take the one with the best accuracy, and retrain that model to return it as an object, so maybe we can more easily deploy it into production for a company, or run it through further prediction tasks. And again, drake skips the previous work because all our previous targets are up to date. We can inspect our model: because we set format = "keras" in the optional target() function, drake knows to store this as a Keras model, which isn't always straightforward, because Keras models need to be serialized in a special way, but that's possible within drake specifically; that's a feature. And likewise, when we go forward and add some more models, something interesting begins to happen. Not only is the new model we added out of date; the downstream targets are also invalidated. If you look at the graph, we previously ran best_run and best_model, but because this new model was introduced, those targets are no longer valid. They need to run again to reflect the latest results from all the models, and drake automatically detects this. So it not only runs the new model, it picks the best model run all over again.
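As a sketch of what adding model targets might look like: train_model() and its arguments are assumptions standing in for this example's real functions, but target(..., format = "keras") is the actual mechanism for the special Keras storage format:

```r
plan <- drake_plan(
  churn_data = split_data(file_in("data/customer_churn.csv")),
  churn_recipe = prepare_recipe(churn_data),
  # format = "keras" tells drake to serialize these targets
  # the way Keras models need to be serialized.
  model_relu = target(
    train_model(churn_recipe, act1 = "relu"),
    format = "keras"
  ),
  model_sigmoid = target(
    train_model(churn_recipe, act1 = "sigmoid"),
    format = "keras"
  )
)

outdated(plan)  # only the new model targets are listed as outdated
make(plan)      # skips churn_data and churn_recipe, runs the models
```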
And if that best run turns out to be the same model, if the target's return value is the same, drake doesn't bother to retrain the best model. This may or may not happen depending on the results, but drake can make that decision based on the content of the data itself, instead of something else like timestamps. If you change a function, drake likewise notices the change and reacts to it. Say we change the dropout rate of a layer. We go to the graph and see that the function we changed is define_model(), which affects train_model(), which affects test_model(). All the targets downstream are invalidated, because they need to run again to reflect the latest model definition. drake automatically understands how these functions are nested, because of that static code analysis. So we run make(), and all the models and the necessary downstream results rerun. But suppose that was a temporary experiment and you want to go back to your previous work: you go back to the function and change the dropout rate back to 0.1. If you want to revert your work without wasting a lot of time, you can set recover = TRUE in the make() function. As long as those targets were built with the same versions of the dependencies, the same command, the same random number generator seed, and so on, drake will automatically recover them without actually rerunning everything. This has saved me a whole lot of time. I've sometimes been under time pressure at work to answer quick one-off questions that just required quick changes to the code: I go in and change a function, answer the question, and revert back to the main thread of the project. This has been helpful for exploring tangents quickly without getting into a whole lot of trouble.
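The recovery workflow above is a single argument to make():

```r
# After reverting define_model() to the old dropout rate, recover
# targets that were previously built with the same command,
# dependencies, and random number generator seed, instead of
# retraining everything from scratch.
make(plan, recover = TRUE)
```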
And if the data file changes, because of the file_in() keyword we used in one of the commands, drake can also notice changes in data files and automatically react. But at the end of the day, if you haven't changed anything, drake will just check your work, tell you that all your targets are up to date, and do nothing else. This is super important for reproducibility. It's tangible evidence that your results match the code and data they came from, evidence that the results are synchronized with the code and data you're sharing with your collaborators, and it increases our ability to trust the conclusions of the project. In this approach to pipeline tools, this approach to workflow management, where we're running and updating parts of the project instead of the entire thing, this is extremely helpful. drake also tracks the history of past runs with the drake_history() function, so you can see past versions of the targets you've created, along with some of the named arguments in the function calls in those commands. And you can recover old data if you like, without going through the whole data-recovery process, as long as you didn't garbage-collect the cache. There are a bunch more features that drake supports, like high-performance computing on clusters. That's another whole huge topic, one that my collaborators and I use quite frequently. It's super useful, but it's not something I usually go over in workshops like this; the materials and documentation are there, though. There's a whole chapter in the online manual, linked down here, that goes over different kinds of high-performance computing options, different algorithms and settings, things you can take away and apply right away.
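Checking your work and browsing past builds looks roughly like this:

```r
make(plan)       # if nothing changed: "All targets are already up to date."
drake_history()  # a data frame of past target builds and their commands
# drake_history() also lets you retrieve old target versions,
# as long as you have not garbage-collected the cache.
```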
There are efficient data formats, like the Keras model format, and formats that support storage of data tables and similar objects, and a bunch more topics you can get into. drake is on CRAN, so you can download it, and this workshop is open source on its GitHub page, available for download and exploration if you want to run it locally. And here are a bunch more links to resources. The slides themselves are at this URL up top, also linked from the learndrake short course, which we're in right now. If you have a use case that you're particularly excited about or willing to share, rOpenSci is collecting use cases of its packages, and they would love to hear from you. drake is an rOpenSci package; it was onboarded and peer reviewed by the amazing folks at rOpenSci. They've done a whole lot to spread the word about drake and their other packages, and they have a fantastic community to share ideas, provide support, and energize and catalyze the discourse. Thanks also to Edgar Ruiz and Matt Dancho for the examples: this deep learning case study was based on a blog post by Matt on the RStudio blog, which goes into more depth about the methodology and the story behind the use case. And thanks so much to Edgar for putting together Matt's blog post with the drake package; it made a huge difference in this workshop. Lastly, thank you to everybody who has contributed issues and pull requests, asked for things, and identified bugs. The developer and user community has been extremely helpful, and I've learned a ton from everyone. I recognize some of the folks in the list of participants whom I've talked to fairly recently on the drake issue tracker about some of these things, so big thanks to those of you who are listening in here. And with that, I am ending the presentation portion of this short course.
The rest of the tutorial is going to be more hands-on and interactive. You're going to be working with R notebooks and Shiny apps and going through exercises, and I'm going to be here to answer questions and provide help one on one. You can either unmute yourself to ask verbally, or write in the chat and I'll see it. And if you want to take the tutorial yourself after we're done here, you can do that: sign up for an RStudio Cloud account, log into the workspace link, which is also linked from the tutorial's development page, and work through the notebooks in order; the notebooks will direct you to various Shiny apps. So this workshop is already available, but you've got me here to help you one on one. The exercises take you through the example in the slides, go through the building blocks, build up the workflow gradually, and explore the features of drake as code and data change. With that, I'd actually like to encourage you to go to a different set of infrastructure that we've set up specifically for this purpose, because we expected a lot of participants. This infrastructure is backed by RStudio, and it will allow for a greater number of participants at higher performance. So I'm going to find the chat; for some reason, when I shared my screen, it took away the chat window, so I'm going to stop sharing right now and paste in the instructions. The steps to getting started with this workshop are in the chat right now, and I will share my screen again to walk you through them. If everybody could follow along and set this up, and if you could have something to write with, that will also help, because there are credentials here that are worth keeping track of.
So, what I'm going to ask you to do is log in. It's not pasting correctly, but if you would all go to this URL, and I know there are redirects, it's just the RStudio class link, and enter q3_learn_drake, the workshop identifier. If you click submit, it'll ask you for your full name and email, and when you submit that, it'll give you credentials. If you would all copy them and write them down; they're going to be different for each participant, so you'll have your own instance of RStudio Cloud to work with. Once you've copied down those credentials, you can head over to this link here, to your workspace. It may take a while, so while it's loading, I'll just say... oh, it's up. So select RStudio Server Pro and enter your credentials, the learndrake user name and password. Once you log in, you'll need to click a button to start a new session, and then you should be in an RStudio Cloud workspace with all the notebooks for the tutorial pre-populated. I'm going to the first notebook, which is called 1-functions.Rmd. Once you get in here, and I know I'm showing a local copy, you'll see a file system like this: go into the 1-functions folder and click 1-functions.Rproj. I'm already in this project, which is why it's just reading the options. And, oh, have I not been sharing this whole time? It just asked me to share. Anyway, the goal is to log into the workspace and open this first notebook; that's where we're going to start. Once you get to the notebook that goes through the functions, that's where we'll start the hands-on portion. So, while that's getting set up for everyone, I'd like to take a moment to answer some of your questions. We'll have a little Q&A session, and then we'll go into the notebooks.
If there are any questions, anything that I missed or that didn't come through, or just questions or suggestions about drake, things of that nature, you can either put them in the chat or just unmute yourself and ask. I see there's also a raised-hand feature. Okay, so the question is: is the plan built up manually? Yes. For the most part, at least to start, the drake plan is constructed manually: you write individual targets and individual commands in the drake_plan() function to build up that plan data frame. Later on, we're going to go over some ways to shortcut this process. There's what's called static branching, and there's a chapter in the manual on static branching; we'll get to some exercises on that later on. That chapter of the manual describes ways to shorten down a big complex plan like this, so you don't have to write every single target out manually. There's also a dynamic version of this, dynamic branching, which we'll hopefully also get to. Okay, another question: is it possible to combine drake with bookdown? It's certainly possible, and hopefully we'll get to this in the last chapter of the workshop.
The way I think of combining those two toolkits best is to have drake do the heavy lifting, the long computation and the orchestration of targets, so you end up with a bunch of data objects as targets. Then you have a target that draws from those targets in the bookdown book itself and renders it, and a target at the very end to deploy it as a website or publish the document in another way, whether that's online to GitHub or as an RStudio Connect artifact. We'll get to some examples of smaller literate programming documents, hopefully. There's a way that drake automatically interacts with R Markdown reports in general, and hopefully that can be useful. Are there any other questions? Any questions about getting started with the workspace and the notebooks? Okay, with that, I would say: if you could log into your workspace and open up the first notebook on the functions, we're going to spend about 10 to 15 minutes right here. This notebook just dives deeper into the specific functions I mentioned briefly in the presentation. It's going to go over these seven different functions, with our data sets and analyses and summaries, and some other functions, because we're breaking things down into pieces; you'll get a sense of how this workflow is organized and what the code base is actually doing. These are the building blocks of the work. Oh, sorry, for some reason Zoom is a bit difficult today; I'm on a different machine than I usually use, so the screen sharing is a little different. I should be sharing now. So we're going to go into this notebook, and you'll have an opportunity to work through the functions and the examples. There aren't any actual exercises other than just running the code chunks.
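As a quick sketch of the R Markdown integration I mentioned: drake's knitr_in() marker makes a rendered report a target, and drake scans the report for loadd() and readd() calls so the document depends on those targets automatically. The file names here are hypothetical:

```r
plan <- drake_plan(
  # drake scans report.Rmd for loadd()/readd() calls, so the
  # rendered document is rebuilt whenever those targets change.
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)
```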
So you're going to start by running these code chunks, maybe playing around with the functions; the actual exercises, where I'm going to ask you to write little bits of code and answer questions here and there, come later. I will check back in about 10 to 15 minutes. If you complete early, or if you have questions, let me know; it helps me gauge the timing, and every audience is different. Feel free to go at your own pace if you want to, but I'll try to keep a default pace for the whole group, so that what I explain in Q&A makes a bit more sense. Some questions are coming up about the CUDA and TensorFlow messages you get when fitting models. Most of those, in fact all of those, you can safely ignore. They are messages about compilation on different architectures and the efficiencies that could come from that. As long as the models fit, you can ignore those and be fine. If you're seriously using TensorFlow on other machines, you're probably going to want to use the GPU version and compile it to run fast, but here it's just pedagogical, for the purposes of teaching drake. As long as it's returning results, we'll be fine. So, how's everybody doing on that first notebook? Do we need more time or less time? We've been working on this for about 12 minutes. I'd be happy to answer more questions, go right to the next phase, or spend a little more time here in the middle of the notebook. Some people have finished; one person's all done. Thanks for responding in the chat with that, it helps. And like I said, every group is different: some people speed through this right away, others need a little bit more time.
I figured that those of us who make the effort to go out of our way to attend a workshop at an R conference can probably find this a little more familiar, but I don't know. So we've got one person who's almost done; maybe another minute or so. Any questions on this notebook? Going forward, we don't really need to understand every little detail about the functions. What I'm mostly looking to convey here is just the general layout of the way we break down the problem. You might want to take away, for your own work, some of the material about recipes and tidymodels that's here; it could generalize beyond just deep learning. But mostly, an understanding of the functions we have, the functions we're going to use in the drake plans we build up, is going to support the other notebooks that we go on to do. Great, I'm seeing that most people who responded are done, and I believe I'm still sharing, so let's move on to the next notebook. You're going to go into this 2-plans folder. The solutions are next to it if you want to look after we're done, but go to 2-plans, where there's a different R project that we're going to open. So click on it and confirm; that'll restart the session. And if you open up 2-plans.Rmd, that's going to have the next exercises, and here's where it starts being interactive, because there are going to be places where you'll need to insert code for the plan to work.
So, first, as usual, confirm you're in the correct working directory, load all the packages and options, including the drake package, and then go forth and complete the exercises. The exercises focus on building up the plan gradually: adding a couple of targets, running what we have so far, inspecting the results, adding some more targets. This gradual approach breaks things down and makes them easier to manage, because we're not running everything from scratch the entire time; that's what allows us to do this. This is exactly what I spent most of the slides doing. So we start with this plan here and then add more targets. There are going to be places, like here, where you'll need to insert some R code to define the new targets, make sure the dependency relationships are correct, and then run the new targets that we have. You'll see these "your turn" markers where we need to insert that code. I expect, based on other groups, that this can take about 10 to 20 minutes or so, so just let me know where you are in the notebook. I really appreciate those of you who chimed in and said you were done with the first notebook; that really helped time things. If you let me know about the second notebook, that will help move things along and make what I'm saying relevant to you. This is the first part of the workshop where we're actually coding in drake, so work through this notebook and let me know if you have questions along the way; there's plenty of room just to ask questions and things like that. I'm seeing somebody say it's still loading. Does that mean it hangs and you have to restart? Restarting isn't a bad idea.
It's not a bad idea at all; feel free to go ahead and restart if that helps. You can either restart your browser, or log out and log back in if you saved your credentials, or just start a new workspace entirely. The folks at RStudio have done a great job setting up this infrastructure and making sure it scales to this many people with reasonably high performance, so it should work if you restart. A great question just came up: is there any specific reason for assignment with the equals sign as opposed to the arrow, or are equals and arrow interchangeable in the code? Inside drake_plan() itself, you're really defining objects within a domain-specific language that defines the plan data structure, so equals signs matter there; the inside of drake_plan() is really its own little language. Everywhere else it doesn't really matter, but it's good practice to use the arrow, because if you want to follow a code style that's more universal and easier for others to read, the arrow is more accepted. So I guess in plans, inside the drake_plan() function, the code style looks like you're defining arguments to a function rather than assigning objects, because you're working inside a call to the drake_plan() function; that's the more standard way to write it. Outside of a function call, the arrow is stylistically more widely used. Great question. I just got a question in the chat asking to explain the readd() and loadd() functions a bit more. These functions are designed to get existing targets from the data store, from the cache. So if we go ahead and run the first plan that we have, I can show you; I believe I'm still sharing my screen. Oh, are you just getting set up here?
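Back to the assignment question for a second, to make the distinction concrete:

```r
# Inside drake_plan(), `=` reads like a named function argument:
# it pairs a target name with its command in the plan's
# domain-specific language, so `<-` does not belong here.
plan <- drake_plan(
  churn_data = split_data(file_in("data/customer_churn.csv"))
)

# In ordinary R code outside the plan, the arrow is the more
# widely accepted assignment style.
x <- 1
```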
So if we have targets in the cache, then readd() retrieves one of those data objects from storage. If we call readd(churn_data), it returns the value, so we can assign it to a variable and then reference that variable. That's just how readd() works. loadd() is similar, but it assigns the object to the target name in memory. So right now, churn_data is not a variable yet; we don't have the value assigned to it. But if we say loadd(churn_data), then we have this object in memory here. And if we call loadd() with no arguments, it loads all the targets in the cache, so we not only have churn_data, we also have churn_recipe, as well as any other targets that may have existed in storage. The reason we have these functions is that these targets get put in the data store, in the cache, and we want convenient ways to retrieve them from storage, because they aren't just simple files. drake uses a package called storr, spelled s-t-o-r-r, to store the return values of the targets. And you'll notice, if we list the hidden files, there's this entry called .drake. That's actually a folder that contains all the data from our targets. If we look at what's inside, there's a lot more to it than just a folder with target names; in fact, it has a whole bunch of cryptically named hashes. So in order to actually load the data, we have to go through the storr package itself, or use these convenience functions to get the targets back. Let me know if that doesn't answer your question; I can go into more detail if you'd like, or backtrack and explain other things. Thanks for letting me know you're done, and I'm glad the recovery feature is helpful.
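A few helpers make the cache tangible; as a sketch:

```r
cached()                      # names of the targets currently in the cache
list.files(all.files = TRUE)  # shows the hidden .drake/ folder
cache <- drake_cache()        # the underlying storr-based cache object
cache$list()                  # storr's own view of the stored keys
```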
The data recovery feature partially makes up for the fact that this data storage format isn't really friendly to version control. If you're familiar with Git and GitHub: what you want to do for a project, at least for the code, is put the code under version control and upload it to GitHub or GitLab or another version control host, so that your project is safe somewhere you trust and you can go back to previous versions. This format, with all these cryptically named files and a lot of data, is not very friendly to version control, because the files are large and things get messy pretty easily. So data recovery in drake is one way to make up for that if your project is local: you don't necessarily need a version control system like Git to go back to a previous iteration of the project. It's a workaround, and it's a workaround that helps. So, we've been working on this notebook for a while, and I'd just like to check in on people's progress. I see some people are done, maybe not everyone. If you're comfortable sharing in the chat where you are, or if you're not done, just message me privately that you need more time; feel free to do so. I see most people are done; we'll give a couple more minutes for the people who are finishing up. I got a question: is drake limited to R, or can it work with something like IBM SPSS Modeler in some fashion? drake is R-focused, but other languages and tools like SPSS are certainly possible to integrate with drake if you track the code, say an SPSS script, as a data file. Some of my earlier collaborators have done that with SQL files for working with databases, and that seemed to work reasonably well. Does that answer your question? Let me know if I didn't answer it well.
Another great question: is there any suggested sync system for drake caches? That's an excellent question, and it's actually something that's fairly difficult in drake because of the default cache system. It's pretty heavy in storage, especially because it doesn't actually delete data very often. It avoids duplicating data, but unless you garbage collect the cache, and there are ways to do that, it leaves a lot behind. Data recovery is the silver lining here; it's sort of making lemonade out of lemons as far as the storage situation goes. But because of the size, it's hard to transport caches from one system to another. Dropbox, Box, and OneDrive are pretty good for this. drake produces a lot of tiny files, so it takes a little time to sync, but it does get the job done; because of the large number of files and their total size, I think those platforms are reasonably well suited to it. Noam Ross has a nice example of packaging up and archiving entire drake caches, with drake, Docker, and GitLab, and I'm going to find it and post it in the chat. If the cache is small enough, you can actually zip it up and archive it, and even tell the continuous integration service you're running on, if you're using one, to save it as an artifact. I just copied a couple of links to his example into the chat.
On cache sharing: in my own work we collaborate on projects, we do a lot of modeling and simulation of clinical trials, with a lot of data and tight timelines, and we don't really have the time or space to ship large caches around. But there are often just a few final artifacts that end up being important to share. What we can do is reproducibly export a file from a drake workflow: have a target at the end that writes a file and tracks it. That piece is often just a data set with all the required information we need to compute some post-processing summaries, and that we can share much more easily. So sometimes it's best to do a hybrid approach. Another question: do you have an opinion about whether I should .gitignore the cache? I recommend in almost all cases that you ignore the cache, just because the size blows up pretty quickly, and so does the number of files. Once you commit that cache, unless you rebase the commits out, it's almost like a curse, because every time you clone the project you're cloning the entire history, since Git is decentralized by design. Once you have that data committed, it's hard to rewrite the commit history to remove it. Plus the commit history for all those tiny files is harder to read. So for drake specifically, I recommend you .gitignore the cache unless you have a good reason not to in your specific use case, and there should already be a .gitignore file inside the .drake folder to prevent accidentally committing the whole cache. These are great questions, thank you for speaking up. Last time I checked, I think most people were almost done with the notebook.
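As a sketch of that hybrid approach, a final target can write a small shareable artifact and register the file with file_out() so drake tracks it. summarize_runs() and the model_runs target are hypothetical names for illustration:

```r
library(drake)

plan <- drake_plan(
  # Hypothetical summary step that condenses upstream results
  # into one small data set worth sharing.
  results = summarize_runs(model_runs),
  export = {
    # file_out() tells drake this file is a tracked output:
    # the target reruns if the file is deleted or modified.
    saveRDS(results, file_out("results.rds"))
  }
)
```

The exported results.rds can then be emailed or synced without shipping the whole cache.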
So I think this is a good time to move on to our next phase. One more question, though, before we move on: if you fit multiple Bayesian models and you want to be able to get back posteriors from several different ones, do you recommend saving manual RDS files of the model fits and loading them later, to avoid clogging the cache? Great question, and I'm going to answer it before we move on because it's really close to my wheelhouse. In my work we fit lots of Bayesian models, over and over again, so not only do we have tens of thousands of posterior samples per model, we have thousands of model fits, and we need to think about how to manage that output effectively. You have a couple of choices here. First of all, I would avoid committing posterior samples to Git and GitHub no matter where you store them; that's just a lot of data. If you do end up storing posterior samples in the cache, as return values from targets, I would think very carefully about the format, because if you store them as ordinary RDS files, the gzip compression that goes on can take a long time. drake does try to compress those data sets by default, but if you store posterior samples, you're almost always going to want a specialized data format. If you're storing a lot of posterior samples or a lot of data, you'll want to wrap the command in the target() function and choose a format like "fst", or better yet "fst_tbl".
What this assumes is that you have a data frame returned by your target. Because it's a data frame, if you select format = "fst", drake uses the fst package to store the data in a very efficient compressed format, and not only is it dramatically smaller in storage, it takes far less time: for one and a half gigabytes of data, it's over ten times faster to store. This is extremely helpful if you really do need the posterior samples of each model. If you instead return, say, a coda mcmc.list, you can no longer use the fst format; you'll have to use the "qs" format instead. I don't have a good enough sense of how efficient fst versus qs is, either in storage or in speed, but both are options to try. One piece of advice I can give in a Bayesian context is to avoid storing posterior samples if you don't actually need them. I would get your model output, but if you don't really need the raw draws, store summary statistics instead: quantiles of posterior distributions, like 50% or 95% credible intervals, posterior means, potential scale reduction factors, and I would definitely throw in effective sample size as well as a convergence diagnostic. Maybe just one row per target with that summary-level information; that's great for Bayesian data analysis and avoids clogging the cache, like we said. Okay, I went on a bit longer of a tangent than I meant to. Let's get to the next part of the workshop, and this is where the usefulness of drake really becomes apparent, now that we've built up the workflow.
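A sketch of the storage advice above. fit_and_tidy(), fit_full_model(), and posterior_summaries() are hypothetical functions; the point is the format field inside target():

```r
library(drake)

plan <- drake_plan(
  # A data frame of posterior draws: "fst_tbl" stores it with the
  # fst package, much smaller and faster than the default storage.
  draws = target(
    fit_and_tidy(churn_data),
    format = "fst_tbl"
  ),
  # Non-data-frame objects (e.g. a coda mcmc.list) cannot use fst;
  # "qs" is the general-purpose efficient alternative.
  fit = target(
    fit_full_model(churn_data),
    format = "qs"
  ),
  # Often better still: keep only one row of summaries per model
  # (quantiles, posterior means, ESS, Rhat) and drop the raw draws.
  summary = posterior_summaries(draws)
)
```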
So we're going to go back to this project and explore what happens when you make changes: when you change a model, when you change some code, when you change some data, what does drake do to react to that situation? What I'm going to have everyone do is go to the 3-changes folder and open that R project. It may take a little time to switch, but you'll be in the correct working directory with all your materials set up for you, and you'll be doing some custom coding. You have the functions, packages, and plan already set up in the R folder, and you have scratch work: an R script and a notebook. If you prefer to write in notebooks, open the notebook; if you prefer scripts, use the script. Either way, open this project and then go to the Shiny app I'm posting in the chat right now, which has some guided exercises to follow along with. RStudio is thankfully back to deploying these Shiny apps, and this one is a tutorial. What I'll ask you to do is keep your cloud workspace open, but go through these exercises, these questions. It'll tell you, okay, source the setup scripts; just follow the directions. The questions will ask you to make changes, and then you'll have to think about, okay, what did drake do, and why did drake behave the way it did?
Hopefully you'll get an understanding of what drake is doing when you change things in your workflow, why it does what it does, how it saves you time, and how you can use it to save time effectively when you're updating your workflow. This one usually takes a bit longer; there are quite a number of questions, and some things may be counterintuitive at first. We'll be here to answer questions, and I'll check in on where everybody is in about 15 minutes, but I'm expecting this to take longer than the others. How's everyone doing with the exercises? Anyone halfway or all the way done? Thanks for the responses; it looks like somewhere between half and two thirds of the way, mostly a half, maybe a little less. That's good. I'm expecting this to take a lot of time; it's a really important part of the exercises, and it really gets you to experience how drake works. I'll check back in with a little chat about progress in maybe five or ten more minutes. In the meantime, don't hesitate to keep reaching out. So we got a great question about the message about unloading one target or unloading two targets. When you run the make() function, drake assumes that all your targets have unique names and all the objects in your environment have unique names. Because of those assumptions, it doesn't allow target names to share names with variables you define in your session, so it automatically unloads them, and to avoid surprises it tells you what's going on. In the world of pipeline toolkits, and in functional programming in general, there's a concept called immutability, which is just a fancy term for: objects are created once and then not modified. This message is part of that assumption: because every target is created once and not modified, its name is unique and it can't be overwritten.
To avoid having to overwrite an object when it creates a target, drake just makes sure that no other target or object shares that name when it starts. Now, the question about debugging when targets are unexpectedly outdated: that's a great one, and I agree it can be tough to track down sometimes. The first step I would take is to visualize the graph, which you've already been doing with vis_drake_graph(). Sometimes that gives you an indication of why targets are out of date. There's also a function called deps_profile(). Let me run this pipeline for a second and then demonstrate it, and it might become clearer. So there are ways to make sense, overall, of what might have changed as far as dependencies are concerned. Okay, now that we've run the pipeline, we can ask for the deps profile. We supply the target name, best_run, and then our plan, and it tells you at a high level why a target may be outdated: whether the command has changed in some way, whether a dependency has changed, whether an input or output file has changed, or whether you selected a custom random number generator seed. Almost all the time it's going to be the depend trigger that's activated, because you changed a function or something else in your code base.
If you're wondering, okay, a dependency change could be a function or it could be another target, the deps_target() function can sometimes help with this. It lists the dependencies of the target and shows you the hashes of each of them. It's not going to show you the previous hash, because drake doesn't keep track of those except in history, but if you run it once and observe one hash, and another time observe a different hash for a dependency, that might give you some indication of what made the target outdated. Other than that, those are the tools drake provides specifically to help you with this. Otherwise it's generally a hard problem to backtrack, and I've been working on that, but sometimes it helps and sometimes it's not as satisfying as you'd like it to be. Okay, a bunch of people are chiming in that they're done; thank you for letting us know. Anyone else need more time? Anyone else done with the exercises? Great. If there are no more questions, and please feel free to chime in in the chat with more, next I'm going to address a couple of the previous questions, which I think were super important: what happens when a function is unexpectedly outdated, and why does drake unload targets for you? You'll notice from the exercises that one of the things the questions try to make clear is that the functions drake tracks are the functions in your session, the ones currently loaded into memory, and not necessarily the functions in your scripts, in your functions.R.
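The two debugging helpers just mentioned, sketched for the workshop's best_run target. This assumes make(plan) has already run in the project:

```r
library(drake)

# Why does drake consider best_run outdated? deps_profile() reports
# which trigger changed: the command, a dependency, an input/output
# file, or the random number generator seed.
deps_profile(best_run, plan)

# deps_target() lists the dependencies drake detected for best_run,
# with their current hashes, to help narrow down what changed.
deps_target(best_run, plan)
```

Comparing hashes between runs of deps_target() is the practical way to spot which dependency moved.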
When I first created drake, I thought, okay, we need a pipeline tool that is not irrevocably committed to configuration files and script files and things like that; we want something that's not explicitly file-based. But there are certain ways, I've learned, in which that makes things difficult. Functions can become unexpectedly outdated because your global environment may be really busy, or you may have a stale session: you may have made one change in your global environment and forgotten about other changes you made. What I highly recommend you do in practical projects, if you're using the traditional setup I've been showing, is restart your R session every time you want to run make(). So restart your session, source your functions and your plan all over again, and then call make(). That ensures you have a fresh, clean session, so that when your targets are up to date, you can trust it's because you have an up-to-date set of underlying scripts, and when you leave the project alone and come back to it, your targets should still be up to date. Restarting your session is important for serious projects. This leads to a different set of functionality that I developed on top of what drake already has: a function that automatically spins up a new background process to do this for you, sourcing all the scripts and then running make(), so you can use drake in a way that's more reproducible and creates a fresh, clean session every time.
There's a chapter in the manual, chapter seven, that goes over this; there's a "safer interactivity" section that describes some of the problems with unexpected function invalidation. The adjustment is pretty simple: you define a configuration file called _drake.R, and this _drake.R is basically your top-level run-everything script. It sources everything, and then, instead of make(), it calls a function called drake_config(). drake_config() is an internal preprocessing function, but you use it here instead of make() because this configuration file supports multiple different kinds of functions, not just make(). drake_config() takes all the same arguments that make() takes, so usage is exactly the same. Once this file is in place, you can call r_make(). What r_make() does is spin up, like I said, a background process, with a different kind of console output, and run everything from that fresh session. If I run it again, it reloads the scripts and the packages again, because it starts from a fresh session. And it's not only r_make(): there's a whole function family for variations on the same theme. You have versions for outdated and vis_drake_graph, and other functions that require the plan and some degree of preprocessing of the functions in your environment. Using this family of functions, you get a consistent set of behavior that revolves around this configuration file and this persistent, reproducible session management. I highly recommend it for practical projects.
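A sketch of what that _drake.R file looks like, assuming the script layout used in this workshop (an R/ folder with packages, functions, and plan scripts):

```r
# _drake.R: top-level configuration script for the r_*() functions.
source("R/packages.R")   # load packages
source("R/functions.R")  # define functions
source("R/plan.R")       # define the plan

# End with drake_config() instead of make(); it accepts the same
# arguments as make() and hands the configuration to r_make() etc.
drake_config(plan)
```

Then, from your interactive session, r_make() runs the pipeline in a fresh background process, and r_outdated() and r_vis_drake_graph() follow the same pattern.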
I don't usually teach it right away, because it's usually simpler to get started with make(). But for serious projects, I would define that _drake.R file and rely on the interface that revolves around it, because of what you've learned in the exercises. So that's one of the follow-ups; any questions, either about the exercises or anything else? All right, just let me know if you have them. We're going to move on to the next section of the tutorial. As you may have guessed, we're going into 4-static, and there's really not much there except a set of scratch workspaces; most of this section lives in an interactive app. So if you would go to the link in the chat to get started, that is where the exercises live. There's also a helper app called the drake planner that's going to make these exercises a whole lot easier; I just copied that into the chat as well. I would open up both of these links right now and let the apps initialize. You probably don't even need your cloud workspace for this, but please keep it open because we're going to come back to it, and you may still prefer to use it. This section addresses a question from early on, which was a great question for this workshop: do you need to write the whole plan manually? Do you need to type in every single target? Up until this point the answer has been yes. But what if you have hundreds of targets, or thousands? I don't necessarily recommend thousands, because then drake starts to lag a little because of the overhead; if you can condense things down to a smaller number of targets, maybe a couple hundred, that's usually pretty good. But even then it can be hard to write out everything explicitly; by hand, plans can get quite cumbersome.
So there's a shorthand that drake supports called static branching, and there's a whole chapter on it in the manual; if you think you're going to use this for your projects, I highly recommend it. It's chapter five at this link, which I'll also put in the chat. This set of exercises is hopefully a more gentle, hands-on introduction to all of that for our deep learning use case. Static branching is a shorthand that lets you define more targets with a lot less typing. If I go to this app and proceed to the exercises, they're of this form: you get a plan that you're tasked to create and a graph of what it's supposed to look like, you go down and fill in the blank to construct the plan, and it tells you if you need to go back and try again. What helps is to either define the plan in a local session or just use the drake planner app: I can paste a plan in here, it gives me warnings if it needs to, you can see what the plan looks like once you paste it in, and it lets you iterate. Maybe I go back and change something, I change these units parameters and update, or maybe I use fewer activation functions and it produces fewer targets. So I would go ahead and log into these apps and work through the exercises. Again, this is another long one, so I'll check back every ten minutes or so about timing, but we can spend quite a bit of time here and it'll be well worth it. Okay, how are we on the static branching exercises? About halfway, a third of the way? That's good, a third to a half. We're doing really well on time overall, so we can spend quite a bit here. Some people are at the external symbols exercise, about halfway, and there are some connection issues.
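A sketch of static branching for a deep-learning-style grid like the one in these exercises. train_model() and the parameter values are hypothetical; cross() is the transform that expands combinations:

```r
library(drake)

plan <- drake_plan(
  churn_data = split_data(file_in("data/customer_churn.csv")),
  # cross() creates one model target per combination of units and act
  # (e.g. run_16_relu, run_32_sigmoid), with far less typing than
  # writing each target out by hand.
  run = target(
    train_model(churn_data, units = units, act = act),
    transform = cross(units = c(16, 32), act = c("relu", "sigmoid"))
  )
)
```

Printing the plan shows the four expanded targets before anything runs, which is a good way to check the shorthand did what you meant.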
Okay, the external symbols exercise seems to be a pretty common sticking point; I'm still happy to take questions on it. It also sounds like the app with the static branching exercises may be having some trouble. There is a way to launch these apps locally from your cloud workspaces if this gets to be too much of a problem, but in most cases simply refreshing the browser will probably solve it for that exercise; it really shouldn't take more than a second or two for the solutions to run after a refresh. This workshop is all part of a package called learndrake, and if you go to your cloud workspace, which I'm sharing now, you can run its launch_app() function and supply the name of the static branching app. It's an R Markdown learnr tutorial. Your browser may block a pop-up, but if you're using Chrome you can choose to allow it and then click "try again" in the dialog box, and that hosts the app locally from your cloud instance. All the exercises should still work that way if the hosted app itself has trouble. When you're done, if you hit stop, you can disconnect that server. So please keep me posted on what's going on with the apps if you're having trouble, and especially tell me if it doesn't resolve with a simple refresh; I've got at least one more workaround in addition to running locally from your cloud instance. If the problem is quick and resolves with a refresh, that's one thing, but tell me if it keeps persisting. Okay, so for at least one person, reloading did not work, but starting over did.
The questions are independent, so that's probably fine. If these problems are insurmountable, we can try hosting the app from different places; it's hosted in two locations. I'd rather not use the second one, but if we have to, that's fine; just let me know. We got a question about using an object with and without the semicolon in the transform argument in the exercises. Let me find the exercises that would make that clear. Okay, I think most of you are past this exercise, so I think it's okay to give away the answer here. Let me just read the latest comments so I understand. Okay, I'll try to answer the first question. A drake plan is not exactly pure R code. It's a domain-specific language, which means it's its own language built on top of what R is already doing, with its own syntax, standards, and symbols. If you write something like a map over these symbols, what happens is: you define the target with its command as a sort of template, and instead of literally saying fun_run, drake loops through the symbols. Each symbol takes its turn being substituted in for the placeholder, the grouping variable, and you get differently named targets because of that as well. That's a valid use of static branching, because you're writing these literally as symbols inside a map() statement in the transform argument of target().
drake is going to interpret these as symbols, because there's no tidy evaluation, no bang-bang, so it just loops through the literal symbols. In the later map exercise, the character strings are also language: they're literally a bunch of characters instead of symbols, and drake interprets them as symbols and plugs them into the actual argument at each iteration. As for when to use a semicolon and when not to: I'm not sure I understand that part, so please let me know if I didn't answer your question based on those last two exercises; if you post more in the chat, that would help. I think I need a little help understanding the question from Malcolm Barrett; reading it now. Yes, so you can reference earlier variables inside the transform statement, even if they're not actually targets; they're variables you mapped over previously. For example, the last exercise, the combine example, is a good place where this comes up. You can do things like this, and I'm improvising here, so it may not work out exactly; just bear with me a second. Suppose you define a plan like this, and you want to add a new target, say called summary, and you want to map over all the runs to summarize them. And you also want to keep track, for some reason, of the function that came along with each run, so you want to get the name of that function. Then you say transform = map(run). Let me see if I can work through this.
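A sketch of mapping over external symbols, the pattern the exercise is about. run_relu and run_sigmoid are hypothetical model functions; the placeholder fun_run is the grouping variable:

```r
library(drake)

plan <- drake_plan(
  # fun_run is a grouping variable: each symbol takes its turn in
  # the command template, producing targets run_run_relu and
  # run_run_sigmoid with commands run_relu(churn_data) and
  # run_sigmoid(churn_data).
  run = target(
    fun_run(churn_data),
    transform = map(fun_run = c(run_relu, run_sigmoid))
  )
)
```

Because there is no bang-bang, the names inside map() are taken literally as symbols, not evaluated as variables.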
This downstream target substitutes the function you used for the run target in for the symbol fun_run here, so yes, that's exactly what's happening: the summary of a given run gets that function's symbol. And by the way, if you run deparse(substitute(...)), that just takes whatever is inside and returns it as a character vector, so it's a nice way to make a note of the function that was called previously, in this case. What drake does is keep track of the grouping variables you previously used to define upstream targets. A way to demonstrate how this works is with the trace argument of drake_plan(). An ordinary call to drake_plan() gives you the target, the command, and values for any custom columns you may assign. But if you say trace = TRUE and run that, it shows you the values of the grouping variables that get built up over the course of static branching: fun_run is a grouping variable with values corresponding to each target, and so is summary, actually, because it becomes a grouping variable when you map over something. So the trace is a way to make the static branching process less mysterious and more concrete, and it's how targets defined later in static branching can use those upstream grouping variables. The second part of Malcolm's question was whether there's a way to see what drake knows about non-target objects, and I think the trace answers that: trace = TRUE in drake_plan() exposes what drake knows about non-target objects like fun_run, since they don't appear in the plan as targets. That is what the trace is for. And I saw another comment saying that trying fun_run = a character vector doesn't work; that's right, it's not using regex.
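A sketch of the trace idea: the same hypothetical plan as before, with a downstream summary target and trace = TRUE so the grouping variables show up as columns:

```r
library(drake)

plan <- drake_plan(
  run = target(
    fun_run(churn_data),
    transform = map(fun_run = c(run_relu, run_sigmoid))
  ),
  # Mapping over run carries its grouping variables along, so the
  # non-target symbol fun_run is available in this command too.
  summary = target(
    summarize_run(run, label = deparse(substitute(fun_run))),
    transform = map(run)
  ),
  trace = TRUE  # keep grouping variables as columns in the plan
)

# Printing the plan now shows a fun_run column recording which
# symbol was substituted into each target's command.
plan
```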
It's not doing text analysis. Earlier versions of drake's interface relied on text-based wildcards, and that was more brittle; it's hard to make a clean transition from text manipulation to valid R language. That's why static code analysis and manipulation of expressions and language objects is more appropriate for this use case. If you try character vectors for each of these, drake expects symbols; it does work in a way, but it's still not recommended. Okay, there's general trouble on the external symbols question, so let's revisit that. We want to make a plan that looks like this. Yes, this is the question we just visited: we have a domain-specific language, and what we're trying to loop over here is different functions, and those functions need to be symbols. If you write those symbols, you can use vector notation or list notation and it'll be interpreted the same. Essentially, you name all the functions, and you name the symbol they're getting substituted in for, which in this case is called fun_run. This creates a plan with those symbols. It may seem a bit counterintuitive, but it's nice in serious use cases where you're defining a bunch of functions for different methods, entirely different code bases for different models, because when you change one set of functions, it only invalidates one set of models and the other models stay up to date. That's a powerful use case for defining different kinds of functions, at the risk of repeating a little bit of code, but not repeating as much computation. This situation is likely to come up in practice, and it helps you avoid unnecessarily rebuilding targets and save time.
Let me know if that helps or doesn't help; I'm happy to elaborate. Okay, how do R language objects work? That's a great question. I typed these symbols in literally, but what you could also do is define a variable that turns a character vector into a list of symbols, and then supply that list of symbols directly in here. Now, I'm not mapping over the literal symbol function_syms; I want the value in this variable to be inserted into the transform statement in the plan. So what I'm going to do is use the tidy evaluation bang-bang operator (!!), which tells the plan to use the value stored in function_syms instead of just the static symbol function_syms. What's going on here is as follows: I create a character vector called function_names, then I get the symbols. What we have is a language object, a symbol, and you can compute on the language in R in a lot of different ways; that's part of how drake_plan() works. If you were to type in some R code, for example a call to run a model with some hyperparameters, R usually tries to evaluate it right away. But part of what drake does is allow you to define code up front and delay the evaluation until the right moment. One way you can experiment with this yourself is with the quote() function in R, which lets you define language objects, what Hadley calls expressions. This is a language object in R, and you can do different things with it. It's actually a call object, so the first element is a symbol for the function; it's just two elements, really. And yes, it's a language object; its class is "call".
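A small sketch of those ideas, assuming hypothetical function names; the quote() part is base R, and the splicing uses rlang's syms() and the !! operator inside drake_plan():

```r
library(rlang)

# quote() captures code as a language object instead of evaluating it.
e <- quote(run_model(learning_rate = 0.01))
class(e)   # "call"
e[[1]]     # the symbol `run_model`
length(e)  # 2: the function symbol plus one argument

# Turn a character vector into a list of symbols,
# then splice the value in with !! so drake maps over the symbols,
# not over the literal name `function_syms`.
function_names <- c("fit_keras", "fit_xgboost")  # hypothetical functions
function_syms <- syms(function_names)

plan <- drake::drake_plan(
  model = target(
    fun(churn_data),
    transform = map(fun = !!function_syms)
  )
)
```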
These objects are really useful for defining domain-specific languages like the one drake_plan() uses, and all these transforms are ways to use language objects to define large numbers of targets. Static code analysis, manipulation of language objects, and metaprogramming in general are a bit involved, but they can help in some of these use cases. Hopefully this doesn't require too advanced a knowledge of metaprogramming; I think the exercises cover most of the use cases that static branching is for. Other questions? Is there anything else you'd like to talk about on the subject of static branching, and is there anyone who needs more time? Okay, so at least one person is done and willing to move on. If we do move on, what we're going to go to next is an alternative to static branching that came later: the more dynamic version of this, called, well, just dynamic branching. In a lot of ways it's easier to use than static branching; it simply came later in drake's development. And because it's easier to use, people can pick it up more quickly on their own, which is why I put it later. But let's go to that now. I'm willing to return to questions about static branching later in the workshop, of course. So let's go to dynamic branching. Back to our cloud workspace: go to the 5-dynamic folder and open the R project. We're going to return to just working with the notebooks; the exercises are still interactive, but not as high-tech, going forward. This notebook is just like the second notebook we worked through when we were building up the plan.
Please go through, read the prose, and run the code chunks to work through the tutorial; when you get to questions that require custom code, there's a little bit of code to write. You do have solutions in here, so if you're really stuck, you can ask me or have a peek at those; there is a completed version available if you need it. Dynamic branching is all about what happens when you want to define a whole collection of targets based on previous work: the targets you define depend on the values of things you computed previously. Maybe you don't necessarily know which tuning parameters you're going to use. Maybe you have a big, long optimization in earlier targets; suppose you're doing some kind of exploratory Gaussian process optimization to find the learning rates that are best for your deep neural nets, and that happens prior to the models you define later. That's impossible with static branching, because in static branching you need to know exactly which targets you're going to run, exactly which models, before you run the pipeline. But dynamic branching allows you to define targets based on the values of dependencies, as the pipeline is running, which turns out to be useful in a lot of situations: it can improve efficiency, and it makes the graphs easier to read. After the exercises on static branching, you may be pleased with how easy this is to use and understand. It relies less on metaprogramming and manipulation of language objects, and more on direct computation on the actual values of targets, which is more intuitive to most people. The other thing is that static branching and dynamic branching can be used together in the same plan.
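A hedged sketch of that idea; the search and fitting functions here are hypothetical placeholders, and the point is that the number of sub-targets is decided at run time:

```r
library(drake)

plan <- drake_plan(
  # An upstream target whose value is not known until runtime,
  # e.g. the best learning rates found by a hyperparameter search.
  best_rates = gp_search_learning_rates(),  # hypothetical search step
  # One dynamic sub-target per element of best_rates,
  # created while the pipeline runs.
  model = target(
    fit_neural_net(rate = best_rates),      # hypothetical fitting function
    dynamic = map(best_rates)
  )
)
```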
I don't have examples in this tutorial in particular about how to do that, but static branching is kind of like a layer on top of dynamic branching. It's good practice in a lot of cases to define targets with both static and dynamic branching if you need it, but to have the static branching exist as a layer on top of the dynamic branching: you define a bunch of dynamic targets using static branching, and then those dynamic targets branch when you actually run the pipeline. Anyway, we'll be working in this notebook for a while, so keep me posted with questions, and I'd be happy to work through issues as they come up. I think it's a good time to check in and see where we are. If you're willing to share that you're done, okay, that's great. Anyone else done, or anyone else need more time? Don't hesitate to view the solutions, especially if you get stuck. Okay, 95% done means it's a good time to wait a minute or two. Anyone who's done or not done, any questions, surprises, issues? I know towards the end there's this bit about the trace of dynamic targets: a way to keep track of where those sub-targets came from, which in our case means which activation function contributed to which model. It can be useful in some cases. In most cases, though, it's usually better for the sake of maintaining an organized project to avoid the trace and work around it. The trace is great for when you're defining objects where it's inconvenient or impossible to assign attributes to the individual targets, but that's actually rare. So it's great that the trace is in your toolkit, so that you understand the full capabilities of dynamic branching.
But if you can, I would recommend that you instead define a data structure that's amenable to keeping track of its own metadata. That's part of why we actually return data frames for the models in this example. It's like what the broom package tries to do for different types of commonly used models in statistics and R: if you return a data frame with not only the model results but also the hyperparameters, or other information that contributed to them, it's actually easier to keep track of where targets came from than by defining this trace variable. The trace is useful when you actually do want to return model objects and you can't easily assign metadata; if you're returning an actual Keras model, it's hard to attach a collection of attributes describing the settings you might have customized. So that's just a little bit of technique there. In my own work, I most often use the dynamic map(). Dynamic group() was great to add because it allows a dynamic version of grouping, like dplyr's group_by(); that's something people had been requesting for a long time. But almost all the time, the dynamic map() solves most problems. Does anyone have any questions or issues? All right, in that case, are there any objections to moving on? Some cells are stuck and won't fully evaluate; which cells specifically? I know this notebook takes a bit longer to evaluate than the other notebooks. It sounds like a lot of people are done already, and since we're nearing the end of the workshop, I'm more and more willing to let the people who are done proceed ahead, and the people still on the dynamic branching notebook can choose whether to move on to the next phase, keep working, or return to it later.
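A minimal sketch of that recommendation, returning a data frame that carries its own metadata; the fitting helper and its fields are hypothetical:

```r
# Each dynamic sub-target returns a one-row data frame that records
# the hyperparameter that produced it, so no trace is needed downstream.
fit_one_model <- function(activation, churn_data) {
  fit <- train_keras_model(activation, churn_data)  # hypothetical helper
  data.frame(
    activation = activation,  # metadata travels with the result
    accuracy   = fit$accuracy,
    stringsAsFactors = FALSE
  )
}
```

When drake aggregates the dynamic sub-targets, the combined data frame already says which activation function produced which row.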
So, to make the most of our time, for people who are done: you can go right ahead to the notebook on files. Actually, let's not open the project just yet. Drake has its own particular way of tracking external files. In drake, you can accept a data file and track it for changes, as we saw with the customer_churn.csv file; in other exercises we explored what happens when we have a file that drake tracks like that, and how it propagates changes forward through downstream targets. It can also track output files, and it can track R Markdown reports. You can actually reference upstream targets in an R Markdown report, integrate that report into the plan, and have a target that renders the report and re-renders it automatically when upstream targets change. The way to do that is part of this set of exercises. This comes up a lot: the way to track input files, output files, and R Markdown reports. So with the remaining time, especially those of you who are done with dynamic branching, and others for whom this would be more useful: go ahead and open up the 6-files R Markdown report and start working through the exercises as usual. Let me just briefly and explicitly mention the R Markdown piece. For an R Markdown report, you'll have calls to loadd() and readd() inside active code chunks.
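A sketch of those file-tracking declarations in a plan; the paths and target names are illustrative:

```r
library(drake)

plan <- drake_plan(
  # file_in() watches an input file for changes.
  churn_data = read.csv(file_in("data/customer_churn.csv")),
  # knitr_in() watches a report's source and scans its code chunks
  # for loadd()/readd() calls; file_out() tracks the rendered output.
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)
```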
So what is drake going to do when you declare this report? First of all, if you run this report after you run a drake pipeline, say you run everything with your model runs, and then you come to this report and want to summarize your previous work: you can write this report with calls to loadd() and readd(), and knit it like you would any other document in the RStudio IDE. But you can also, and this is preferable for drake workflows, define it as a target in the plan. There's a knitr_in() keyword that tells drake to automatically detect those dependency relationships and enforce that the report step depends on the run target, because that target was mentioned in the report. I use this all the time for slide decks for internal purposes, and for other kinds of presentations and reports. In general, it's something drake treats a little differently than other use cases. It's common practice, in statistics especially, to use an R Markdown report as the top-level workflow manager, something where you do everything inside the report. Drake workflows, because of their computational intensity, are oftentimes too ambitious and too big to fit into a single R Markdown report, and R Markdown isn't really designed to handle the heavy-duty workflows that drake is designed to handle. So I would recommend that R Markdown reports do as little as possible. I would also note that you can take advantage of the work you already did when you ran these targets, which allows the reports to run quickly, so you can iterate quickly on them, go back, and change the prose. I find this easier to use than knitr's caching system, and a lot more fit for purpose. Hope that makes sense.
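Inside the report's code chunks, that might look like this sketch; `run` and `churn_data` are hypothetical target names from the plan:

```r
# A code chunk inside report.Rmd:
library(drake)
loadd(run)                    # load the `run` target from the cache into memory
results <- readd(churn_data)  # or read a target's value directly
summary(results)
```

drake's static code analysis spots these loadd()/readd() calls when the report is declared with knitr_in(), which is how the report target learns its dependencies.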
Yeah, so in the remaining time, feel free to proceed to the files notebook, and let me know if you hit issues. I want to say thanks, everyone, for coming; this has been great, and I hope the workshop has helped your work, that you can take back things that are useful to your research and your day job. Alexia linked to a survey for useR! 2020; if you could fill that out when you get a chance, that would be great. I will be here for another half an hour or so to answer more questions; I see some more coming in, and I'll get to them. And after this session is over, don't hesitate to reach out. Probably the best way to track me down for drake-related issues is the issue tracker on drake's development page; I'll link to that soon. I know there are some maintainers who don't necessarily prefer the GitHub issue tracker for plain questions, discussion, and comments. I really like it because it's easy to share: GitHub makes it easy to share code back and forth and even comment on closed issues. So even if I close an issue because I think the original question was addressed, I'm more than happy to log back in, keep the discussion going, and answer any follow-up questions you have. It's gratifying to see all this positive feedback from today. If you have lingering questions, or things that you think could be changed, don't hesitate to reach out. Like I said, I'll be here until the end of the hour. So, back to the technical stuff: Malcolm asked an excellent question about using packages from a function. Why does a namespaced function appear in the dependency graph, but the non-namespaced version of the same function not appear? We're talking about the double-colon operator, package::function(). You could either call
a namespaced function, where the call looks something like dplyr::filter() with that double colon, or a non-namespaced function, where you load the package first with library() and then call filter(). In the former case, drake notices these functions, tracks them, watches them for changes, and puts them in the dependency graph. In the latter case, where the functions are defined in the package environment, not in your own environment, they're not included in the dependency graph. This gets into the question of whether drake should track packages and watch them for changes, or stay out of that whole business. If I were to go back and develop drake from the start all over again, I would say it should just stay away from the business of tracking packages. What drake tries to do, for the most part, is track objects and functions that exist in your global environment, that you define yourself, either in the global environment or in a special environment if you take manual control of drake's environment settings. So if filter() were a function defined in your own environment, drake would track it; but because it's part of dplyr's package environment, drake does not track it unless you use the double colon. So drake, in its static code analysis, kind of tries to do a little bit of both: it only looks for functions in your environment for the most part, but it makes an exception for these namespaced, double-colon calls, and honestly I kind of regret that now. I think what drake should be doing is defining a clean break between your custom code and the code in external packages, especially because the renv package has gotten so good.
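A sketch of the distinction, assuming these helpers are called from commands in a drake plan:

```r
library(dplyr)

# The namespaced call below is picked up by drake's static code analysis,
# so dplyr::filter lands in the dependency graph and is watched for changes.
clean_data_tracked <- function(data) {
  dplyr::filter(data, churn == 1)
}

# The bare call resolves to dplyr's package environment at run time,
# so drake does not add filter() to the graph.
clean_data_untracked <- function(data) {
  filter(data, churn == 1)
}
```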
I mean, renv is so fast, and it's so much nicer than packrat, and it provides an excellent way to lock down the package environment of your project and ensure package reproducibility in a way that increases the strength of your project rather than making it more brittle. So the short answer is: drake makes a special exception for these double-colon namespaced calls, and all the other objects it tracks have to be defined in your environment. Very long story, I know, but it's something I've thought a lot about. I don't think a pipeline tool should necessarily track packages; I think packages should be reasonably permanent for a project and enforced, which is what the renv package does. In fact, I'll look up renv and post a link to it in the chat, because it's such a good package for research. And I'm so glad to hear all this positive feedback coming in; I'm so glad the workshop is useful to you and hands-on. I'm hoping this was a concrete, down-to-earth way to present the things that drake does, and that it was easy to follow. And like I said, you can continue where you left off. If you didn't finish everything in the workshop, or you want to retake it, recommend it to others, or take it on your own time: learndrake is a package with all the materials of this workshop. You can write out the notebooks with its save_notebooks() function, which I believe is referenced here. You can save the notebooks that you just worked through, view the slides, or launch one of the Shiny apps using package functions. Also, if you have trouble installing TensorFlow, you can log into a publicly available RStudio Cloud workspace. The reason I didn't use RStudio Cloud today is that I was expecting a lot of you to log in at once.
But on your own time, we won't have as many people slamming the server at once, so yeah, this is the recommended way: just go into an RStudio Cloud instance and have all the dependencies already installed. Oh, okay, here we go: another question just came up about the targets package and whether drake users should be trying to migrate. I kind of hesitate to talk about this in a workshop on drake, but let me just say, first of all, that drake is here to stay and I'm always going to be maintaining it. It's always going to be there. I think the feature set of drake is complete, so I'm going to focus on issues that provide maximum concrete value to users. It's gotten to the point where it's already very big and very mature, so there are really no new features or major infrastructure developments that I'm planning for it. But I will take your requests, resolve known bugs and known inefficiencies, and talk about new features. Drake does have limitations, and I've done all I think I can to resolve the limitations that were solvable in drake itself. The targets package is something brand new, and it's under development. It is an alternative to drake: it tries to learn from drake's successes and mistakes during development, and it tries to be the successor that drake might have been had we had this experience and these learnings from the beginning. So yes, I am excited for targets, and I think it's going to do a lot of good in this pipeline space. It is a long-term successor to drake, but it's still very early days. If you look in the targets documentation, it has a statement of need describing the ways it overcomes some of drake's limitations.
But it's not going to have everything that drake has: it's not going to have reproducible data recovery, and it's not going to have history. It has different opinions about those things; it's more of a minimal tool that tries to make it possible for other tools, like Git, to pick up the slack. But like I said, drake is always going to be here, I'm still going to be here to answer your questions, and yes, there is a lot of overlap. It is an awkward time, because targets really is the next generation in the long term, but both pipeline toolkits are good choices. Drake has been around for a long time, so if you really need something that's been in the community longer, that's been validated and peer reviewed, with a rich feature set that grew organically over time and includes things like history and data recovery, then drake is a great option. If you're willing to live on the bleeding edge, if you want something that's a bit more storage-efficient and has a bit better parallel efficiency, if you want dynamic branching that integrates better with grouping over data frames, and if you want to avoid a lot of the metaprogramming that drake sometimes requires with static branching and focus more on direct programming without domain-specific languages or language objects, then the targets package might be good to check out; it's on my GitHub page. There is change ahead, but you'll never go wrong sticking with drake. I hope that answers the question without creating confusion; since it was asked, I'm giving the full story here and trying to minimize confusion. Your existing tools, including this one, all your favorite tools, are going to continue to work and continue to exist, and I'm going to continue to support the ones that I maintain.
I was about to link to the renv package; let me type this in the chat. renv is a package that successfully creates reproducible package environments for your project, and it does this a bit differently than its predecessor, the packrat package. renv ensures reproducibility while maintaining a global cache of installed packages for each user, which not only lightens storage but also ensures that projects can initialize and update their package environments far more quickly. And renv is a great companion to drake: drake doesn't really dive into packages, but that's renv's job, and they work super well together. The good news for targets is that it's a lot easier for new users to understand, I think, and there are more guardrails to prevent common pitfalls and surprises. The documentation is a bit more concise, because a lot of the experimentation and learned experience from drake went into targets. Let's see, there was something else I think I wanted to say about it. Yes, it's under development right now, but it's at a state where people can use it and try it out. And if you know drake, then you almost know targets, because the usage of the two packages is very similar. So if you're willing to rerun things from scratch and start over, it's actually quite an easy transition from one package to the other. You get different syntax when you're defining a targets pipeline versus a drake plan, but other than that, the concepts you learned in this workshop, especially programming with custom functions, keeping your workflows function-oriented rather than script-oriented, and defining things in terms of targets, get you most of the way there already.
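A minimal sketch of the renv workflow for locking down a project's packages:

```r
install.packages("renv")

renv::init()      # create a project-local library and an renv.lock file
# ...install or upgrade packages as usual while you work...
renv::snapshot()  # record the exact package versions in renv.lock
renv::restore()   # later, or on another machine: reinstall those versions
```

Because renv keeps a shared global cache of installed packages, repeated init() and restore() calls across projects are fast and light on disk.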
Except for static branching, which is different, it's quite an easy adjustment, actually. And like I said, all the concepts introduced in this workshop carry over quite nicely in terms of the mental model. Okay, we're at the top of the hour, and that concludes the tutorial on reproducible computation at scale in R with drake. Thank you, everyone, for coming. This is not really the end, because you can reach out with questions and use cases, and the workshop materials, like I said, are online for you to share and finish up later if you choose. Thanks to the folks at useR! and R-Ladies, Alexia and the others, for making this possible. Thank you to RStudio for providing infrastructure for the cloud workspaces and apps. I couldn't have done this without all of you, and I hope you came away with something useful you can apply to your daily work.