Hi, my name is Scott Chamberlain. I work for rOpenSci, and my Twitter is right there. Today I'm going to talk about staypuft: object validation and serialization. And in addition: should this even be a package?
One of the pain points in programming is serialization, converting data in one format to another format. It's especially painful when those data are complex.
Other languages have good ideas, right? So if we're confronted with the problem on the previous slide, what kind of solutions are already out there? One place to look is the language we're talking about, R, but also similar languages like Python. There's a Python package that's quite widely used called Marshmallow. It addresses this exact problem: serializing different types of data into different formats.
Here's a great example. At the top of the script we import various things from Marshmallow. We define two classes, an artist schema and an album schema, where the album can have artists nested within it. Then we define bowie as a dict, so David Bowie is the artist, and an album as a dictionary with a number of fields, including an artist nested within it. Then we create a new object from the album schema, called schema, and serialize that album dictionary. In this case the result is basically just a dictionary, even though it started as a dictionary, but the data that was passed in has been validated. At the bottom of the script, we've given the date as a string, whereas the album schema defined above expects a date. So there's a validation error, because we passed the wrong type of thing.
Coming back to R, there's similar-ish stuff in R. There's the assertr package, which is actually an rOpenSci package as well, and validate and errorlocate.
I don't think any of these quite address the problems we're going to be talking about here, but let me know if I've missed any others. You can check out the package that we're going to talk about at ropensci/staypuft on GitHub.
So, an example of staypuft. If we think back to the example we just did, this is a somewhat similar one. We load staypuft and define a schema. This is R6 at the top here, so it might be somewhat unfamiliar if you haven't used it before. We define a schema and then three fields, just like we did in the Marshmallow example: num, a UUID, and a field called foo. We want the first to be an integer, the second a UUID, and the third a boolean. Then we create a named list, which is R's equivalent of a dictionary in Python, with num, the UUID, and foo holding the expected values. We pass it to my schema, dump JSON, and we get that JSON back and the data is validated.
All the data was good in that first case. In the second case, let's say we reassign the UUID to "foo-bar" and call load on my schema again. That's not a valid UUID, so it errors with a validation error. We can do the same thing with the boolean: foo expects false or true, but we gave it the string "stuff", so we get a validation error on that.
In another use case, we can convert objects to an S3 class. This is a pretty useful feature, I think. We define a schema like we did in the second example, but there's this additional post-load line here where we define a function. Instead of outputting just a list or JSON like we did in the previous example, we want to create an S3 class from the input that the user puts in.
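
The validation and post-load ideas described so far can be sketched in a few lines. This is only a rough illustration in plain Python using the standard library; it is not staypuft's R6 interface and not Marshmallow's API. The names `Schema`, `ValidationError`, the validator functions, `error_for`, and the `Artist` class with its single `name` field are all hypothetical, chosen to mirror the num / UUID / foo example and the post-load hook from the talk.

```python
import uuid


class ValidationError(Exception):
    """Raised when a value does not match the type its field expects."""


def integer(value):
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValidationError("Not a valid integer.")
    return value


def uuid_string(value):
    # Let the standard library decide whether the string is a UUID.
    try:
        return str(uuid.UUID(value))
    except (ValueError, TypeError, AttributeError):
        raise ValidationError("Not a valid UUID.") from None


def boolean(value):
    if not isinstance(value, bool):
        raise ValidationError("Not a valid boolean.")
    return value


def string(value):
    if not isinstance(value, str):
        raise ValidationError("Not a valid string.")
    return value


class Schema:
    """Hypothetical schema: maps field names to validator functions."""

    def __init__(self, fields, post_load=None):
        self.fields = fields
        self.post_load = post_load

    def load(self, data):
        validated = {}
        for name, validator in self.fields.items():
            if name not in data:
                raise ValidationError(f"{name}: Missing field.")
            try:
                validated[name] = validator(data[name])
            except ValidationError as err:
                raise ValidationError(f"{name}: {err}") from None
        # Optionally turn the validated dict into a richer object.
        return self.post_load(validated) if self.post_load else validated


def error_for(schema, data):
    """Return the validation message for bad input, or None if it loads."""
    try:
        schema.load(data)
    except ValidationError as err:
        return str(err)
    return None


# Three fields, as in the talk: an integer, a UUID, and a boolean.
my_schema = Schema({"num": integer, "uuid": uuid_string, "foo": boolean})
x = {"num": 5, "uuid": "9a5f6bba-4101-48e9-a7e3-b5ac456a04b5", "foo": True}
loaded = my_schema.load(x)  # every value passes validation

# Reassigning one field to a bad value triggers a validation error.
uuid_error = error_for(my_schema, {**x, "uuid": "foo-bar"})
foo_error = error_for(my_schema, {**x, "foo": "stuff"})


class Artist:
    """Stand-in for the S3 class built by the post-load function."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<Artist: {self.name}>"


# The post-load hook converts validated input into an Artist object.
artist_schema = Schema({"name": string}, post_load=lambda d: Artist(**d))
artist = artist_schema.load({"name": "David Bowie"})
```

The design choice worth noticing is that validation and object construction are separate steps: the hook only ever sees data that has already passed every field check, which is the role the post-load function plays in the example from the talk.
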
Then we define a print method for that S3 class, print.artist. We can define a list of named lists, convert that to JSON, and pass that into the schema. This probably represents a common use case for a lot of R users, where you get some JSON and have to convert it into something in R. Here we pass that JSON to our schema's load JSON function, and we get back these S3 classes.
So what are the use cases staypuft would be used for? Data validation is a pretty obvious one, with lots of potential users. Remote data sources can change often: there are a lot of R packages pulling in data from APIs or FTP servers, or potentially scraping, so that's again a data validation step. Research scripts would be pretty important too: it would help raise issues with scripts as time goes on and inputs maybe change. And there's an increasing number of people using R with Plumber to make actual APIs serving data from R itself, and this would be a good use case for that.
To do: nested data is actually one of the main things I wanted to create staypuft for. It works somewhat, but it needs a lot more testing. I'd like to add more custom field types that are specific to the research and science domains. Some examples that we have are URL and email, but it'd be great to add others. I want to add support for user-defined fields, so whatever the user may need and want to validate. And then probably add an easier-to-use interface that's not so R6-focused.
But should this even be a package? Let's walk through that. When should I not make a package?
Here I'm just talking about me personally. I'm not trying to talk about anybody else in particular, just my experiences and my thought process when I'm making a package. So, does the package solve actual use cases? Is there significant overlap with existing solutions? If there isn't, great. If there is, maybe that package hasn't been touched in 10 years, and then there's a good reason to create a new thing, even if it's very similar. And you can't look at a single package in a vacuum, right? What else is going on in your career? Are there higher-priority things that need to be worked on, or lower-hanging fruit that could also be fruitful? These are just a couple of the considerations.
For use cases, as I talked about a little on earlier slides, I think there are a lot. A lot of people are validating data or serializing an object from one thing to another, so I think the use cases are pretty solid for this package. And I'm definitely not against silliness. cowsay is something that I and a number of other people have created. I wouldn't say it solves real-world use cases, but it's definitely fun.
And then the elephant in the room, which some of you have probably been thinking about while looking at the examples so far: are you just recreating S4? I've definitely thought about that. I think it's sort of recreating some of S4, but with additional features, and it's trying to solve specific use cases, so I think it goes beyond what S4 does.
As a quick example of what you can do with S4: you can create a class called BMI, say what fields you want in the object, weight and size, and say what types those fields are supposed to be. Then you create an object and it'll validate that input. That's pretty similar to some of the data validation steps we were talking about earlier with staypuft. But like I said, I think staypuft's use cases are sufficiently different.
In terms of higher priority and lower-hanging fruit, I've got many other packages. It's probably been a year and a half since I started working on staypuft, and there's lots of work to do on other packages, so it's taken some time to get to even where it is now. The other packages I work on have many users who are reporting bugs, so there's a lot of existing maintenance. So this one's taken a while to get to.
This last bullet is about how to determine what kind of impact the package is going to have, which potentially influences your decision to work on it, right? If you think the package is going to be really helpful to some people, or especially a lot of people, that would probably make you more willing to work on it. But it's sometimes hard to judge. One way I think about potential impact is whether the potential number of users is pretty large, and I think that's the case for this package. But we'll see.
So staypuft's future is pretty unclear, but if you're interested, the repo is at ropensci/staypuft, and the slides and a PDF version are at scotttalks.info/staypuft. Thanks.