So hello everyone, Maxime here. First of all, I'd like to thank the Chan Zuckerberg Initiative for helping us do these bite-sized talks. Today I'm introducing Rob Syme, Groovy wizard, working at Seqera Labs, and he will explain to us how to safely handle metadata. As usual, you'll be able to unmute yourself at the end of the talk to ask your questions. Of course, you can also ask them on Slack or in the chat. And over to you, Rob. Perfect, thanks for the intro, Maxime. So this talk is, as Maxime indicated, about safe metadata handling in Nextflow pipelines. But really, it's about safely passing any objects through channels in Nextflow. It has particular application for the nf-core community here, because we do a lot of metadata handling in nf-core, particularly passing meta maps between processes and through channels, and there are some complicated little bugs that can occur when mutating those objects in place. What I'm going to suggest to you is that you should never mutate those objects in place, but instead always return new objects. Of course, this is a bit hard to explain in the abstract, so let's do an example, and this example is based on a true story, a sad story. Let's take this workflow. We have a channel that emits an object, and we're passing that channel to a buy new shirt process. This process does something very simple: it just echoes a t-shirt size based on the weight property of the object being passed to the process. So when we run this, we see the t-shirt size is small, because the weight here is 70. That's what we'd expect, no surprises here, nothing outrageously complicated. And yes, these weights are in kilos. So let's make this a little bit more complicated; I'm assuming that everyone has a pretty solid grasp of this. Let's add two new processes. The one we're going to be specifically interested in here is get new job.
And this get new job process takes an object, emits that same object, and modifies that object in place: it adds five to the weight property. So we take that same channel we created before, and we pass it once to buy new shirt and once to get new job. This runs as expected; the t-shirt size is small, so everything is as we expect. And I think the way that a lot of people think about these pipelines is like this; conceptually, a lot of us have this sort of picture in our head when we're writing these pipelines: we take an object, we pass it to buy new shirt, and we pass it to get new job, passing the object through those two channels into the two processes. And I think we have a tendency to think about those as independent events. But really, because this channel is emitting the same object, it's exactly the same object in memory getting passed to get new job and to buy new shirt. So if get new job modifies the properties of that object, it will affect buy new shirt. In this case, we saw it didn't affect the output of the run, but that's dependent on timing. It's important to remember that all of these processes happen asynchronously in Nextflow, so we can offer no guarantees about their order or timing. To make this a little bit clearer, let's add in a little delay. I'm now going to add this browse process before buy new shirt. All this browse process does is sleep for a couple of seconds, which just delays things before we buy the new shirt. So now we've just added a process here, and I think a lot of people would conceptually expect that adding a process here shouldn't affect the outcome, because we just pass the object through transparently.
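For readers without the slides, here is a minimal sketch of the two-process setup described above. The process names, map keys, and the 75 kg threshold are all assumptions, not the original slide code:

```groovy
// Hypothetical Nextflow (DSL2) sketch; names and values are assumptions.
process BUY_NEW_SHIRT {
    input:
    val person

    exec:
    // Whether this prints "small" or "medium" depends on whether
    // GET_NEW_JOB has already mutated the shared map: a race.
    println "T-shirt size: ${person.weight < 75 ? 'small' : 'medium'}"
}

process GET_NEW_JOB {
    input:
    val person

    output:
    val person

    exec:
    person.weight = person.weight + 5   // in-place mutation: the hazard
}

workflow {
    person_ch = Channel.of([name: 'Rob', weight: 70])
    // The very same map object flows down both paths of the DAG, so the
    // mutation in GET_NEW_JOB can leak into BUY_NEW_SHIRT.
    BUY_NEW_SHIRT(person_ch)
    GET_NEW_JOB(person_ch)
}
```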
We don't do any modifications here, but now, because this buy new shirt process happens after the get new job process, the get new job process modifies the weight property, and my new shirt has changed: the t-shirt size is now medium. I'll just give people a second to look at that and digest it. So the key message I want to get across here is that modifying an object modifies it across all paths in the DAG, the directed acyclic graph of processes, which can lead to complicated, time-dependent bugs. We were seeing problems in nf-core pipelines that would only appear if some processes took a little bit longer, and that's what I was trying to demonstrate with that browse process. They become really complicated, time-dependent bugs to track down. Okay, so that's the problem. What's the solution? The solution is that you should always, or whenever possible, return a new object instead of modifying objects in place. I'm just going to introduce two very handy Groovy methods for the most common modifications we make to meta map objects in nf-core, but the idea that you should always return new objects is the general solution to this problem. So this is the process I showed you before that had the bug, modifying the property in place. We can do this instead: instead of returning the meta object, that map object, transparently (oh, there's a bug on this slide, those two variable names should match), we create a new map, add the two together, and return that new object. This will fix the bug. It's important to note here that this plus operator is an alias for the .plus method on maps in Groovy, and the .plus method returns a new map, with precedence given to the map on the right. So this is the correct way.
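As a plain-Groovy sketch of the fix just described (the variable names are assumptions):

```groovy
// The + operator is an alias for Map.plus and returns a NEW map,
// with entries on the right taking precedence.
def meta = [id: 'sample1', weight: 70]

def newMeta = meta + [weight: meta.weight + 5]

assert meta.weight == 70      // the original map is untouched
assert newMeta.weight == 75   // the new object carries the change
assert !meta.is(newMeta)      // genuinely different objects in memory
```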
So yeah, this .plus method is really important; it's a way of merging maps together in Groovy. This is what the GroovyDoc looks like, with the link at the bottom: it returns a new map containing all the entries from the left and the right, giving precedence to the right. That "giving precedence to the right" is important, because it allows us to override properties by placing them in the map on the right-hand side of the plus operator. And critically, the returned object is a new map containing all the entries from left and right. The same sort of problem happens inside map blocks, or map closures; this is actually the more common case in nf-core pipelines. What we're doing here is making the same object, a map with two keys, name and weight, and here I'm overriding the weight property with a weight key by adding a new map. This returns a new map and makes it clean for downstream use. In addition to using this plus operator to override properties in the map, I can also use it to add new properties. Here I'm adding a new key, the employer key (as Maxime said, Rob works at Seqera Labs), in addition to overriding the weight property. So it's a really nice way of piecing together, adding to, and overriding maps in Nextflow pipelines. The inverse operation you might think of is subtracting keys. This is also something that needs to happen quite a lot in nf-core pipelines, where you want to take a subset of the keys. Rather than defining a whole new map, you can use the subMap method. The subMap method takes a collection of keys and returns a new map containing just those keys. So here we have a very verbose map: first name, last name, location, age, and employer. Let's say that for downstream processes I really only need first name and location. I can use this subMap method, which does return a new object; it doesn't modify the object in place.
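A small sketch of both operations on a hypothetical meta map (the key names and values here are illustrative assumptions):

```groovy
def meta = [first_name: 'Rob', last_name: 'Syme',
            location: 'remote', age: 40, employer: 'Seqera Labs']

// + can both override an existing key and add a new one; right side wins.
def updated = meta + [location: 'office', team: 'scidev']

// subMap keeps only the listed keys, again returning a new map.
def slim = meta.subMap(['first_name', 'location'])

assert slim == [first_name: 'Rob', location: 'remote']
assert meta.size() == 5   // the original map is unchanged by either call
```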
You can see that here in the return value in the documentation, in the GroovyDoc, so the result will be safe for modification. One complicated example in the wild was from the Sarek pipeline. This was a really tricky bug to spot. It has now been fixed in Sarek, and it's also fixed on the Nextflow side. Here we were taking the output of the fastp process, taking the meta and the reads, and calling the .sort method on the reads. Now, this is a dangerous operation, because .sort actually modifies the object in place; it sorts the object in place. Even though we're assigning the result to a new variable here, .sort actually modifies reads, which had complicated implications for the publishing of those files. So that's what you want to avoid. But for the most part, 99% of cases can be handled by simply using the .plus method on map objects, or the .subMap method, for expanding and contracting meta map objects in nf-core pipelines. So today's talk is very simple; it has just this one clear message: never modify anything in place, and instead always return new objects when passing objects through nf-core and Nextflow pipelines. Are there any questions about that? I also have VS Code open, and we can go through examples interactively if people have more questions. Thank you very much, Rob, that was amazing. Does anyone have a question? I know we always have a question. This was super insightful, and I'm sure I've written a lot of code which falls into these traps. How can we spot them? How do we spot these bugs? It's almost always this dot notation for modifying (oops, let me just check that screen again), but you can actually force it to be a little bit more clear. Actually, no, I'm not going to share my screen, because I don't have an example of that, but you can force it.
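A sketch of the in-place-sort hazard in plain Groovy (the file names are hypothetical; this is not the exact Sarek code):

```groovy
def reads = ['sample_R2.fastq.gz', 'sample_R1.fastq.gz']

// Dangerous: sort() reorders the ORIGINAL list in place and returns it,
// so assigning the result to a new variable hides the mutation.
def sorted = reads.sort()
assert reads == ['sample_R1.fastq.gz', 'sample_R2.fastq.gz']   // mutated!

// Safe: sort(false) ("mutate = false") returns a new sorted list and
// leaves the original untouched.
def reads2 = ['sample_R2.fastq.gz', 'sample_R1.fastq.gz']
def sorted2 = reads2.sort(false)
assert reads2 == ['sample_R2.fastq.gz', 'sample_R1.fastq.gz']  // unchanged
assert sorted2 == ['sample_R1.fastq.gz', 'sample_R2.fastq.gz']
```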
So we could do one of two things. In nf-core pipelines, instead of passing a plain map object, we could develop our own class that inherits from Map and simply make it immutable. That would force the issue; it wouldn't compile, or rather it wouldn't run, when you try to modify the object in place. That's one path, a more secure path, but it does make things a little less convenient, because there are times, when first creating an object, where mutability is nice. I've been toying around with what it would take to do that, because I think it might be possible to just define a class in the lib directory, in a new MetaMap.groovy, change one invocation at the start of the pipeline, and leave everything else unchanged. So that might be the way we end up going, but I'm willing to give it a go and see what people think. I don't know how aware people are of the project, but I've just come off a call with Matthias and Júlia, and we were talking about the nf-validation plugin, which Júlia and Nicolas are writing, which has got a new channel factory for generating a channel output from a sample sheet. That's where the meta map is most likely going to be created, right? At the point where we're parsing the sample sheet. So it could be something we do within that Nextflow plugin. Then we wouldn't have to touch the lib directory within the pipeline at all; we could generate and define that class within the Nextflow plugin and hide it away. Yeah, that's perfect, that would work; it's a great idea. TZ asked a question about what the best way is to drop just one key from the meta map. My suggestion was to use minus together with subMap of just that one key. Yep, that's one option. Another option would be .subMap with the map's keys minus the key you don't want.
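One way the immutable-class idea might look, sketched in plain Groovy (this is a hypothetical illustration, not an existing nf-core or Nextflow class):

```groovy
// Hypothetical immutable meta map: in-place writes fail fast, and
// "modification" always hands back a brand-new object.
class MetaMap {
    private final Map delegate

    MetaMap(Map source) {
        // unmodifiableMap throws UnsupportedOperationException on any
        // attempt to mutate through this view.
        this.delegate = Collections.unmodifiableMap(new LinkedHashMap(source))
    }

    Object getAt(Object key) { delegate[key] }

    // Merging returns a new MetaMap rather than touching this one.
    MetaMap plus(Map overrides) {
        def merged = new LinkedHashMap(delegate)
        merged.putAll(overrides)
        new MetaMap(merged)
    }
}

def meta = new MetaMap([id: 'sample1', weight: 70])
def updated = meta + [weight: 75]
assert updated['weight'] == 75
assert meta['weight'] == 70   // the original is untouched
```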
So subMap will take a collection of any kind, and if you take a map and call the .keySet() method, that returns a set of all the keys. So you could subtract the key you don't want from that set and pass that in: it.keySet() minus the key you don't want. But the way that you've described it, Maxime, is also perfectly fine. I think it'd be about the same number of characters anyway, so it's personal preference. Okay, thank you. Yeah, there isn't a specific method for removing a single key, but if we did create our own class, we could make our own method to remove a key. I think creating our own class would be something for another bytesize. Yeah, yeah, definitely. So, is there a good source of documentation to read up on this? This idea isn't really Nextflow-specific, so no, but the Groovy docs that I linked in the slides, and that I'll post on Slack, are the best place. I wonder, I've forgotten who it was now, is it Midnighter? Someone in the community has built a website of community materials with common pitfalls and things for Nextflow. I wonder if it'd be a good one to go onto that site. Oh yeah, I can suggest adding it there. Apologies, I'm sure I'll be corrected in a second. Yeah, I think it was Moritz, Moritz Beber. Also, we've got some of this stuff in the advanced training docs, Rob. Yep, yeah, this is a module in the advanced training, so it'll end up in the training docs on training.nextflow.io. So I just had a question: if you did want to sort, say, your reads as part of an operation for whatever reason, without actually appending to or taking away from the map, what's the safest way then to just create a new copy? Would you use .clone? Would you reassign it? I mean, is reassigning safe?
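The two options for dropping one key, sketched with assumed key names:

```groovy
def meta = [id: 'sample1', strandedness: 'forward', weight: 70]

// Option 1 (Maxime's suggestion): subtract a one-key submap.
def dropped1 = meta - meta.subMap(['strandedness'])

// Option 2: subMap over the full key set minus the unwanted key.
def dropped2 = meta.subMap(meta.keySet() - 'strandedness')

assert dropped1 == [id: 'sample1', weight: 70]
assert dropped1 == dropped2
assert meta.containsKey('strandedness')   // the original is untouched
```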
Because is it just a pointer in memory, so you would be updating the original map anyway? Could you just say new map is equal to old map, for example? Well, it's a really simple fix: all it requires is that you pass false as the first argument to .sort, and then, instead of sorting in place, it will return a new object, and you can reassign it. I'm not sure what you mean. So if you create a new map, this can be an issue in some programming languages: you have an original map, an old map, and say you want to create a copy of it, so you say new map is equal to old map, right? And then you do all of your downstream operations on new map. That still changes old map in place, because it's just a pointer in memory to the old map, see what I mean? Yeah, okay. So if you pass .sort a false, it will return a new list, and the new list is a new object. The elements of the list will still point to the same original objects, but that's okay, because it's the order that you want to change. So you'll get a new list, and that object will have a new address in memory. Okay, I think we have some more time. Does anyone have any more questions? Oh, Rike has just said she didn't know about exec. So that exec block that I used, I used as a convenience. The important thing to know about exec is that it runs on the Nextflow head node rather than being farmed out to a task, which can be useful, particularly if you're operating in the cloud, because you don't have to wait for VMs to spin up. It's also good for demos like this, because you can just write arbitrary Groovy. In Nextflow pipelines, more often than not, you don't need it. Sorry, Phil. I was going to say, I usually go as far as saying don't use it, because it's quite easy to abuse it and crash the main Nextflow job, take down the head node. Yeah, good for demos, and maybe not best practice.
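The aliasing point raised in this question, sketched in plain Groovy:

```groovy
def oldMap = [id: 'sample1', weight: 70]
def newMap = oldMap            // NOT a copy: a second reference to the SAME map

newMap.weight = 75
assert oldMap.weight == 75     // the "original" changed too

// To get an independent top-level copy, build a new map instead
// (note this is shallow: the values themselves are still shared).
def copy = oldMap + [:]
copy.weight = 80
assert oldMap.weight == 75     // unaffected this time
```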
Okay, then I think we are good. So I will stop the recording.