Thank you, everybody. We're going to do the wrap-up now for day two of the October 2022 hackathon. Thank you, everyone, for all of your hard work today. There have been lots of questions and discussions, which has been great to see, and I think the pull requests are well and truly flowing in now. So I hope everyone's having a good time, both here in person and online, and thank you for all of your contributions so far.

[unintelligible] I'm not going to say much more than that. We will do a group photo of everyone who came to the hackathon just before everything kicks off. So try and make sure you get to the restaurant fairly promptly for eight o'clock, because the photographers will be there and we'll do that photo soon after. Also, there's going to be an introduction and stuff, so just try not to be late because you'll miss the kick-off. A bunch of us will be walking there from here, from Torre Glòries. So if you want to walk with us, meet us in the foyer downstairs at half past seven. We're going to aim to leave pretty promptly at half past seven, so we'll be milling around for 10 minutes beforehand, just where you come in through the main doors. And if you would rather go home to your hotel first and meet us straight there, that's totally fine. Just try and be there by around eight. They're not going to let us in if we're early though, so just don't be super early. It's about a half-hour walk from here. If you're going by yourself, there are quite a lot of public transport options, and I sent the link to the Google Maps in Slack, so hopefully you'll be able to find it.
If there are any problems, just spam Slack with lots of messages and hopefully someone will pick up on it and we'll be able to help you out. But it's quite a nice walk, as we're going to go past several Barcelona landmarks and stuff, so it should be nice. Just bear in mind that for the hackathon dinner we've booked exactly the right number for the people who are here for the in-person hackathon. So please, if you're not sure whether you actually signed up for the in-person hackathon, check with Chris and me and we can look it up on the spreadsheet. But I'm sure it'll be fine.

Okay, otherwise, the other point is just to remind everyone about tomorrow. Tomorrow is a half day again. We will be starting off exactly the same as we did this morning: turn up any time from nine o'clock. At 10 o'clock in the morning, we'll have a social. We'll be doing a quiz, and it's a fun, light-hearted quiz. It's going to run on an online platform called Kahoot. This is for everyone, both here in person and online, and we will run it through Gather Town. The way it will work is basically that we'll put Gather Town up on the TVs here for people in person, the question comes up on the screen, and you use your phone to put in the answers. Everyone on Gather Town will see it too. I think it's Fran who's going to be running it. She'll be stood at the front of the room sharing her screen, so you'll see exactly the same as us and there'll be no time delay, so it should be perfectly fair. So that will be at about 10 o'clock, and then we'll do our final panic, wrapping up and trying to get everything done, and we'll have a wrap-up talk at 2. Thank you. Someone in the audience has been looking at the schedule. I think we have lunch at 12 or 1 and then we have the final wrap-up at 2. Once we're finished here, we're going to move out of these rooms completely. The summit will happen downstairs in Torre Glòries, on the minus one floor.
So you come in through the main doors, and then in the foyer there are some steps that go downstairs, and that's where the summit will be. So after we wrap up, please clear everything out; we need to leave all these rooms as tidy as possible. If you have a little bit of time to give us a hand, that would be really helpful. And then we basically go straight downstairs, and the first talk is at 5 p.m., so you've got a couple of hours there. Any questions, or is that all clear? That would be nice. A question was: will there be a chance to go upstairs to the top of the tower to get a view from the top? I think Maxime might have tried at one point. He seems to have been exploring. No, unfortunately not. It's pretty good out the window, so, you know. Yep, anything else? Cool. In that case, let's kick off and we will start going through all the different groups. So who wants to go first? Documentation, you guys? You kicked us off well yesterday. [laughter] You have the slides already? Yep. Yep.

I'm doing it for everyone this time. Hello, everyone. So again, another day. I hope you guys had fun. I'm going to tell you a bit about what we did today in the documentation group. As I said at the very beginning, and I repeat again, you are free to change groups during the hackathon. Tomorrow, if you want to join us in the documentation group, you're all welcome. The idea was to fix typos, try to improve the wording, rewrite some sections and things like that. The team is co-led by me, Fran (remotely) and Abhinav. We had a lot of people joining today and yesterday, so we have about 15 members, both online and in person. Some people went to different groups and some came from different groups. I think it's very nice that we have this integration, mostly for networking during the hackathons. It's very nice to meet everyone else. So for the second day, these are the numbers of pull requests that we merged from these people.
So Louise Lounaisette, who is remote, was the one that created the most merged pull requests. That's very nice. Then we had Llewelyn, who rewrote a lot of parts, and a lot of other people that contributed with a lot of pull requests. Fran is remote, but she is also working on a few things. One of the things she was doing is the transcripts for the bytesize talks that were recorded. And in the end, she sent us a list of very funny transcriptions from the software, and we didn't put all of them in. Some of them are a bit heavy on the joke, but there are a few here, like "Philly Wales" and "Hershield Battle". "Statistic". And "if core", "nfl", "nf4", "then of course", "nfok", "nfcald", "unscore". Oh my God.

Thank you, everyone. Who wants to go next? Pipelines.

All right, thank you. Let me skip the introduction slides. Okay, yeah. So today we again worked on lots of pipelines. I will go through each of them one by one now. And I think we actually grew in team members, which is great. Yeah, okay, day two. First of all, we have a bunch of new pipelines that were proposed during the hackathon, or that were proposed before but have now started to be worked on: differential abundance, light sheet recon, SAMMY-seq, tau typing and viral integration. First, the differential abundance pipeline is a pipeline for differential expression analysis, mostly for RNA-seq data. Oscar today, I think, started working on a workflow diagram on paper and is digitising it now in HackMD. So I think that's a good first step for the subway map. Then we have light sheet recon, which Conrad worked on; he started working on the template and the first subworkflows to be added. It's a pipeline for light sheet microscopy imaging reconstruction. Then we have SAMMY-seq, which was proposed a while ago. Margherita is working on it. It's a pipeline for sequential analysis of macromolecule accessibility sequencing data. She also started with a template today.
And then tau typing, which Chanta, Aaron, Josh and Sam worked on, is for identifying genes and genomic sequences with the genome-wide phylogenetic signal of an organism using Kendall's tau rank correlation statistics. And I think I also mentioned that they worked on templates. Sorry. Oh, God. Okay. Moving on. We also have some people on our list that are working on the viral integration pipeline, to find viral integration sites within the human genome. Okay, then we have a pipeline undergoing DSL2 conversion, which is eager, that James and Jack are working on. Currently, eager is the best-practice pipeline for genomics and metagenomics of ancient DNA data. And then we have a lot of pipelines where various development things are happening. So, Sarek, the germline and somatic variant calling pipeline: we've been working on various bug fixes, and Maxime and I have a little race about who gets to merge first and doesn't have to deal with merge conflicts. Then we have the liver CT analysis pipeline for quantitative image analysis of abdominal CT scans in hepatocellular carcinoma patients. Aaron and Louisa are working on the DICOM input and also cleaning up the code base and transferring work between their branch and the nf-core branch. Now, funcscan, which is the pipeline for screening for antimicrobial resistance genes, antimicrobial peptides and biosynthetic gene clusters: James, Jasmin and Louisa are working on different modules for BGC summary tables and AMPcombi. In the smrnaseq pipeline, Rob and Alex have been working on merging the fastp adapter detection, sample sheet handling, and smaller issues in preparation for release. We have taxprofiler, for taxonomic profiling of shotgun metagenomics across multiple databases.
James, Sophia and Vladimir have been working on it, fixing Bioconda recipes and documentation, adding, I think, other tools to taxpasta, removing patches from their nf-core modules in the pipeline, and preparing more modules such as sourmash gather. Now, we have multiple people doing a lot of work on the single-cell RNA-seq pipeline. Allison has been working on the cell count and sequencing centre parameters. Oliver has been working on pytest-workflow tests. Paul and Christian have been working on adding annotations, and there have been various issue cleanups by Vivian, Oliver and Alex. I'm hoping I pronounced everyone's names correctly here. Then we have the hgtseq pipeline for investigating horizontal gene transfer from NGS data, and Simon has been merging multiple pull requests today for tax ID parameters and documentation. In the proteinfold pipeline for protein structure prediction, Athanasios is updating, and I guess reconstructing, the modules following the module restructuring from last week, and PRs are also being prepared for the AlphaFold2 and ColabFold databases and parameters. And last but not least, we have the airrflow pipeline for B and T cell sequencing analysis that Saba, Gisela and Susanna are working on, and they've been working on various modules and Mulled containers today. I think that's it, yeah.

Just a quick note: your slide there about waiting for CI tests reminded me. I think a lot of people have been coming up against that today: you put in a pull request and all the GitHub continuous integration tests are just sat there, orange, waiting to be run. We don't have a base-level organisation, but I think we have about 50 or 60 concurrent runs that we can have running at any one time before they get sat in a queue, which normally is fine. And then we have a hackathon and it's a bit difficult.
For future events, I'm still working on trying to get a temporary bump during events; I almost made it this time. So hopefully next time we'll get a higher concurrency limit. But for now, just bear in mind that if you push and some of your tests fail while other tests are still running, you can go in, click on the running tests and hit cancel. Especially if they're long-running tests, that will really help everyone else, because that test might otherwise run for half an hour, and you can cancel it five minutes in because you had a linting error in your Markdown file or something. So that would be really helpful. Just try and bear that in mind. And also, if you push multiple commits and you have tests running on all of them, you can go into the older ones and cancel those. So just try and cancel any CI tests that you can, to help everyone else out. But obviously everything will run through the night now as well, so hopefully tomorrow we'll start with a clean slate. Right, modules or subworkflows? Modules.

Okay, thank you. So hi, I'm Jose. I'm co-leading the modules team. So this is from yesterday, sorry. Here are the team members; we are more and more. The team is led by Yselam, Louisa and me. As Phil just mentioned, we are having problems with GitHub, so here we have a suggestion. And yes, we have been working on adapting the modules. Christian has been working on simplifying. This is from yesterday, right? Okay. So I thought it was... oops. Probably because I didn't prepare the slide correctly. Okay, yes. Thank you. Okay, now you can see all the members of the team. There are a lot of us. Thank you. So yes, we have also been working on adding a lot of new modules. We have listed all of them here. I added a small description of each of the modules in case you are interested in it for your pipeline or you are familiar with the analysis it performs.
So maybe you want to step in and help develop the module, or review it, or use it in your pipeline. Louisa, Anan and Jasmin have been working on the AMPcombi module, which is an antimicrobial peptide parsing and functional classification tool. Luca has been working on the falco module, which is a kind of FastQC but more computationally efficient, so it's useful for Nanopore reads. Solene is working on the smoove module, which is also an efficient tool to call and genotype structural variants. Gertrude and Marty are working on the PretextMap tool, which converts SAM or pairs-formatted read pairs into genome contact maps. Here they list what they have advanced today. Also, Robert has been working on the BioHansel module. As you can see here, this tool genotypes clonal microbial whole-genome sequence data. I've been working on adding the aria2 module for proteinfold. It's already merged. aria2 is a lightweight multi-protocol, multi-source command-line tool to download data. It supports several protocols, and it's very handy in the case of proteinfold because we have to download very big chunks of data. Alan has been working on ABRA2. The tool can work with just a single BAM, or two or three. He has implemented up to three. It can also take more, but he told me that in real scenarios you normally find these cases. ABRA2 is an assembly-based realigner used to improve alignments and detect indels. Toni Hermoso has finished working on the KofamScan module. He has tested it with local data, and he has a pull request to integrate a test data set for this module. The module will allow you to perform KEGG orthology searches. Saba and Gisela have been working on adding the pRESTO FilterSeq module. As we just saw, the GitHub Actions are taking time to run, so this is a common problem for most of these modules. In this case, the tool filters sequences in FASTA or FASTQ files.
Royer has been working on a proteomics module, which is called OpenMS IDFilter. In this case, the tool is used to filter peptide and protein identifications. Also, there are some people working on, I think, a CRISPR pipeline in development, and they have also been working on the CRISPResso2 module for this pipeline. This tool analyzes outcomes from genome editing experiments, so typically CRISPR. Then they have implemented the DESeq2 module, and it's waiting for discussion. DESeq2 is an R package to perform differential analysis. So if you are interested or you are planning to use it, maybe you can jump into the discussion and give feedback. Then there is the RepeatScout module, with testing planned for tomorrow. In this case, the tool identifies repetitive substrings in DNA. This is also related to CRISPR. This is one that Lauren has been working on, so it's for processing CRISPR data. Here I forgot to put the pull request. And Maulik is still working on the Fulcrum Genomics module, and I'm not sure what it performs. So maybe you can take a look if you know about the tool. Yes, and on the module updates, we have not done too much. Only Louisa has made an update, for the funcscan pipeline. So that's all.

OK. Last but not least, subworkflows. Oh, yeah, and infrastructure. I was just going to say, I heard you guys burst out in spontaneous clapping earlier today, so I'm expecting great things.

OK. So everybody stole my thunder, but that was us today as well: we were waiting on the CI runners. That was the majority of it. Francesco has been working on the fgbio create UMI consensus subworkflow, porting all of those subworkflows into nf-core modules and updating those modules. He identified some new test data and implemented the UMI structure described in the paper as well, which is linked there. I guess everybody wants to go read it. The test data was also added in there.
He did fgbio ZipperBams, the new module, as well as call duplex consensus reads, and then also did the filter consensus reads PR. I believe they're submitted and under review, because we're waiting for the CI. Then Matthias did a FASTQ align subworkflow as well. We merged that this morning and had a big discussion on certain standards for subworkflows and what is acceptable or not. Basically, the question was whether we should have atomic subworkflows, where we would have just STAR align, for example, or whether we'd also have one that does FASTQ align and wraps all of the various DNA aligners that you could use. We decided to have both. If anyone has any other opinions on that, please feel free to open an issue. Camille was also working on the RNA-seq subworkflows. We actually got one in this morning before the CI went down, which was the bedGraph, bedClip and bedGraphToBigWig one. That's coming out of RNA-seq. We're also waiting on the CI for the BAM sort samtools one. Actually, I think this one was the one that was broken because of Bowtie. So we actually found that other tests were broken, and we have to update those before we can update this one specifically, because they depend on each other. Then this is the BAM dedup stats samtools one, and that's also coming out of RNA-seq. Again, we're waiting for the CI. And this is the FASTQ [unclear] one, also by Camille, and that's also coming out of RNA-seq. Then the BAM markduplicates Picard one. That's coming from David, and that's also out of RNA-seq. And then the Picard MarkDuplicates module: we also had to update that to handle CRAM input and make sure that we're testing for that, because that wasn't there. David's also taking care of that. And then Quinton is doing the Bowtie2 align subworkflow, and that's what's blocking some other things, so we're trying to get that in as well.
And then lastly, I've been migrating the CI tests on RNA-seq to pytest-workflow, to convince Harshil to move over to that. So basically, we're only testing on the changes. We run all the tests on release for every single pytest-workflow test. We're also only running on PRs. The md5sum is already calculated for the default workflow, so if anybody wants to calculate those for the rest of the workflows, that'd be great. And testing of local subworkflows is also implemented. The last thing that we're waiting on is the latest CI run.

OK, last but not least.

Yes, thank you very much. In the infrastructure team, we got a few new people today. Sofia worked on adding some landing pages for the usage and contributing docs so we can easily link to the grouping there. Nezran worked on improving the code coverage, specifically for the sync one. So yay, more code tests. And Arthur made some small changes to the test YAML template, where we had some problems with the contains command there, or input there. Björn almost finished the nf-core subworkflows list command. We just need some more tests because of test coverage. Nicolae actually worked quite a lot with nf-test. We also had a call in the afternoon where we discussed more how it will fit in with nf-core. He now has a running subworkflow prototype, so testing a subworkflow with nf-test. He already had the module yesterday, so we're slowly expanding there. He has also already opened some issues on the nf-test repo with our input, so we can align it more closely with our current testing suite. But it looks very nice, what he has done there so far. Bruno keeps working on the nf-prov plugin. As a reminder, this is the one to trace the output of files from Nextflow. So he now has a contribution to Nextflow core, congrats. He polished a PR to generate just a list of the published files, and is now working on the JSON output and also on aggregating task information and the run config. Okay, this is broken, but it's fine.
Julia, Arthur and I worked on reducing the code redundancy in the tools, because nf-core modules and subworkflows actually share a lot of commands and also logic. So we have now created an abstraction layer for that. Julia was working on bug fixes and the install command, the modules install command, and also on installing subworkflows. And I fixed the broken linting in the dev branch twice. So please lint your Markdown files before pushing them to dev. That was it. Thank you very much.

Great. Did I forget any groups? Right, thank you very much, everyone. Feel free to carry on working here for a little bit if you want to. Some of us are going to go straight to the venue from here at 7:30, so I imagine there will be a few of us kicking around here until then. Otherwise, feel free to head off and drop bags at the hotels or whatever. Thank you very much for today and we'll see you tomorrow morning.