Kia ora everyone, my name is Jordan Lima and I'm a PhD student here at the University of Otago in New Zealand. Last year I completed an undergraduate research project for my Bachelor of Biomedical Sciences Honours degree, where I investigated novel ctDNA biomarkers for improved surveillance of colorectal cancer. I analysed the majority of my sequencing data through the Galaxy Australia database of bioinformatic tools, and I've been asked to report back on how these tools can be executed intuitively and stitched together into a workflow for wet lab researchers.
First I'd like to acknowledge my supervisor, Professor Parry Guilford, for his support. I'd also like to acknowledge Dr Rob Day from here in the Centre for Translational Cancer Research for being the brains behind the bioinformatics on this project and many others.
A brief background about circulating tumour DNA, or ctDNA. ctDNA is DNA that leaks into the bloodstream from tumours. This DNA carries cancer-specific alterations that distinguish it from healthy cell-free DNA in the bloodstream. These cancer-specific alterations, or biomarkers, allow detection and quantification of ctDNA, which can provide patient-specific measures of disease progression or relapse during and after treatment.
The aim of my honours project was to identify novel ctDNA biomarkers in a gene called RNF43. We chose RNF43 because it plays a similar role in the cell to APC, the most commonly mutated gene in colorectal cancers, and because mutations in these two genes have been shown to be mutually exclusive. These biomarkers needed to be somatic, likely pathogenic and likely founder mutations, meaning they occur early in tumour development, so they would be unlikely to be lost to eventual tumour evolution. As this was a one-year project, I first needed to develop a novel sequencing protocol for RNF43 that maximised time and cost efficiencies. This was especially important when our lab time and resources took a hit from the COVID-19 pandemic.
We started by sequencing the coding regions of RNF43 from 39 patient samples, which generated large amounts of next-generation sequencing data on the Illumina BaseSpace platform. The problem I encountered when trying to analyse this raw sequencing data was that the BaseSpace software options were pretty inflexible and quite cumbersome for analysing custom amplicons like the ones we had generated for RNF43. BaseSpace also requires a paid membership for most of its sequence analysis tools. So Rob and I did some digging and came across the Galaxy Australia database, which contained intuitive, accessible bioinformatic tools that were easy to use, came with their own sets of instructions and had a user-friendly interface that allowed dynamic changes to variant calling parameters. Hence we started to stitch together an investigative process to identify novel RNF43 biomarkers.
First I exported our paired-end sequence data from BaseSpace into the Galaxy Australia database. I then used the FastQC, Trim Galore and fastp tools to analyse and improve the quality of our raw sequencing data. Then I used the Bowtie2 mapping tool to map each patient's reads against two different versions of the human reference genome and to generate a BAM output. Mapping to these two versions of the human genome was important for further analyses of any variants identified, as I found that different analysis software had different version requirements. I also exported this BAM file to a mapped sequence analysis tool called SeqMonk, where any sequencing gaps in RNF43 could be visualised.
I also used the BAM file to generate a pileup output with the SAMtools mpileup tool, as this was the required format for the variant detection tool I used, called VarScan. VarScan had a user-friendly interface that allowed for dynamic changes to the parameters used for calling variants from patient sequence data. These parameters varied for SNPs and missense variants and were based on biological and statistical feasibility.
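
For anyone who prefers to see the steps written down, the Galaxy workflow described above could be sketched roughly as the equivalent command-line calls. This is only an illustration under several assumptions, not the workflow actually run in this project: the file names, the genome build labels (`hg19`, `hg38`), the Bowtie2 index prefixes, the `VarScan.jar` file name and every threshold value are hypothetical placeholders, and Trim Galore is left out for brevity. The script only builds and prints the commands; it does not run them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantCallParams:
    """VarScan thresholds. The defaults are placeholders, not the project's settings."""
    min_coverage: int = 8
    min_var_freq: float = 0.01
    p_value: float = 0.05


def build_commands(sample: str, r1: str, r2: str, build: str,
                   index_prefix: str, reference_fasta: str,
                   params: VariantCallParams) -> List[List[str]]:
    """Return the ordered commands for one sample against one genome build."""
    # Hypothetical intermediate file names, tagged with the genome build.
    trimmed_r1 = f"{sample}.trimmed_R1.fastq.gz"
    trimmed_r2 = f"{sample}.trimmed_R2.fastq.gz"
    sam = f"{sample}.{build}.sam"
    bam = f"{sample}.{build}.sorted.bam"
    pileup = f"{sample}.{build}.pileup"
    return [
        # 1. Quality report on the raw reads.
        ["fastqc", r1, r2],
        # 2. Adapter and quality trimming of the paired reads.
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # 3. Map the trimmed reads to the chosen reference build.
        ["bowtie2", "-x", index_prefix, "-1", trimmed_r1, "-2", trimmed_r2, "-S", sam],
        # 4. Convert to a coordinate-sorted BAM.
        ["samtools", "sort", "-o", bam, sam],
        # 5. Generate the pileup format that VarScan reads.
        ["samtools", "mpileup", "-f", reference_fasta, "-o", pileup, bam],
        # 6. Call SNPs with adjustable thresholds (VarScan writes to standard output).
        ["java", "-jar", "VarScan.jar", "mpileup2snp", pileup,
         "--min-coverage", str(params.min_coverage),
         "--min-var-freq", str(params.min_var_freq),
         "--p-value", str(params.p_value)],
    ]


if __name__ == "__main__":
    params = VariantCallParams(min_coverage=20, min_var_freq=0.05)
    # Two reference builds, mirroring the two mappings described in the talk.
    for build in ("hg19", "hg38"):
        commands = build_commands("patient01", "patient01_R1.fastq.gz",
                                  "patient01_R2.fastq.gz", build,
                                  f"{build}_index", f"{build}.fa", params)
        for command in commands:
            print(" ".join(command))
```

Keeping the thresholds in one small object mirrors what the Galaxy interface offers: the parameters can be changed and the variant calling rerun without touching the earlier steps.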
VarScan identified a total of 77 RNF43 variants that we could then validate via wet lab experiments. 71 of these were excluded by our controls, and only one RNF43 variant was validated as an ideal ctDNA biomarker by the end of our wet lab analysis. At the end of this project we had designed a novel sequencing protocol for RNF43 that utilised bioinformatics tools in the Galaxy Australia database to optimise time and cost efficiencies. This protocol could also be used as a template for analysis of other potential ctDNA biomarkers and other genes of interest.
This graph here shows an analysis of ctDNA levels in one of our colorectal cancer patients using our novel RNF43 variant, shown in red. In comparison to other ctDNA biomarkers currently used on routine sequencing panels for colorectal cancer surveillance, our variant seems to show similar if not better predictive value of disease status. So the final outcome of our project was to add this novel ctDNA biomarker to a commercial sequencing panel for improved surveillance of colorectal cancer.
It's important to note that prior to starting my honours project I had little to no understanding of bioinformatics analyses. I'm a heavily wet lab oriented student, so it can be difficult for me, and for other students like me, to approach bioinformatics with a positive mindset. I understand the value of bioinformatics for analysing data or generating hypotheses before conducting wet lab investigations, but it can be difficult to find the drive to learn complex coding languages like Perl, R and Python and how to use them. Trust me, I've tried. It also isn't always a feasible option for short-term undergraduate projects like this one. However, I found Galaxy to be a useful resource for my initial engagement with computational tools, and I think it has value for engaging other undergraduate students because it promotes intuitive thinking and puts these tools into the context of your individual research project.
I'm also more open to using computational tools again in my future analyses of ctDNA during my PhD. Thank you all for listening to my talk. I'm happy to answer any questions about my project in the Q&A sessions.