Welcome, everybody. This is the State of the Galaxy 2021. This year we are doing things a little bit differently: there will be a lot of us talking. This is because for many years Galaxy has been a distributed community project, and having just one talking head would be strange and unfair. So today we'll talk about what's happening in the expanding galactic universe.
We'll begin with the James Taylor Foundation. This year James would have been 42. It's been a good year for the Foundation. We have raised over $30,000, and I cannot express how grateful we are to the hundreds of people who contributed to this effort. A new dedicated website is about to be launched. Last year we provided support to 10 outstanding graduate students who attended the Cold Spring Harbor Conference on Biological Data Science; this was a collaboration between the Galaxy Project and the Cold Spring Harbor Lab. This year we're doing it again, and we're doing it in person. The Foundation will cover the full cost of the Genome Informatics Conference at Cold Spring Harbor for 10 outstanding students. Details will be announced soon.
Hello, everybody. Community engagement and collaboration have been distinguishing features of the Galaxy Project since it first began. Many community members have become regular contributors and collaborators. The process has always been very informal, and as the project has grown in complexity it has become increasingly tricky for members to navigate, and increasingly tricky to manage in the absence of some sort of formal structure. So project governance has been restructured to make it as easy as possible for community members to locate, join and contribute to projects. This diagram, suggested by Björn, Hans-Rudolf and others, shows that the executive has somewhat arbitrarily decided to divide project activities between a technical and a community board.
Each board convenes and manages open, largely self-organized subgroups where the plans are actually implemented. Coordination and strategic planning are a central component of the governance model, in the form of published roadmap documents developed and implemented with as much community input and transparency as possible. The technical working groups are very well established; they cover all of the major technical activities of the project and offer a huge range of opportunities for contribution, depending on your particular skills and experience. They're your opportunity to help sustain and improve the project. The community side of the GGSE is new. It has only been meeting this year, so working groups are still in the process of forming. The hub community and governance pages are the places to look, and in terms of input, our first community consultations are coming up later as part of this meeting. The roadmaps that we're trying to develop are high-level descriptions of what we want to achieve, how we're going to do it, and in what order. Community input is absolutely vital for this planning process and this new governance structure, because we really need the roadmap to address the community's real needs and desires. There will be two open sessions, a birds-of-a-feather session and a co-fest, and you are personally invited to come along, join in, and help us put the "you" in community.
And a few words about the States of United Galaxy, also known as usegalaxy.*. This year we reached 10,000 publications, that is, 10,000 publications that cite Galaxy. Because we don't explicitly ask people to cite us, the actual number of publications that use Galaxy in some form is likely much higher. For us, this is truly a testament that Galaxy helps people do all sorts of data analysis.
Today we wanted to highlight one of the members of the usegalaxy.* consortium: our Australian site at usegalaxy.org.au.
We've had some really exciting developments in Galaxy Australia this year. We've been given some new resources: five new high-memory, large-core-count nodes dedicated to Galaxy, three of them located here in Queensland and two in Melbourne. We've also been given access to 1,000 cores for new Pulsar nodes at NCI in Canberra. All of these new resources have allowed us to support more and larger jobs for researchers. The first job run on the high-memory resources was a one-and-a-half-gigabase plant genome assembly, which took less than a day. We've also spent the year engaging with the research communities. We've made a lot of new connections with new researchers, provided them with training, and helped them with their workflows, both in Australia and around the region. Thanks to our connections with other usegalaxy.* servers, we've installed many new community tool sets, including over 500 new tools. We now support almost 14,000 users, with 1,000 active per month, and have processed nearly two and a half million jobs. COVID lockdowns meant that Galaxy Australia staff couldn't move around, but Galaxy Australia itself did, and is moving again. We are moving to align our service with Australia's Academic and Research Network, or AARNet, which has a track record of meeting the needs of data-intensive research, science and beyond. We are moving our core to a purpose-built OpenStack cloud to offer our users the stability and longevity that AARNet represents, and we've already connected into AARNet's research storage in CloudStor. All of this work has been funded by the ARDC and Australian BioCommons, and this policy investment has really helped propel Galaxy Australia forward.
Hello, I'm going to talk about a new service coming online next month, researcher.galaxyworks.io.
GalaxyWorks is a company that offers commercial services around Galaxy. The goal of GalaxyWorks is to extend the benefits of Galaxy, and open source in general, to the biotech and pharma industries. With the new service that I'm going to talk about in a second, we want to support the use of Galaxy at all levels. The new service, called Galaxy Pro Researcher, is the first-ever subscription-based Galaxy service. Anybody can sign up. There will be no queue wait times and no quotas, and there will be support for exotic hardware, including GPUs, as well as service level agreements and enhanced security. You can stop by the poster tomorrow on the fifth floor to see a demo of this service. The benefits for the Galaxy community are already evident from the work of setting up this service. We've contributed a number of large Galaxy features over the last year or two, including workflow invocation reports and a revision of Galaxy pages, which allow you to generate beautiful paper-like documents in Galaxy that include Galaxy components. We added the UI and API for supporting best-practice workflows, allowing you to upload older workflows and have them automatically checked for alignment with best practices. We've added a lot of components for infrastructure management, including scaling support, support for a high-availability database service, and support for GPUs. Some of these benefits are already being used by downstream Galaxy projects, such as AnVIL, which Mike will talk about in a second. We've also contributed a dozen or so wrappers for various tools, some of which are again being used in downstream projects, including the COVID analysis efforts with usegalaxy.*.
Hi, everybody. I'm Saskia. Together with Bérénice Batut from Freiburg, we lead the Galaxy Training Network. One of the great new features of the GTN this year is that the training materials are all available from within Galaxy.
If you click on the graduation cap icon at the top of the Galaxy masthead, you'll have access to the entire library of GTN trainings. We've really tried to improve the linkage between the training materials and Galaxy by providing new connections. Inside the training materials, when accessed through Galaxy, you'll now find little blue buttons every time we talk about a tool, which give you instant access to that tool directly inside Galaxy. So no more time wasted searching for all of the different tools that you need. This should make it a lot easier to switch back and forth between the training materials and Galaxy. We have a large number of training materials to choose from. You can see some stats here: we have over 200 tutorials right now, and this is all community driven, with over 180 people from this great community having contributed these tutorials. It's not just about biology anymore. We have over 21 topics, including things like climate, ecology and machine learning. And it's not only science; there are technical topics too. We have lots of tutorials for developers and Galaxy administrators, so please come check it out. Recently we even added our first Spanish-language tutorial. All of this together went into a great event we hosted this year.
Yes. This was an event we ran earlier this year. It was a global training event. Over 1200 people registered from 78 countries. It ran for five days, and we used an asynchronous format, so all the tutorials were pre-recorded. Participants could start, stop, and take breaks whenever they wanted, because there was 24/7 support on Slack from over 16 instructors from the community. Everything remained online afterwards, so people can still use it to learn. This was such a great success, and we had such a great time organizing it together, that we now hope to make this an annual event.
Yeah. And in a similar style, we ran the Galaxy Admin Training earlier this year.
It was the largest ever Galaxy Admin Training, which was so cool. We also ran the GCC 2021 training week last week. This used all the content from the Smörgåsbord and, again, the Galaxy Admin Training, and more: we added a new developer track. All this content is still online. You can see it on YouTube, together with lots of other cool content such as our regular webinar series. Now, if you want to help contribute to the Galaxy ecosystem, we have regular collaboration fests, or co-fests as they're called. Every month there's Papercuts day, and every three months there is the Galaxy Training Network co-fest. We'd also like to highlight a couple of other contributions from our community members during the pandemic. They wrote a couple of great papers on how to use Galaxy for training, how to teach during lockdown, and things like this. So if you want to hear more, please come to our talk. I believe it's on Thursday. It is. See you there.
I would like to highlight two projects that represent the two tails in the distribution of analyses that Galaxy can do. The first project is about handling very large numbers of relatively small data sets: this is mutation tracking in SARS-CoV-2. The other project, the Vertebrate Genomes Project or VGP, is about handling a relatively small number of very large long-read data sets. For SARS-CoV-2, we developed a series of workflows that allow mutation detection from ampliconic or metatranscriptomic data. These workflows utilize the open high-performance computing infrastructure that supports the usegalaxy.* instances in the US, in Europe and in Australia. So essentially we're providing an opportunity for anyone in the world to take any number of samples and analyze them using reliable open source tools. This is the only resource we know of that allows that. Again, there are no restrictions of any kind here.
We use these workflows to process COG-UK data, which is at this point the biggest aggregation of SARS-CoV-2 sequencing. We provide the results of these analyses in several ways. One of them is this Observable dashboard, which allows you to look at the samples we processed, restrict mutations based on their frequency, and look at the lineages present in the data sets from a particular slice, in this case a slice of time. You can see that we're really good at processing the latest data sets, and we're also processing historical data sets. Our goal is to process all data sets ever generated within the COG-UK effort. The results of this analysis are provided through the dashboard, which I just showed, but you can also download them through our collaboration with the Viral Beacon consortium. The same workflows and the same dashboards can be used by institutions, countries, and individual labs. We are right now collaborating with a number of these; for example, Estonia is starting to use our resources for SARS-CoV-2 data analysis. On the other hand, we have VGP. Our goal here is to provide, again, a free, universally accessible system that allows assembly of large genomes. This is, again, the only resource of that kind. It's essentially an open resource for long-read analysis. Here you can use HiFi reads from PacBio, Bionano data sets, and Hi-C data to generate high-quality genome assemblies.
Hi, everyone. My name is Jeremy Goecks. I'm a Galaxy PI located in the United States, currently at Oregon Health and Science University. It's a pleasure to talk to you today about two areas where Galaxy has really advanced over the last year: machine learning and imaging. Through an international collaboration, we've developed the Galaxy ML tool suite.
This tool suite supports all aspects of machine learning in Galaxy, from defining models to training on large data sets to evaluating model performance. You can do supervised or unsupervised machine learning with Galaxy ML, and there's even support for basic types of deep learning. There's a fantastic GCC talk all about Galaxy ML by Kaivan Kamali that I encourage you to check out to learn more. The second area that I want to highlight is image analysis in Galaxy. Being able to process imaging data sets is increasingly important because, across biomedicine, it's exceptionally easy these days to capture large images, and they tell us a lot about biology that we can't get through traditional sequencing approaches. Here's an example from a University of Freiburg research group that has implemented live cell tracking in Galaxy. What you see here is the raw imaging on the left; through image analysis, you can identify the cells and the stage they are at in the process of dividing during mitosis. There's a whole GCC talk by Beatriz Serrano-Solano on analysis of microscopy data that has more details on this and other examples. Another instance of image analysis comes from my research group, where we're working with the National Cancer Institute in the Human Tumor Atlas project. The goal of this project is to understand how tumors evolve over time, and as part of this we have to analyze these multiplexed tissue imaging data sets. By processing a slice of a tumor, these data sets let you identify where the tumor cells are, shown in red here, versus the immune cells in yellow and the stromal cells in green. The proportions of cells present in a tumor, as well as their spatial locations, have implications for response to treatment and ultimately survival times.
The processing of these data sets is made possible by using many different features of Galaxy, from collections to Kubernetes and the GVL to Docker and Conda, interactive tools and so on. I'm really excited by these applications of Galaxy, and I'm happy to talk to you more about them if you're interested. Thank you.
Hello, my name is Michael Schatz and I'm one of the PIs for Galaxy at Johns Hopkins University. I would like to tell you about some of the latest work with Galaxy within AnVIL. AnVIL is the NHGRI Genomic Data Science Analysis, Visualization and Informatics Lab-space. Like Galaxy, AnVIL is a cloud-based platform for large-scale biomedical analysis. One of the major goals of AnVIL is to support collaborative analysis of several of the largest NHGRI projects investigating human diversity and the genetic contributions to many major diseases. Through these projects, a variety of genomics data sets are now available in AnVIL for more than 280,000 human subjects, with many more to be loaded in the near future. AnVIL is a federated cloud system composed of several major applications, including Terra, Gen3, Dockstore, Jupyter, and R/Bioconductor. Last but not least, Galaxy is a major component of AnVIL for accessible, reproducible, and transparent research. We have designed AnVIL in this way in response to the huge amounts of data we're facing. There is simply no other way to effectively share and analyze data at these scales. We have integrated Galaxy and AnVIL by leveraging the new automated Kubernetes infrastructure and the remote data capabilities that are being presented at this conference this week by Alex, Nuwan, John, and several others. I encourage you to check out their posters and presentations for more details. This work brings many new features to both AnVIL and Galaxy users. Within AnVIL, this brings all the features of Galaxy along with our large community of users, contributors, and our worldwide training network.
Simultaneously, this deployment within AnVIL also brings many new frontiers to Galaxy users. Number one is access to hundreds of thousands of protected human data sets in a FedRAMP-certified ecosystem. Working in a cloud environment, users also gain several exciting new capabilities, such as avoiding downloads of these huge data sets and using Galaxy without any fixed quotas. Ultimately, we hope this will enable you to use Galaxy to connect data sets in novel ways and make major discoveries that were not possible before. In addition to growing the research potential, we are also growing the Galaxy community through a variety of training and educational events. For example, Galaxy is now a major platform for the Virtual Applied Data Science Training Institute hosted by Howard University and the Genomic Data Science Community Network hosted by Johns Hopkins University. Through these programs, we have taught hundreds of people who are new to genomics how to use Galaxy, for example for de novo genome assembly, comparative genomics, and many other applications. I hope you will welcome all these new members into our community here.
One of the most popular features of Galaxy is, of course, being able to access lots of important command-line tools through an easy-to-use web-based interface, without having to know how to interact directly with HPC systems or clouds. But what happens when the software you are interested in using has its own graphical interface? Often these tools will also allow batch-style command execution, but many times it's the graphical interface provided by the tool that is the most important part of that software. Examples include JupyterLab, RStudio, Anvi'o interactive, cellxgene and so forth. The entire reason for these tools is the interactive graphical interface that they provide. This is a very important class of software that we are able to integrate with Galaxy thanks to the interactive tools framework.
Not only can we add graphical software to Galaxy, but we can also do so using the exact same standard tool framework, which provides a time-tested approach across the thousands of tools that have been integrated by community members. At the heart of the availability of popular tool suites inside Galaxy is the straightforward nature of adding additional tools. With the interactive tool framework, we are able to describe our tools using the same straightforward tool XML definition files that we as developers have all become familiar with. The only difference for developers is that you need to provide the underlying software dependencies as a container, such as Docker, and you also need to list the ports that you would like to make available within the Galaxy tool XML file. Users are then able to load the tool by selecting it from the standard tool menu on the left-hand side of the Galaxy interface. As expected, the tool configuration interface appears within the center panel, and the user is then able to input data sets and configure parameters. Once the user clicks execute on the tool form, a job is dispatched to a cluster. Galaxy will use a proxy to enable the user to interact with the graphical interface of the tool. The Galaxy community has already added a number of tools, including JupyterLab, RStudio, Anvi'o, AskOmics, iobio, cellxgene, EtherCalc, Phinch, Wallace, Wilson, pyiron, Pangeo, Panoply, HiGlass, and others. Interactive tools also have full workflow support. In this example, an Anvi'o-based workflow has been configured with the Anvi'o interactive server tool used as an intermediate step. When the user runs this workflow, the graphical Anvi'o interface will be launched, and the user will be able to manually manipulate their data. After they save their data sets and click to exit the graphical Anvi'o interface, the next tool in the workflow chain will automatically launch on the freshly saved user output.
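As a rough illustration of the interactive tool definition described above (a container for the dependencies plus a list of exposed ports), here is a minimal sketch that builds such a tool XML skeleton with Python's standard library. The tool id, tool name, container image, and port are hypothetical placeholders, and the helper function is not part of Galaxy. The element names follow the commonly used interactive tool layout, but a real definition also needs a command, inputs, and outputs, so treat this as a sketch and check it against the Galaxy tool XML schema.

```python
import xml.etree.ElementTree as ET


def build_interactive_tool_xml(tool_id, name, image, port):
    """Return a minimal interactive-tool XML skeleton as a string.

    Hypothetical helper for illustration only; not a Galaxy API.
    """
    # Interactive tools are ordinary tool XML files marked as interactive.
    tool = ET.Element(
        "tool",
        {"id": tool_id, "tool_type": "interactive", "name": name, "version": "0.1"},
    )

    # Dependencies are provided as a container (Docker here).
    requirements = ET.SubElement(tool, "requirements")
    container = ET.SubElement(requirements, "container", {"type": "docker"})
    container.text = image

    # Each entry point lists a port that Galaxy's proxy should expose.
    entry_points = ET.SubElement(tool, "entry_points")
    entry_point = ET.SubElement(entry_points, "entry_point", {"name": name})
    ET.SubElement(entry_point, "port").text = str(port)

    # Pretty-print when available (ET.indent exists from Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(tool)
    return ET.tostring(tool, encoding="unicode")


# All values below are placeholders, not a real tool or image.
xml_text = build_interactive_tool_xml(
    tool_id="interactive_tool_example",
    name="Example interactive tool",
    image="example.org/hypothetical/image:latest",
    port=8888,
)
print(xml_text)
```

Generating the file from code is only a convenience for showing the structure; in practice these definitions are usually written by hand, like any other Galaxy tool wrapper.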
Interactive tools can exist at any point in the workflow: as initial steps, as final steps, or as intermediate steps. You can even configure a workflow where the output of one interactive tool feeds directly into another Galaxy interactive tool. JupyterLab is available as a Galaxy interactive tool, and it includes programmatic functions that enable a user to access a Galaxy server using customizable code. But what if we take this a step further and enable graphical access to Galaxy directly within JupyterLab? This is the goal of some exciting work being performed by Jay, who has a poster at this conference. Here you can see a standard JupyterLab interface along with a toolbox widget, which currently shows only a single "log in to Galaxy" tool. When a user clicks on this tool, a Jupyter cell is populated with an interface that allows the user to log on to a chosen Galaxy server. The Jupyter toolbox is then populated with the tools that are available from that Galaxy server. The user can then click on the tool they are interested in, and a graphical interface to the tool appears within a new Jupyter cell. The user can upload new data sets or select data sets from their histories. Clicking on run will launch that job on the Galaxy server.
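To give a flavor of what accessing a Galaxy server from notebook code can look like, here is a minimal sketch using BioBlend, a Python client library for the Galaxy API. This is a swapped-in illustration: the talk does not say which functions the JupyterLab interactive tool or the toolbox extension use, and the helper names, server URL, and API key mentioned here are assumptions.

```python
def connect(url, api_key):
    """Create a BioBlend client for a Galaxy server.

    The import is local so that the rest of this sketch can be read and
    exercised without BioBlend installed.
    """
    from bioblend.galaxy import GalaxyInstance

    return GalaxyInstance(url=url, key=api_key)


def history_names(gi):
    """Return the names of the histories visible to this client.

    `gi` is expected to behave like a BioBlend GalaxyInstance, whose
    `histories.get_histories()` call returns a list of dictionaries.
    """
    return [history["name"] for history in gi.histories.get_histories()]
```

In a notebook, one would call `connect` with the URL of a Galaxy server and a personal API key (both placeholders to be supplied by the user), then pass the resulting client to `history_names` to list the histories that the graphical toolbox would otherwise let you pick from.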