Hi, my name is Nuala O'Leary and I'm the product owner for NCBI Datasets. In this video, my colleague Adam Stine, a curator at SRA, and I are going to introduce you to the latest updates for accessing NCBI data within Galaxy. For my part, I will explain what the Datasets project is, since it's a new project and may not be known to a lot of people, and talk about how Datasets is partnering with Galaxy to provide access to NCBI data. Then I'll pass it to Adam, who will talk about the latest on SRA and Galaxy.
So, what exactly is Datasets? Datasets is not actually a database. It's a resource for creating user-friendly web and programmatic access to the sequence data across NCBI. We're a new project and we're just starting out. Our current scope includes genomes and genes, so these are mostly the reference genomes and the reference gene annotations, and a SARS-CoV-2 dataset. You can access these through our website, and I have the URL for our landing page here.
So, how exactly are we making this easier for users? Well, we've created a genome service where you can just come to the NCBI homepage or to the Datasets landing page and search by the organism that you're interested in. In this case, I'm showing Arabidopsis. You search for that organism, and a box will come up showing how you can link over to browse all the genomes that are available for that genus. This will take you to a table, and within that table you can select the genome assembly that you want. You can also filter it down by several properties. Once you've selected the dataset that you want, you can select the download button and it will give you choices. A genome dataset is a complex set: it has sequence, it has annotation, and it has files in different formats. So you can select the file format that you need and download the data.
So, how are we partnering with Galaxy? This is a new partnership. It's not out yet, but it's coming soon.
What we are developing is that, within that genome table, you will have the option to select a "Send to Galaxy" button. So you can pick the annotation or sequence files that you need and send those to Galaxy. You can also select these assemblies from within Galaxy. Under Get Data, there will be options for NCBI Datasets, and within there you'll be able to search for the assemblies that you need.
And how does this connect to SRA? Well, now you'll have a way to get the latest reference genomes directly from NCBI, so they will be the newest files and the latest annotations. If you need to do an RNA-seq alignment to the reference genome, you'll be able to get that newest reference genome directly within Galaxy, or through NCBI and send it to Galaxy. So, now I'll pass over to Adam.
Thanks, Nuala. For any of you who are unfamiliar with the Sequence Read Archive (SRA), we have 23 petabases of open access data, and that number is constantly growing. All of it is available for download either directly from NCBI or from the commercial cloud providers Amazon and Google. Galaxy has been using the Amazon cloud resource to provide data to users since about February of this year, and it has shown improvements in speed and reliability.
There are a couple of different ways that you can actually get the data from SRA into Galaxy for work. We have a button within the Computing column on the upper right-hand side of the SRA Run Selector that will allow you to send your runs into Galaxy. Alternatively, if you are in Galaxy itself, within the Tools section, you can use Get Data and either download in specific formats or search for data to get these accessions.
There is a step-by-step tutorial available on YouTube. If you search for SRA Galaxy, you should find it. It's run by the Galaxy team and it is very informative. If you signed up for the GCC training track this year, you can see a tutorial on using the new SRA Aligned Read Format, or SARF.
That format does some pretty exciting new things, and the file sizes end up being smaller than just the raw data.
Also, with a resource this big, one of the most important questions is how you find the things that you're looking for. One of the ways that we have tried to solve this problem is by providing metadata dumps in Amazon Athena and Google BigQuery. These allow users to run SQL-like queries, which can be a very powerful way of finding the exact data you're looking for. If you're interested in this tutorial, it's within the science track. It is the advanced NGS analysis tutorial, and it is given by Jon Trow from SRA. Thank you very much.