Hello, my name is Charles Hadley King. I'm a senior research associate at George Washington University and technical lead for the BioCompute project. Today I'm going to be discussing publication of BioCompute Objects created from Galaxy workflow invocations. I'm going to spend the first few minutes of this talk going through a few slides about the development of BioCompute Objects and the needs that they fulfill, and then I'm going to do a demo of some recent integrations we've submitted to the public Galaxy repository.

BioCompute is a standardized template for reporting how a computational experiment was done, and it also includes why it was done. The original need for BioCompute arose from United States regulatory agencies, specifically the Food and Drug Administration, but its application is far greater than just regulatory science. The standard reporting template supports FAIR data standards and encourages good data provenance, so it's something that we see as applicable to many other areas of science communication.

In 2014, the genomics working group at the US FDA convened a special session to discuss workflow communication, particularly as it pertains to NGS, or next-generation sequencing, data. They came away with four main aspects a solution should adhere to. The first is that it be human readable, like a GenBank sequence record. The second is that it also be machine readable, which means it has structured information with predefined fields and associated meanings of values, so everyone is explicitly clear on what every single field means. Third, it should contain enough information to understand the computational pipeline and to interpret the information contained therein; it should be complete enough to serve as a record and to allow reproducible experiments. The final takeaway was that there should be a way to ensure the information has not been altered since it was first created.

Out of that meeting, BioCompute was born. Currently it's an IEEE-approved standard for communicating bioinformatic analysis workflows. It does this by acting like an envelope for the entire pipeline: it's essentially an aggregate of all of the information that one would need to understand the pipeline, and in doing so it also incorporates other standards. It is human and machine readable. It is organized into domains, which I'll get into in a little bit. It adheres to and encourages FAIR principles. It's also fully open source and adaptable, preserves data provenance, and has a unique ID and versioning. BioCompute streamlines reporting without enforcing any specific tool, platform, or workflow strategy, and machine readability enables customized views and the development of other tools associated with the report.

So here is a BioCompute Object, color coded to highlight each of the different domains. There are eight different domains, some of which are required. The provenance domain lists contributor names and identifiers and the date of creation. The description domain contains information to describe the pipeline: keywords, external references, pipeline steps, and the specific tools used. The error domain is where you can record sources of error. The parametric domain specifically lists each parameter that's been used in the workflow. The execution domain contains information that could be used for reproducibility, i.e. where a script is located and external data endpoints, meaning other resources that you need access to in order to execute the workflow. The IO domain, input/output, contains exactly that: your inputs and your outputs. The extension domain is for user-defined fields that are either required or optional; this is somewhere you could insert your own defined fields to include additional information that might not be present in a base BioCompute Object. And the usability domain, which we see as one of the most important aspects of a BioCompute Object, is where the scientific purpose of the analysis or computation is recorded. It's a free-text field, so a creator can put in a description of what they're hoping to accomplish with the analysis, and they can link out to other papers or previous studies or analyses.
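To make that domain layout concrete, here is a minimal sketch of the top-level structure of a BioCompute Object, written as a Python dictionary. The domain names follow the IEEE 2791 schema, but the values are illustrative placeholders; the exact required fields and schema URI should be checked against the published specification rather than taken from this sketch.

```python
# Minimal sketch of a BioCompute Object's top-level layout (IEEE 2791).
# All values are placeholders; consult the published 2791 schema for the
# authoritative list of required fields.
minimal_bco = {
    "object_id": "https://example.org/BCO_000001/1.0",  # unique ID plus version (placeholder URI)
    "spec_version": "https://w3id.org/ieee/ieee-2791-schema/2791object.json",
    "etag": "<sha256 of the object contents>",  # lets you check the object hasn't been altered
    "provenance_domain": {
        "name": "Example variant-calling pipeline",
        "version": "1.0",
        "created": "2021-06-28T12:00:00",
        "contributors": [{"name": "Jane Doe", "contribution": ["authoredBy"]}],
    },
    "usability_domain": [
        "Free-text statement of the scientific purpose of the analysis."
    ],
    "description_domain": {"keywords": ["variant calling"], "pipeline_steps": []},
    "execution_domain": {
        "script": [],
        "software_prerequisites": [],
        "external_data_endpoints": [],
    },
    "parametric_domain": [],  # one entry per parameter used in the workflow
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
    "error_domain": {"empirical_error": {}, "algorithmic_error": {}},
    "extension_domain": [],  # optional, user-defined fields
}
```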
To review the key features of a BioCompute Object: it abstracts the workflow based on commonalities, so it's platform, tool, and protocol independent. The usability domain is a free-text description of the scientific purpose. For data provenance, you have a data manifest that tracks processes from beginning to end, and it tracks user attribution, so authored by, contributed by, reviewed by; you know which person contributed what type of content to the object or the analysis. There is what we call a verification kit: with a full BioCompute Object, including an error domain and the IO domain with the input and output files, you get a sanity check, meaning, given a specific set of input files and the inherent or known error, is the output this analysis claims to have gotten valid? BioCompute Objects are extensible through the extension domain, and fully open source. BioCompute was part of the open source pilot program for IEEE, so it's one of the first standards they published that was fully open source.

Over the development process, many different groups have helped with the development of BioCompute, both conceptually and technically, and a few of them are listed here. Probably about 400 or so different individuals have participated in one way or another in helping to develop the technical specification.

So let's move on to the demo. Here I have a local instance of Galaxy running right now. It's based off of the code that we have on our BioCompute fork of Galaxy, at biocompute-objects/galaxy. Currently we have a pull request into the main development branch, possibly to be released in the September release of Galaxy, but for the time being you can either take it from our GitHub or you can run it at our Galaxy portal, linked here.

So hopefully everyone here is familiar with workflows and how they work in the Galaxy ecosystem. We chose workflow invocations as the entry point for BioCompute Objects because most of the information we needed to create a valid BioCompute Object was already available in that portion of the Galaxy code base. So I'm not going to show you how to construct a workflow; I have a pre-constructed workflow that I've used. We're going to run this workflow and select "Run workflow", and this is going to run here on my local machine. It takes us over to this view, and once we're done, you can see here, under the particular run ID, we have options, including an option for BioCompute Objects. It's also available under workflow invocations.
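As an aside, the reason workflow invocations make a convenient entry point is that the invocation record already carries most of what a BioCompute Object needs. A quick way to see that is to inspect an invocation through the Galaxy API; the sketch below uses BioBlend, with the Galaxy URL, API key, and the choice of invocation all being placeholders for whatever your own instance provides.

```python
# Sketch: inspect a workflow invocation via the Galaxy API using BioBlend.
# The URL and API key are placeholders for your own Galaxy instance.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="http://127.0.0.1:8080", key="<your-galaxy-api-key>")

# List recent workflow invocations and pull the details of the most recent one.
invocations = gi.invocations.get_invocations()
details = gi.invocations.show_invocation(invocations[0]["id"])

# Much of what a BioCompute Object needs (steps, tools, parameters, inputs,
# outputs) is already present in this invocation record.
print(details["workflow_id"], details["state"])
for step in details.get("steps", []):
    print(step.get("order_index"), step.get("workflow_step_label"))
```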
And you can see I've run this same workflow a couple of times to test it out, so we'll go with the one that I just ran. Previously, I think at last year's GCC, we had integrated the ability to download a BioCompute Object, so that's still there. That parses the workflow invocation and pulls out a good deal of information, but there are still a few pieces that a user needs to input and that can't be pulled out automatically. So to address that, we've added an export option to this menu. You simply fill out the URL of a BCO database API, give a specific user API key, select the table and the groups that you want to apply it to, and hit submit. Now, I'm going to come back to this in just a second.

In our BioCompute portal we've aggregated all of the resources we have for BioCompute Objects: links to the HIVE instance on AWS, a Galaxy instance, and a page for the different community organizations associated with BioCompute, so check those out. We also have a page where we list all of the different resources and applications. I'm just going to sign in with my account, and you can see here a few objects listed. This is our beta version, so these are just test objects. To submit an object from Galaxy, you need to acquire your API token. Once you have an account and once you've logged in, you'll see this BioCompute Object server section at the bottom; you just copy the token here and put it into the form. And just to show, we have a few draft objects here.

So I'll go back to Galaxy, hit submit, and then refresh this here, and you can see there's a new object here, "Galaxy BioCompute Object development test". Now I'm able to open the form here, and you can see the information that came from Galaxy has been populated into this BioCompute Object. An end user can fill out the rest of the form here; once it's filled out and valid, you can select the server to publish it to, assign permissions to the associated groups, and publish the draft. Now we should see our Galaxy BioCompute Object here in the BioCompute viewer, and this is publicly accessible for anyone to see; as I'm showing you with the Chrome incognito tab right here, anyone can view this object.

This work was done by myself, and Christopher Armstrong, the development lead for the BioCompute project, contributed significantly to it. I'd also like to thank Dr. Jonathan Keeney, assistant research professor and the BioCompute lead; Dr. Raja Mazumder, my PI at George Washington University; and Janisha Patel, BioCompute