Good afternoon and welcome to the latest webinar from BioExcel. Today we're going to be talking about Open PHACTS, and in particular building pharmacological workflow blocks for virtual screening. In this webinar we actually have three speakers presenting today: we have Nick Lynch as our guest speaker, and we also have Stian Soiland-Reyes and Adam Hospital from the BioExcel project. So I'm going to spend a couple of minutes now telling you about BioExcel, and then I'll hand over to the speakers for today's presentation. Before I get underway, though, I just want to remind you that this webinar is being recorded, including the question and answer session at the end, and we'll post it on YouTube and the BioExcel website afterwards.

So for those of you who may not be so familiar with BioExcel, I just wanted to give you a very quick overview, one slide really. BioExcel is a new centre of excellence for computational biomolecular research, and it's based around three main pillars. The first one is excellence in biomolecular software: an important part of what the centre intends to do is work on important codes for the community. In this first phase of the centre we have three codes, GROMACS for MD simulations, HADDOCK for docking, and CPMD for QM/MM simulations, but as the project goes on it's likely that we will also be involved with other software. The second pillar is excellence in usability. As well as having good software, we want to make sure that it is usable, and one of the important ways we plan to do this is to build building blocks that can be used as part of workflows; that is related to some of what we're going to hear about today, in particular how Open PHACTS can be used as a data source in one of these workflows, for example. And the final part of what we're doing is consultancy and training. I'm leading that work package, so the webinars are part of this activity, but we've also got training courses and other things going on all the time, so please do check our website if you're interested in finding out more. BioExcel also has some interest groups; if you're interested in any of the topics that you can see on this slide, you can go to bioexcel.eu and see where to sign up for the interest groups.

So we very much hope that you'll have some questions after our speakers have given their talks today. If you've got any questions at any time during the presentation, you can type them into the questions box that you'll see in the GoToWebinar control panel. It will look similar, not identical, to this, but you should have a questions box where you can type in your question, and at the end of the webinar I'll come back to them: if you have a microphone I'll invite you to ask your question directly to the speaker; otherwise you can just type your question into the question box and I can relay it to the speaker on your behalf. Finally, if you're watching this video on YouTube after the event and you want to post a question, you can join the discussion at ask.bioexcel.eu, where you can find the discussion forums and get answers from the speakers at a later date as well.

Okay, so now I'm going to move on to the presenters for today's webinar. Nick Lynch is the current CTO of Open PHACTS, which is a semantic linked data and services platform for pre-clinical data. Nick has over 20 years' experience in informatics; he was at AstraZeneca for 13 of those years, leading teams in R&D informatics.
He established Curlew Research in 2014, working on a number of projects with pharma, biotech and life science informatics companies. He's also an investment manager for the Pistoia Alliance, supporting their projects and strategy, and he's on our scientific advisory board as well, so you can see that Nick is very well placed to be speaking about this subject today. Later on in our presentation we'll also have two speakers from the BioExcel project: Stian Soiland-Reyes from the University of Manchester's eScience Lab, and Adam Hospital from IRB Barcelona, where he's leading the portable environments efforts. Okay, so without further ado, I'm going to hand over to today's first speaker. So Nick, I'm about to invite you to take control.

Okay, thanks Adam. Hopefully you can see my screen, Adam, is that okay? That's all good. Okay, great. So thank you to the BioExcel project for inviting me to present this afternoon. I'm very much looking forward to this, hopefully as the start of a good discussion, not just today through your questions, but also as part of a longer-term collaboration between Open PHACTS and BioExcel and the various groups that are using the individual components, and hopefully those of you that are looking to solve business questions with the combination of BioExcel and Open PHACTS. What I thought I would do initially is briefly give an update on where Open PHACTS is with our current release and platform, and a little bit of background into Open PHACTS itself, for those of you who perhaps aren't so familiar with Open PHACTS as a foundation and as a project. Then, as I mentioned, we can talk a little through some of the applications of Open PHACTS, and that will hopefully lead into the work that Stian and Adam will be covering, in terms of bringing together the data and APIs from Open PHACTS with the simulation tools that Adam mentioned, as well as perhaps broadening the wider usage and collaboration going forward.

So I'll start off with a little bit of background about Open PHACTS. Those of you involved in scientific research will understand the challenges we face in comparing data from multiple sources. We know that there are a lot of public sources of information; they each have their own history, and they have each been developed to solve different business questions. This has always posed a problem for researchers, whether in pharma or in academic research: how to integrate that data and bring it into a central place, such that you can then start to ask questions of data that is both relevant and interoperable, and get the maximum value from data coming from different sources. It's a challenge we've been facing for several years, and it was the primary purpose for which Open PHACTS started off as an IMI project in 2010-2011. The challenge people faced was that each organization would need to bring each of the different data sources in-house, possibly within their firewalls if they were a pharma company, or set up their own environment; bring together these different databases; work out how they were integrated in terms of identifiers and so on; and obviously support the platform, the APIs, and all the exploitation services that would need to live on top of them.
And that was a problem that was essentially replicated across many different groups, and in a way Open PHACTS was started in 2011, as I mentioned, to address it by providing a central resource for a range of pharmacological data sources, both allowing them to be integrated and interoperable and providing a set of APIs on top of them. I'll talk a little about how you can use Open PHACTS in that context now, and that will lead into some of the uses you might put Open PHACTS to when you're using BioExcel tools to create the virtual screening workflow that we'll cover today and, hopefully, in future discussions. So, as I say, its history was about creating this single platform with multiple data sources, and obviously, as with any IMI project, we're on the path of trying to ensure its sustainability in the long term.

I think one of the key tenets of Open PHACTS was the business questions. At the time, virtual screening perhaps wasn't one of the key business questions targeted in the first few releases, but it's certainly very relevant now, given that BioExcel is a project in its own right and is working hard on making the tools I've mentioned easier to use, including as part of a broad set of informatics workflows. So what BioExcel is doing today is very relevant to how Open PHACTS started, with trying to target a few key questions. You can see them here, and there are papers referenced on our website that you can look at later, but they are fairly common questions that researchers need to ask of the data, as listed here: finding inhibitors, or using structural profiles to find potentially relevant and similar compounds in a broader workflow. And I think this is very relevant to a virtual screening workflow: as you think about the entry points to that problem, starting with a particular protein, working on a particular target, or starting from a set of compounds, all of that fits really well with the types of data that are in Open PHACTS. It then allows you to use some of the outputs from Open PHACTS to fire off questions to some of the BioExcel tools like GROMACS and others, and in the opposite direction, to use the output of those tools to feed further analysis questions back into Open PHACTS. I'll cover more of that later on, and I'm sure that's part of the questions that might flow today as well.

So, just a quick recap of the types of data you can see in Open PHACTS; probably many of these data sources will be familiar to you as researchers. We cover the traditional pharmacological sources like ChEMBL and ChEBI, through to pathway data and patent data like SureChEMBL, and equally derived data like DisGeNET, which is a gene-disease association database from Barcelona, and other sources as well, together with a range of ontologies to help us glue the data together. So there is a rich set of data that you can access, something we're hoping to grow (and we definitely welcome your input on that), and we offer it up as a set of APIs that you can access from your workflows.
Again, without dwelling too much on the history of Open PHACTS: one of the key value points of the whole setup is the ability to use one identifier and then expand it to its equivalent entities as referenced across a range of sources. That's handled within Open PHACTS itself; it's part of the work the Open PHACTS team does in the background, in partnership with our data providers. So a key part is that you can use a number of entry points to our data, which then allow you to expand out queries and get the full range of data covered in the data sources, and that could equally apply to your own private data, if you wish to incorporate that in your tools as well.

On the basic technical platform, which Stian will talk about later, identity management is a key part of the original design of the tool. I've already covered identity management, and there has also been quite a lot of recent work updating our identity resolution service: Ian Dunlop has been working on moving that from ConceptWiki to Elasticsearch, and we certainly welcome any input you have on that as you use some of the beta versions that are out there now. But the main access point, perhaps better explained on the next slide, is the ability to use the API through a range of tools. The traditional workflow tools are very relevant to the type of examples we're talking about today, where you may be running a series of pipelines over potentially quite large sets of compounds of interest to find the best matches; the API works well with workflow tools like KNIME and Pipeline Pilot. I also know that BioExcel has been working with the Common Workflow Language community, and there are some good presentations on that we can send you links to later. Those give really good entry points to the API, as do some of the other apps developed as part of the project, such as the Open PHACTS Explorer; there are others from BioSolveIT, and eTOX has built its own tools that sit atop our API. All of this gives you the potential to fire off, from workflows, the kinds of questions you might want to ask of the data, and that's a key aspect of the offering from Open PHACTS.

As I've mentioned before, one of the things we might want to cover in today's Q&A, and hopefully in future discussions, is this ability to marry together the data sources and API that I've described with the set of tools that BioExcel is currently supporting and making easier to access through training and platform support. As Adam mentioned earlier, the potential to grow that set of tools in the future, based on business challenges brought to the project, is a really valuable aspect of kicking off this approach now. We would certainly welcome the chance to work on potential workflows that bring together the Open PHACTS API with the simulation tools individually, as the input or output of one of the tools, or, in the future, as part of a much more complicated workflow where a range of the BioExcel tools are used in partnership, not just with Open PHACTS but with other data sources too, using existing data in a way that helps with further virtual screening activities.
So, just a little about where Open PHACTS is now heading in terms of our release schedule. One thing that I think Stian will cover in a minute, and which was very important to some of the users of Open PHACTS, is the ability to run Open PHACTS internally. That's often the case within companies, where there is a need to use internal data and merge it with the publicly available data, and Docker was seen as a very good way of doing that. Stian will cover that, and there is further information on the Open PHACTS website about ways of deploying a Docker version of Open PHACTS too.

Just to give an update on where Open PHACTS is today, with our future plans and some of the partnerships we have running: obviously BioExcel is an exciting opportunity for us to work with these tools, but there are other projects we're either partnering with or which are using the Open PHACTS platform. These are just a few examples, projects in the tox space that are keen to get access to a broad range of data to support toxicity and risk prediction, as well as a broader project, Big Data Europe, which is looking at the whole setup of big data projects and access to societal data, covering healthcare and other data in a single platform. So there are lots of ways that hopefully, in the future, we can bring together business questions that need access to a wide variety of data sources. One thing that's relevant to what we're trying to do from spring onwards is the recent work with our partner Data2Discovery, who were formed out of Indiana University and are spread across various locations. This has been important for us in terms of re-establishing a good technical community and development team, who are working on enhancing the stability of the platform as well as making sure we can start to integrate other data sources. We see a good combination here with being able to work with health data in the future, because things like adverse event data are very relevant to research as well. So this partnership with Data2Discovery is important for our current releases as well as for broadening the applicability of Open PHACTS, building on Data2Discovery's experience in the areas mentioned there.
Coming back to where we are with our release plans, we certainly welcome input here: if people are interested in trying out the alpha versions we have available at the moment, it would be really good to get their early insight. As of July 2017, we're working on a data release that will include a number of data source updates, including ChEMBL 23, and slightly later a total refresh of our standardized chemistry database. As I mentioned before, there has been a lot of hard work on the text search within Open PHACTS; we've got alpha versions of that and would love people's thoughts on how we might rank the returns from a text-based search, as we realize that's a very valuable entry point to the wider data sources in Open PHACTS. Later in the summer we plan a more regular refresh programme, as well as updating the remaining sources. And, as will perhaps be covered later, workflows are very important as access points: we're pleased that we've got some updated KNIME nodes coming through once we finalize the latest API, as well as the Pipeline Pilot nodes that people already have access to.

This is the point where I wanted to leave the audience with a few questions, which hopefully will spur on some of the questions you might ask later. What we're definitely keen to do is broaden the usage of Open PHACTS within virtual-screening-type workflows, and the way we see that happening is if we can define some typical business questions that people want to answer, not just in terms of the answer, but in terms of what data sources would really help them, and how we can improve throughput if throughput is an issue for that type of question. So we definitely want to hear from you on that. Another point of interest is future cooperation: we certainly welcome people getting involved in the project, both contributing to the code and contributing to our project meetings and so on. With that, I'd like to acknowledge and thank all the people who have contributed to Open PHACTS now and in the past, and hand back to Adam and the other speakers to carry on the webinar; but I certainly welcome questions at the end, or after the event.

Thank you very much Nick, that's a good overview. As Nick said, I think we're just going to run through our three presentations today and save the questions for the end, but you can type them into the question box at any time, to make sure you don't forget them and that we have time to get through them all. So I'm now going to pass control over to Stian, who's going to give the next presentation. And Stian, don't forget to unmute your mic as well.
Hello. So, I'm going to dive a bit more into the deployment and the architecture behind it. I can't show all the details, but the slides will be available and you can follow the hyperlinks in there for more information, so I'm just going to walk straight in. Nick mentioned the web services, and that's basically the core of how the Open PHACTS API is exposed: there are, I think, about 40 different web services, depending on what you want to ask for, and those are all on the dev site that you see there. Programmatically you can get the results in your usual formats, JSON or XML, and you just pick up the particular attribute you want; you can then see how it has been merged in from the different data sources, so you don't need to deal with the different identifiers yourself.

I want to talk a bit more about Docker, because that's now how we deploy Open PHACTS in the live production system. Docker is a container technology that has become very popular, particularly in cloud deployments, and Open PHACTS is now running on cloud instances as well. The core principle is that you have an image, a very small file system for one particular service, one application. Unlike flat virtual machines, with Docker you make one service per Docker image, and those then become microservices that talk to each other: you have the database separate from the web server, which is the cloud's best friend in a way, because you can maintain each of those separately. So that's what we are now using in Open PHACTS. If you look at the different images we use just for Open PHACTS, all these yellow boxes, you have the web interface, the Explorer, then you have the API, which is doing the queries against a triple store underneath. So there are several images to start up, and they talk to each other as they spin up. We use something called Docker Compose, which wires it all together; it's quite a lightweight way of setting things up, and it then starts all the different images. This is crucial, because you can customize each of these images, to add additional data sources, to expose different ports, and so on. Deployment-wise that is quite important for exposing the whole Open PHACTS architecture in a flexible way, particularly as, as Nick mentioned, companies may want to tweak something or run it for private purposes.

When you start it up, it will first just fetch the software; you'll see the logs with "downloading..." and so on. After that it will not start immediately, because Open PHACTS is actually quite big: once installed it can be around 200 gigabytes, while the download is only about 20 gigabytes, as it compresses quite well. To get started it needs to go through that staging phase of getting the data; Docker Compose takes care of all of that, and finally it starts up the services. If you look at the logs (you don't need to) you'll see, in all the different colours, what's flying around; if you're changing the queries and so on, that's what you'd look at. But that's quite detailed, and most people wouldn't want to see it; it's only if you want your own install that you need to deal with this. In the BioExcel project we are looking at going back, in a sense, to making one big, fat virtual machine image, if you like. That's part of the BioExcel portal we're developing with the EBI and ELIXIR, to simplify deployment of these kinds of images onto different grid and cloud providers, working with EGI, Amazon and so on.
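[Editor's aside: to make the API usage concrete, here is a minimal Python sketch of calling one of the Open PHACTS web services. The base URL, the compound method, and the app_id/app_key/_format parameters follow the pattern the public API documentation used around this time, but treat them as assumptions and check the dev site for the current details.]

```python
# Minimal sketch: querying an Open PHACTS web service for compound data.
# BASE, the method name and the auth parameters are assumptions based on
# the documented pattern of the public API; check dev.openphacts.org.
import requests

BASE = "https://beta.openphacts.org/2.1"      # assumed API root
APP_ID, APP_KEY = "my-app-id", "my-app-key"   # obtained by registration

def compound_info(concept_uri):
    """Fetch the merged record for one compound identifier; the platform
    resolves the equivalent identifiers in the other sources for us."""
    resp = requests.get(
        f"{BASE}/compound",
        params={
            "uri": concept_uri,   # e.g. a ConceptWiki or ChEMBL URI
            "app_id": APP_ID,
            "app_key": APP_KEY,
            "_format": "json",    # JSON rather than XML or RDF
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Usage (with a real concept URI and real keys):
# info = compound_info("http://www.conceptwiki.org/concept/...")
# print(info["result"]["primaryTopic"])
```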
Here you see browsing the BioExcel collection of tools. This is a bit of fiction, because we haven't got a blue Deploy button for Open PHACTS yet, but that's what we're working on right now; we already have it for the COMPSs workflow system. Once you can deploy it like that, all those Compose steps that I showed before will already have been done for you, so you don't need to wait for the data download, because on the grid infrastructure it's okay to have these big data images.

Now, how do we use Open PHACTS? You don't want to use those JSON APIs directly on the command line; most people use workflows, like Nick said. There are two big workflow systems in cheminformatics. Pipeline Pilot is a very popular one, particularly with industry users, and you can install the Open PHACTS component for it; there is a separate webinar from Open PHACTS, which you can find on the Open PHACTS website, that explains this in more detail. There is also the KNIME node, which is under development now (there was a prototype, and now there is a new, stabilized version), which allows you to use any of the API calls within the KNIME workflow system, an open source system. In there you can customize it to use different installations: by default it will use the public API, but if you have set up your own instance you can change the preferences to use your deployed Open PHACTS, and then you can combine it with all the other powerful informatics tools you can find in KNIME, for comparing structures and so on.

Now, what we have been looking at in BioExcel: many people start wondering which workflow system they should use, and this has been a common problem, particularly in the bioinformatics field, where more and more workflow systems keep coming up, particularly as people are doing next-gen sequencing and so on. One thing that has sprung out of that community is the Common Workflow Language. It's not bioinformatics-specific; it is a pipeline language, but generic, and implemented by many different workflow engines, so you're not tied into one particular system anymore. It lets you express, at the tool and step level, a pipeline of tools that you want to execute, and the inspiration behind it comes from Docker and command line tools. What we've been looking at in BioExcel is extending this to also do web service calls to Open PHACTS, and in particular we need it parameterized, because it could be pointing at different instances. One thing we found with CWL is that while it is a standard format that you can write by hand or save from other workflow systems, it didn't have a good web presence, so we made CWL Viewer: you give it the URL of a workflow, and you can go and inspect it, have a look at what the different steps are, and share it with others. There you can see the annotations, if you have bothered to put them in, describing the individual steps, so it's a good way to convey what you're doing in the workflow to others. Particularly for academic users who want to publish, it's good to have a linkable web page that describes the workflow, independent of how you run it.

Now, you can read more about what we're doing in BioExcel on our website, on the workflows page. There you can read about the different systems, and in particular about the workflow blocks, which are a way to generate little components, workflow fragments, that you can then combine into custom workflows; Adam Hospital will talk more about that in just a minute.
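[Editor's aside: as a concrete illustration of the CWL point above, the reference runner, cwltool, can execute a CWL document programmatically from Python. A minimal sketch, assuming cwltool is installed (pip install cwltool) and following the "import as a module" pattern from its documentation; "hello.cwl" and its "message" input are hypothetical placeholders, not an actual BioExcel workflow.]

```python
# Minimal sketch: running a CWL document from Python with the reference
# runner, cwltool. "hello.cwl" and its "message" input are hypothetical.
import cwltool.factory

factory = cwltool.factory.Factory()
hello = factory.make("hello.cwl")           # load and validate the document
result = hello(message="Hello BioExcel")    # run it with the given inputs
print(result)                               # the workflow's output object
```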
You can also use the ask.bioexcel.eu website, where I've added a new question where you can ask about this and anything else about workflows. Open PHACTS has its own support system, but I've added links to that as well, so you can see it there. And of course you can ask any questions during the webinar using the webinar system. So that's it for me, and I think I'll pass over to Adam.

Thank you very much, Stian. So now I'm just going to take back control and advance the slides for Adam. Adam, can you see the slides now? You should be able to. Over to you for your talk. I hope you can hear me? Yes, okay. So my name is Adam Hospital, and I'm going to present the last part of this webinar, where, with just five slides, I'll try to show you the interaction between the BioExcel Centre of Excellence and the Open PHACTS platform. If you can change the slide, please. Yes, fantastic. The collaboration started with the definition of one of our pilot use cases in the project. In BioExcel we are working on five different pilot use cases, which you can discover if you go to our web page; you have the link on this slide. These pilot use cases are real scientific problems, mainly involving high-throughput analysis or high-performance computing simulations in different fields, such as genomics, molecular recognition, free energy simulations, molecular dynamics, and even QM/MM simulations. It is in one of these pilot use cases that we want to design and develop a complete virtual screening pipeline, like the one represented in this slide, and here is where Open PHACTS can help us a lot, as we'll see in a minute.

But first of all, for those not familiar with the virtual screening technique: virtual screening is a computational technique that is widely used nowadays, especially in pharma companies, for the drug discovery process. The goal, mainly, is to filter compound libraries, libraries that can contain up to millions of structures, down to just a few hundred compounds, looking for a set of qualities or particularities such as geometric or electronic complementarity, drug-likeness, size, or toxicity of the ligand. Reducing the number of compounds from millions to just hundreds makes it possible to start in vitro testing or optimization processes that wouldn't have been feasible with millions of compounds, and of course that also saves the company a lot of money. You can see this process represented in the figure on the left of the slide, with lots of compounds together with protein receptors at the top of the funnel, and just a few possible protein-ligand complexes, the result of the pipeline, at the bottom.

This virtual screening pipeline can be divided into three main parts, as you can see on the right-hand side of the slide: the retrieval of molecules, the structural modelling, and the recognition process. The first part of the pipeline consists of retrieving a library of compounds; a set of decoys related to these compounds, that is, inactive compounds used mainly for validation; and a set of receptors, usually proteins, with a certain ability to dock the ligands. It's precisely at this point that we're going to use the power offered by Open PHACTS, for a number of different reasons: because of the number of different databases included in Open PHACTS, because of the possibility of linking these different databases using the API, because of the power that this API has, and so on. For all of these reasons, and for what you've seen during Nick's and Stian's presentations, we chose the Open PHACTS platform for this particular part of the pipeline.
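[Editor's aside: to make the funnel's filtering idea concrete, here is a small illustrative sketch that trims a compound library by simple drug-likeness rules. It assumes RDKit is available (the webinar does not name a specific toolkit), and the Lipinski-style thresholds are illustrative only, not the project's actual criteria.]

```python
# Illustrative sketch: the first filtering stage of the funnel, trimming
# a compound library by simple drug-likeness (Lipinski-style) rules.
# RDKit is assumed; the thresholds are illustrative only.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_drug_likeness(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:          # unparsable structure: discard it
        return False
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Lipinski.NumHDonors(mol) <= 5
            and Lipinski.NumHAcceptors(mol) <= 10)

library = ["CCO", "c1ccccc1C(=O)O", "CC(=O)Oc1ccccc1C(=O)O"]  # SMILES
shortlist = [s for s in library if passes_drug_likeness(s)]
print(shortlist)   # the compounds surviving this stage of the funnel
```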
And, as the title of the webinar says, we don't just want to use the Open PHACTS database through the API; we want to build pharmacological workflow blocks for virtual screening using this API, and I'll show you what that means in a couple of slides. So now, to finish with the virtual screening pipeline. Next slide, please. Okay, thank you, Adam. The second part of the pipeline, which you have on the slide, is the most computationally expensive part, where we want to model the structures both for the ligands and for the receptors, which are usually proteins. And we want to complicate things a little bit more in BioExcel, because we want to take advantage of our exascale expertise and take receptor flexibility into account. For that, we want to run molecular dynamics simulations of these receptors, to obtain a set of structures, an ensemble of conformations, for each receptor. Finally, in the last step of the pipeline, with the set of compounds and receptors, including the different conformations representing flexibility, we are going to run biomolecular docking programs and obtain the protein-ligand complexes, from which the pipeline selects the best ones through a final scoring and analysis process. If you are interested in more information about this particular virtual screening use case, you can follow the link at the bottom of the slide, where you'll find the description in further detail. So, next slide, please. Okay, so in the previous slide I said we wanted to build pharmacological workflow blocks for virtual screening on top of Open PHACTS. But why do we want to build these workflow blocks?
I mean, why don't we just use the API directly? We think this matters because scientific workflows are extremely important, and this is actually one of the goals of the Centre of Excellence; you have it written on this slide: one of the goals of the BioExcel Centre of Excellence is to increase the usability of European infrastructures by providing easy access to computing and data resources, and we are doing that by designing, deploying and making available biomolecular workflows. I'm sure you have used scientific workflows before: they are systems widely used in scientific groups, mainly as a way to organize and execute pipelines, which are usually a list of command line tools. But workflows have a lot of advantages compared to a plain list of command line tools, as you have seen in this presentation: they can be described and shown graphically with different viewers (you can see an example in this slide); they can be stored and shared, for example on the myExperiment web page; they increase reproducibility and also provide provenance; and they can automatically resolve dependencies and exploit parallelism. Again, if you want more information about BioExcel and the work we are doing on workflows, it is all on the BioExcel web page.

I've put an example in this slide, a specific workflow, just to show you how difficult the process can sometimes be, even for processes that appear easy, like this one. All of the workflow steps shown here are needed just to prepare a protein structure, one you can find in the Protein Data Bank (PDB), for use as input to a molecular dynamics simulation: just to prepare the protein, obtaining a system with a box of water molecules and counterions surrounding it. So what we are doing in BioExcel is transforming all of these boxes that you see here into workflow blocks. Workflow blocks can be easily reused and interconnected with other building blocks, such as the ones we are developing around the Open PHACTS API.

And how are we doing that? Next slide. In BioExcel we decided to implement these workflow blocks as a set of Python wrappers. Just that: a set of Python wrappers. So, in the particular example of the molecular dynamics setup, we started from the previous diagram and converted every process or task into a building block. These building blocks all have the same structure, inputs and outputs, and the same syntax for the parameters, so that they can be easily interconnected; that makes them interoperable. This methodology also makes them workflow-manager agnostic, meaning they can be used with different workflow managers: think of them as just a wrapper around a particular piece of code (a sketch of the idea follows below). And of course, as you can see on the right-hand side, a whole workflow as complicated as the one I showed you in the previous slide, the molecular dynamics setup, can itself be wrapped in a single building block; this is the one called "prepare structure", the blue box just after "recover protein structure". We can then use this building block, which is internally a complex workflow, as if it were a single tool. As I said before, these workflows can be run using different workflow managers, which is the last part of this slide: for example PyCOMPSs, Galaxy and Toil.
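[Editor's aside: a minimal sketch of the building-block pattern Adam describes, with one thin Python wrapper per tool, each sharing the same shape (input paths, output paths, a properties dictionary) so blocks interconnect and stay workflow-manager agnostic. The class name and defaults here are illustrative, not the actual BioExcel library API.]

```python
# Minimal sketch of a building block: a thin wrapper with a fixed shape
# (input paths, output paths, properties dict). Names are illustrative.
import subprocess

class Pdb2GmxBlock:
    """Wraps one command line tool as an interoperable building block."""

    def __init__(self, input_pdb_path, output_gro_path, properties=None):
        self.input_pdb_path = input_pdb_path
        self.output_gro_path = output_gro_path
        self.properties = properties or {}

    def launch(self):
        # Assemble and run the wrapped tool (here, GROMACS structure prep).
        ff = self.properties.get("force_field", "amber99sb-ildn")
        cmd = ["gmx", "pdb2gmx",
               "-f", self.input_pdb_path,
               "-o", self.output_gro_path,
               "-ff", ff, "-water", "tip3p"]
        subprocess.run(cmd, check=True)

# Because a block only exchanges file paths and a properties dict, the
# same call works from PyCOMPSs, Galaxy, Toil, or a plain script:
# Pdb2GmxBlock("protein.pdb", "protein.gro").launch()
```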
Depending on the context, you might want to use one or another. For example, if you are interested in running your workflow in HPC centres, using hundreds of processors in parallel, then PyCOMPSs or Toil are perhaps better suited to that; but if you just want to run or test your workflow on your own workstation, analysing intermediate results or changing input parameters, then maybe you would use Galaxy or Taverna. Our building blocks can be used with all of these workflow managers. And finally, once the workflow is designed, it can also be described using the Common Workflow Language, which has been presented earlier in this webinar; in BioExcel we are describing all our workflows in CWL, and again you can find more information about all of that in a document linked at the bottom of this slide. So, next slide.

Once we have our workflows implemented using this Python wrapper library, I said that we can run them using different workflow managers, but we can also run them on a large variety of computational infrastructures, as you can see in this slide. In this case, again using the MD setup workflow that I showed you before: we tested it on personal computers, for example your own workstation; on virtual machines running in private clouds, for example OpenNebula or OpenStack; on virtual machines running in public cloud infrastructures like EGI or the Embassy Cloud; and of course on HPC supercomputers like the MareNostrum supercomputer at the Barcelona Supercomputing Center. All of this uses the very same code, the code that you can download and install from the BioExcel GitHub repository that you see on this slide.

So, now that I have presented the connection between BioExcel and the workflows, I can go back to the virtual screening pipeline that is the subject of this webinar. If you can click, Adam? Yes. Our idea is to take advantage of the right-hand part of the slide, the HPC centres, because we want to run this in a massively parallel environment such as a supercomputer. As you can imagine, we want to perform an extensive virtual screening of a huge compound library (remember, millions of compounds), and we want to include docking against different conformational ensembles of the receptor; we can even go further, introduce sequence variants into the proteins, and use those as receptors too. All of this is a challenging process, and of course it depends on the characteristics of the system, mainly its size; but if we can run this workflow efficiently in a massively parallel environment such as a supercomputer, that gives us the opportunity to scan millions of compounds, and that is essentially what we want to achieve in this use case in BioExcel.

So, last slide: finally, the pipeline. The pipeline you are seeing here is the first prototype of our workflow. As you can see, we will use Open PHACTS, of course, to retrieve the library of compounds and also the receptors, using the workflow blocks that we are developing, and we are using the Directory of Useful Decoys (DUD) to retrieve the decoys. For the protein receptors, as I said, we want to obtain ensembles of different conformations, and for that we will use the GROMACS molecular dynamics package, which, by the way, is one of the main codes in the BioExcel project, as Adam introduced before. And finally we will put it all together, receptor conformations, compounds and decoys, and run a massive docking procedure, using for example the HADDOCK program for protein-ligand docking, which is also one of the main codes in the BioExcel project.
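[Editor's aside: to show how such blocks might chain into the prototype pipeline just described (Open PHACTS retrieval, decoys, a GROMACS ensemble, then docking), here is a hedged, self-contained sketch in which every helper is a stub standing in for a real building block; none of these function names come from the BioExcel code.]

```python
# Hedged sketch: chaining hypothetical blocks into the screening pipeline.
# Every helper below is a stub standing in for a real building block.
from itertools import product

def retrieve_compounds(target_uri):        # stand-in: Open PHACTS blocks
    return ["CHEMBL25", "CHEMBL521"]

def retrieve_decoys(compounds):            # stand-in: a DUD decoy lookup
    return ["decoy-1", "decoy-2"]

def md_ensemble(receptor_pdb, n_conformations):   # stand-in: GROMACS MD
    return [f"{receptor_pdb}:conf{i}" for i in range(n_conformations)]

def dock(conformation, ligand):            # stand-in: HADDOCK docking
    return (conformation, ligand, 0.0)     # (receptor, ligand, score)

def run_virtual_screening(target_uri, receptor_pdb="receptor.pdb"):
    compounds = retrieve_compounds(target_uri)
    decoys = retrieve_decoys(compounds)
    ensemble = md_ensemble(receptor_pdb, n_conformations=3)
    poses = [dock(conf, lig)
             for conf, lig in product(ensemble, compounds + decoys)]
    return sorted(poses, key=lambda p: p[2])    # best scores first

print(run_virtual_screening("http://example.org/target/EGFR")[:3])
```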
As a first step towards our goal, we want to implement this workflow and, of course, validate it. To validate it, we will work with a real scientific example. We have chosen an example that is widely studied by the pharmaceutical industry, with lots and lots of information available, and that is EGFR. I'm sure you know it: the epidermal growth factor receptor, a transmembrane receptor located on the cell membrane that is associated with various types of cancer, and for that reason represents an important drug target. Again, if you want to know more about this example, the EGFR validation, you can find more information at the link on this slide, in the pilot use cases section of the BioExcel web page. Of course, we are not going to do this alone: a company called Nostrum BioDiscovery, based at the Barcelona Supercomputing Center, will supervise and help us throughout the validation process, and will also help us with the updates and evolution of the whole pipeline. And of course all the news, results and information coming from this use case will be published regularly on the BioExcel web page. With that, I'll hand the mic back to our host, Adam.

Thank you very much, Adam. Okay, so: three different perspectives there on Open PHACTS, how it's being used, and how it fits into the other work of BioExcel. Thank you very much to all of our speakers today. It's now open to you listening: if you have any questions, now is the time to ask them, and the best thing to do is to use the question tab in GoToWebinar to post them. So while you're thinking of some and typing them in, there was one immediate question that I had; it's probably a question for Nick. For somebody who's less familiar with Open PHACTS, what do you recommend as the best way to get to know it, try it out, and explore what it has to offer?

I think a couple of things will give people a good entry point to Open PHACTS. There is a tool called the Open PHACTS Explorer, the link to which is available from our website; it's a graphical client that uses the API and allows some fairly typical science questions to be asked, around compounds, by text, and so on, and it starts to show you the range of data that's in Open PHACTS. But in a way, given what Adam was describing, part of this is for your community to understand what API calls you might want, so the Explorer will only take you so far. The key, really, is perhaps looking at some of the presentations on Open PHACTS; in particular, I know Daniela and others have done a lot of work on some of the workflow tools, and there are some good examples there. There's also material in our support pages, which again are available from our main URL; we've got a support section that will hopefully give people some good entry points, and we certainly welcome comments.

That's great Nick, thank you. So we've got our first question from the audience, from Michael. Michael, if you have a microphone I'm going to unmute you for a moment and you can ask your question directly; if I don't hear from you, I'll read your question out. Go ahead, Michael, thanks.
Thank you for this nice presentation; it's really an interesting new platform. I was wondering whether a virtualization-based setup isn't bad performance-wise? I mean, it has a lot of advantages for developers, of course, but shouldn't this affect performance?

I guess I can answer that. Yes, it could affect performance, but that's actually one reason why we have been looking at Docker: when you use Docker it's a much more lightweight binding. It's not virtualization as such; it's more that you share the host's kernel but live in your own container space, isolated from the rest of the machine, and that means you can run the Open PHACTS platform natively, with much better performance. So you actually have a choice: if you run Docker on Windows or Mac, because the Docker machine expects to be inside Linux, it will run within a virtual machine; but if you have a Linux host it will run natively, within its own little container space, and therefore you get much better performance.

Thank you. Just a follow-up question: if I want to install a virtual machine, I have to have root or elevated permissions. Is this also the case for this Docker application?

It is the case also for Docker. There are other technologies that do not require root permissions, but we haven't looked at those yet for Open PHACTS. That is particularly relevant in HPC environments, because you can't use root from an application point of view. Now, we are not planning to deploy the Open PHACTS platform on HPC architectures, but this certainly also comes up a lot in the Common Workflow Language community, because you shouldn't need root access just to run an application.

Thank you; thank you very much Michael. I have another question, from Carol. I think it's fairly straightforward, so I can probably just recount it. Carol says they tried to go to https://explorer2.openphacts.org to look at the Explorer, but got the message "authentication parameters missing". Is that a temporary problem? Do you need to be authenticated somewhere first, or is it meant to be public for anyone to look at?

We are just in the process of updating our platform, and I think that is a consequence of the update process we are running. So apologies that it might not be available on that URL at the moment, but if you send me your details I'll let you know when it's back online.

Thanks Nick. The next point, from Michael, was just asking about a link to the slides. They will be posted after the webinar; Stian's are already available, I think he posted the link in the chat already. Stian, maybe you could post that again. As for the rest of the presentations, I haven't posted them yet, but they will be made available online on SlideShare; you can find the webinar page on our website, and within a couple of weeks the slides should be on that same page for you to find. So, Michael, or anyone else, any other questions from the floor before we wrap up?

I had one other question myself. The way Open PHACTS provides access to so many of these different data sources, having a single point of access to them all, is very powerful; but I was wondering whether that power extends to actually being able to construct queries that ask questions across multiple data sources. Is that the kind of thing Open PHACTS can deal with?
Yes, depending on the complexity of your business question, and this is where perhaps a future discussion could be valuable. The current API calls reflect the typical questions that we felt people wanted to answer at the start of and during the project. There are about 60 API calls, and they commonly do cover more than one data source: very often you might get data from ChEMBL, patent data from SureChEMBL, and pathway data too, so the pre-canned queries do already cut across the multiple graphs that are loaded. The option we're trying to introduce in future versions is a SPARQL endpoint, which would allow you to craft your own queries, a federated query, or, sorry, rather a query that queries across the multiple graphs, as well as the option to create a pre-canned call for very popular queries, as we have done in the past. So yes, you can do it now, and we certainly want to do more of it in the future.

Thank you very much Nick, that's great; that makes it nice and clear. Just to follow up the previous question, there's a response in the chat at the moment that Stian has given on the performance question. Stian, do you want to comment?

Yes, I just had to Google the right links. Singularity and Shifter are the two container technologies that are popular for not requiring root access. That means you have to choose, but there's also an initiative called the Open Container Initiative, which is trying to standardize the container definitions across these, so you don't have to define the containers again in each of them; CWL is also working on using Open Containers, so that you can mix and match these things.

Great, thank you Stian. And I think we've got time for one last question; there's another question from Carol, this time a more scientific one, so I'm going to open the mic in case you want to pose the question yourself... I'm not hearing anything at the moment, so I'll just recount the question. It was a question for Adam: how do you deal with the preparation of ligand structures within the workflow? Do you use crystal structures, or rather generate the 3D structures from SMILES or InChIs? Adam, would you like to comment?

We are not dealing with the preparation of ligand structures in the workflow that you already have in the BioExcel GitHub, the molecular dynamics setup. But what we are doing is porting all the work that we did a while ago in a server called MDWeb here in Barcelona. There is also a webinar about MDWeb, I can publicise it a little bit; you have the link in the chat now, if you want to see what the web server offers. The thing is that in the web server you could directly upload your ligand parameters and work with those parameters. Now we are working on a way to extract these parameters automatically, using ACPYPE, and we can also do that from SMILES or InChIs if you want. So we have thought about that, we are working on it, and we will have it soon, hopefully.
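[Editor's aside: returning briefly to Nick's earlier point about the planned SPARQL endpoint, once such an endpoint exists, a custom query across the loaded graphs could be issued from Python along these lines. The endpoint URL here is hypothetical, and SPARQLWrapper (pip install sparqlwrapper) is just one common client library.]

```python
# Hedged sketch: what a custom query against a future Open PHACTS SPARQL
# endpoint might look like. The endpoint URL is hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/openphacts/sparql")  # hypothetical
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?compound ?label WHERE {
        ?compound rdfs:label ?label .   # spans whichever graphs are loaded
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["compound"]["value"], row["label"]["value"])
```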
Thank you very much Adam. We are pretty much out of time for today. Thank you all very much for coming along, and for your questions this afternoon. As I mentioned before, if you do have any further questions you can find us at bioexcel.eu, and we do have another webinar on Thursday if you are interested in hearing more about what BioExcel is doing; to keep up with the webinars, again, you can find us at bioexcel.eu. Okay, thank you all very much for coming along today, thank you to our speakers, and we will speak to you all again soon. Thank you.