Hello, everyone. My name is Brian. I work at the UC Santa Cruz Genomics Institute, and I'm going to talk today about working reproducibly in the cloud. I'm going to talk about my favorite project, Dockstore, which was started at OICR a couple of years ago and is now a collaboration between Santa Cruz, OICR, and others as well.

Okay, so the learning objectives for the module. By the end of the module, I'm hoping everyone feels comfortable packaging up their tools into Docker, which is a really powerful technology that I'm sure a lot of you have heard about before or seen glimpses of. We're also going to talk about how you describe your tools and workflows in CWL, and I'll give you a glimpse of WDL as well. These are alternative workflow formats, so I'm going to show you how you link your Docker tools to these standardized workflow formats. Then I'm going to talk a little bit about how you actually share your tools in Docker and your workflows in CWL or WDL.

I'm hoping everyone will leave comfortable packaging a tool in a Docker image and writing a workflow. My tutorial today is going to focus on CWL; a lot of people like WDL, so that's an alternative out there, and I'll give you a little glimpse of it. Hopefully everyone will be able to share their tools and workflows on Dockstore. And most importantly, the reason we're going through all this effort is that you want to be able to pick your thing up and run it on a different platform: start in the Collaboratory, move over to Amazon, without rewriting everything. That's the key goal here: build it once and have it be portable across many, many different places.

Again, since I'm kind of new to the class, I just parachuted in for this talk: how many folks here have used CWL before? Okay, that's good, because if everyone raised their hands, I'd be like, okay, done, time for coffee. How about Docker? Who's used Docker? Okay, a couple of folks. Who's heard of Docker? Okay, excellent, great.

So, okay: the Common Workflow Language and the Workflow Description Language. A bit of background there. CWL is an open source, community-oriented workflow language. It was created a couple of years ago at a hackathon around BOSC, the Bioinformatics Open Source Conference. Basically, people got together and said, you know, it'd be really great if I didn't have to rewrite my tool every time I moved from system to system. WDL is the Broad's version of that: they looked at CWL and said, I like what they're doing here, but I also want to make it easier for biologists to write this, so let's simplify the syntax, let's streamline it. Both of them are excellent choices. What we're seeing in the community is that it's kind of like Betamax versus VHS; maybe that's too old a reference, actually. HD DVD versus Blu-ray, is that more timely? Anyway, it's one of those things where probably one will win in the end. But what's happening now, which is actually quite nice, is that most of the workflow environments out there are learning to run both. So in the future it may not matter which language you write in; most places will be able to run them. And in fact, I've talked to the CWL and WDL folks, and they actually intend to support mixed workflows in the future, so you might take tools that are described in WDL and use them through CWL. That's the sort of future we're headed towards.
You'll see, you'll see coming up, yeah. But the idea is that you can string these things together in a workflow, so parts of that workflow might be written in one language and parts in another. That's the ideal world, because what you ultimately want to do here is not reinvent the wheel: you want to do a Google search, find a bunch of things already prebuilt, and assemble them into a workflow that you can then run.

So Christina did an awesome job giving you background on the PCAWG project. I'm going to give you a little background on why I got involved in building infrastructure like this in the context of PCAWG, to give you the techie perspective of what came out of that project, which was effectively Dockstore. I'm going to tell you about that platform and why it's something you all should consider submitting your tools and workflows to, and then we're going to go through a practical. I don't know how far we'll get through the practical; it turns out I have more time than I thought. If we fully get through the interactive tutorial I have for you, we'll break into groups, and each group will come up with an idea like, oh, I really want this tool to be Dockerized, so let's make it happen today. So we'll see how far we get, and then we'll break into groups and have some fun.

Okay, so you heard a lot about this PCAWG project. When I was working on PCAWG I was primarily at OICR, and later shifted to Santa Cruz. My team and I at OICR focused on trying to build cloud provisioning infrastructure. Christina gave really nice context on the data that was involved and the nature of this project being spread around the world. My team and I were like, right, okay, we have 14 cloud and HPC environments and a common set of pipelines that we need to run in those environments; what do we actually do to set these environments up and utilize them to their full potential? That was the core challenge of PCAWG: figuring out how to work across all of these environments.

And this was at a time, I think this was three years ago, time flies, when Docker wasn't yet at 1.0. A lot of these technologies just didn't exist. I think this project pre-dated CWL, so we didn't have a common way of describing workflows. A lot of the technologies we used then, we wouldn't use today, because there are better technologies out there. But this project was really helpful for me in understanding the iterative process we needed to go through to get infrastructure that can span multiple clouds. I'm sure most of you are looking at this and saying, well, I'm never going to have 14 cloud environments to work in; that's crazy. And you may not have 14, but the reality is that data is spread all around the world, and you as scientists are going to want access to that data all around the world, at least potentially bridging two or three environments.
So having the technology stacks that came out of PCAWG, the better versions via Dockstore, CWL, and WDL, having those in your tool belt so you can rely on them, is going to open up a world where you can analyze data that's spread across multiple locations.

One of the first challenges was: how do we work together to build these workflows and encapsulate them? If only there were a technology where we could encapsulate tools and then send them around to various places. When we started this project, like I said, Docker was really this beta thing that no one was touching. So we started by literally sending install scripts around to various places and saying, okay, you install this thing, and this thing, and that thing. Now multiply that by 14 places, and multiply it by a thousand VMs running, and our heads exploded, right? Just trying to keep all of those machines up and running, going through a two-hour process of setting up each VM, was incredibly labor-intensive. So we were really looking for ways to package up our workflows and have that be a very robust, reliable thing.

Christina mentioned many different lessons learned in the PCAWG project, and I have a similar take on those points. See the previous module for a really nice description of the people and organizational issues: how do you include humans in the process, and how do you remove humans from the process in order to scale out? What I'm going to focus on is a few key techie lessons learned from the PCAWG project, ultimately diving into number four here, which is portable tools: solving the problem of how we send our tools around, how we send the algorithms to where the data lives, in a way that doesn't make you go insane because it's so tough to do.

But anyway, coming back to the technical lessons learned from PCAWG. One of the key bottlenecks in this analysis: we weren't really CPU-bound in this project, which is kind of surprising. We got a lot of donated cores in a lot of different places, and if you look at the utilization, we never really peaked it and kept it pegged. What turned out to be one of the larger issues for us was the scalability of the storage infrastructure. The GNOS system had a limited number of computers supporting it in each of our cloud environments, and that turned out to be, from a technical standpoint, a major scalability red flag for us.

The other thing that was quite time-consuming was not only describing these workflows and sending the descriptions around, but the execution: these workflows could take two or three days to run. What happens if that machine disappears? One of the private, academic clouds, which will go nameless: I remember having conversations where I was like, yesterday we had 100 nodes, today we have 50, and they just disappeared overnight; what happened? And they're like, no, no, everything's fine. I'm like, no, the VMs are disappearing, something is not fine. So you have to program for that. Obviously the Collaboratory will not have 50 percent of its nodes disappear overnight; I guarantee George will not allow that to happen. But you do have to work in a world where you deal with failure, on commercial clouds and academic clouds alike.
So how do you build out infrastructure that's resilient to that, that doesn't require you logging in at three in the morning to kickstart things again? That's really important. And then finally, we've already talked a lot about the portability of tools; I'll go into the specifics of how we adjusted our process over the course of this project.

So, the scalable storage system. One of the key things from the PCAWG project, and it came very late in the project's game, I think only the last OxoG pipeline leveraged it, was the ICGC storage system, which is now used on Amazon and the Collaboratory. The idea behind it is that you don't want a storage system where one or two or three computers sit there supplying data out to 500 VMs, because that's a bottleneck. Instead, the architecture you're looking for, on private clouds or commercial clouds, really should be based on an object store. On Amazon, there are thousands and thousands of object store nodes available to you, so you basically can't saturate it: you can pull as much data as you want, and it's consistent. That's the sort of infrastructure we were looking for. This is something a lot of people forget about. A lot of people think, oh, I'm going to scale out my compute, I'm going to scale out my CPUs, but if you're pulling from a very slow storage system, throwing more CPUs at it actually makes the problem worse.

So this was a key piece of technology that Vincent Ferretti's group at OICR built; my team helped spec out the interface, and it turned out to be quite helpful. I actually use it myself at Santa Cruz. To get an understanding of how it scales, we spun up 10 nodes at a time, continuing to add more and more nodes on Amazon, to pull data from the ICGC storage instance running on Amazon. What we found in projects like PCAWG, where we had a limited number of download hosts, was that you'd hit a wall at a certain number of nodes, and the performance would just fall off. With the ICGC storage system, since it's backed by Amazon S3, I could basically keep throwing more compute nodes at it, and each one got a pretty steady 45 to 100 megabytes per second. Now, that's not necessarily blazing; you'd look at that and think, that's going to take 20 minutes or an hour to pull a BAM file, that's not super fast. But the important thing is that when you're working in a cloud environment, you want consistency and scale-out ability. If you can have 10,000 nodes running and they all get 100 megabytes per second, that's way better than getting a gigabit per second on 10 nodes.

The other thing, and we saw this diagram in Christina's talk, was a major realization throughout the PCAWG project. In retrospect it's really easy for me to present this and say, oh yes, we had this wonderful infrastructure that was fault-tolerant, but actually it was a journey. This is a little too small to see, but when we first started the project, we knew HPC. So we started to build things that looked like HPC environments inside these commercial and academic clouds: multiple compute nodes, a distributed file system, very much like what you would expect in an HPC environment.
And you combine that with 50% of your nodes disappearing overnight, and it turns out these systems are not robust against half the nodes dropping out. So we'd be in a situation where we'd send jobs out to these virtual clusters around the world, and if we had nodes dropping out, which was pretty frequent, we would lose everything running on that cluster, and it would all have to be resubmitted.

So instead, a very simple architecture for PCAWG, architecture 2, was: okay, we're going to send one workflow out to one node in a cloud environment. Not very technically sophisticated, not very scalable, but it does insulate us: one of those nodes going offline only takes down one of the workflows.

Architecture 3 was an evolution of that. We ended up building a queuing system using RabbitMQ, so we'd have these queues spread around the world. Not all of our PCAWG infrastructure used this; there was some HPC, and there were some arrangements that were essentially Christina handing a collection of samples to Barcelona, for example, to process using traditional HPC. But in the cloud environments my team and I were working in, we deployed this software. It was called Consonance, and I actually still use it at Santa Cruz. It's a queue-based system, so I can enqueue work into a RabbitMQ queue, and it sits there until a worker node is available; the worker node pulls it. But the important secret sauce is this: when a worker pulls a job, if that node disappears and the system doesn't hear from it for a certain amount of time, the system knows to re-enqueue that work, spin up another worker, and hand the job off to that worker. That turned out to be a hugely impactful change for our project, in terms of not requiring human intervention when things go wrong.

It wasn't as efficient as scatter-gather, where individual steps of a workflow are broken up across environments; that's the level of functionality that systems like Toil, which my group at Santa Cruz is currently working on, or Cromwell provide. But two years ago in the PCAWG project, this was a huge improvement. And it's a great model you can use if you're at that intermediate scale, let me do a hundred genomes, let me do a thousand RNA-seq runs, where you can get away with putting each job on a single node but still want that resilience.

Now, I've been saying that the private or academic clouds had issues: they were new environments, and some nodes would disappear. But that can happen in commercial environments too. One of the things we learned in the commercial environments, and you around the tables here will want to know this as well, is that they offer something called a spot marketplace, or preemptible instances. What that means is you can use those instances at a tenth the cost, which hugely impacts your bottom line. The caveat is that your instances may disappear with anywhere from 10 to 30 seconds' notice, so you have to use infrastructure like this that's resilient to things going away. But the benefit is that you get to stretch your money, typically by a factor of about tenfold. For us, this was the mechanism; again, PCAWG wasn't funded.
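As an aside, here's roughly what using the spot market looks like in practice with the AWS CLI; this is a minimal sketch under my own assumptions, and the instance type, price cap, and launch-spec file are hypothetical placeholders:

```bash
# Sketch only: check recent spot price history for an instance type,
# then place a one-time spot request with a hard cap on the bid.
# Instance type, price, and launch-spec.json are placeholders.

# Look at the last day of spot prices to pick a sensible cap
aws ec2 describe-spot-price-history \
    --instance-types r3.8xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)"

# Request an instance, never paying more than the cap; if the market
# price rises above it, the instance is reclaimed with short notice
aws ec2 request-spot-instances \
    --spot-price "0.90" \
    --instance-count 1 \
    --type "one-time" \
    --launch-specification file://launch-spec.json
```

As I'll mention in a moment, Consonance automated exactly this kind of logic for us: watch the price history, bid with a max price, and move when things get expensive.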
So this was a mechanism that let us do work very cheaply on Amazon, for example, while essentially not having funding for it, stretching our dollar as far as we could make it go. I wish I could say we planned it that way in PCAWG, but PCAWG was kind of a learning exercise; I mean, I'm looking at George and Christina here. We learned a lot in the process, and we had to start simple and get more complicated over time.

Newer pieces of software, like the Toil and Cromwell workflow engines I referenced, know how to shard up your workflow and send parallel jobs to various nodes. Those tend to take a much smaller hit when one of the worker nodes disappears. Toil, for example, actually caches output to S3, so if something is interrupted and you've reached a checkpoint, it doesn't necessarily have to go back to the beginning. That's an important feature. For us, we were just happy we had an automated mechanism to restart the workflow, so our cloud shepherds didn't have to get up at three in the morning and hit a button to make it work.

So essentially, yeah: Google has preemptible instances, where there's a fixed price and they can disappear. On Amazon, it's actually a marketplace, so you see spikes. I saved a screenshot once where one unlucky soul had bid $10,000 per hour on his VM type and actually was charged that, because you could see the spot market price going up to $10,000. So we would monitor the marketplace, and I remember there was one week, it might have been several weeks actually, when something was happening on Amazon: the marketplace was incredibly busy and things were extremely expensive. At that point we switched to a smaller, non-preemptible fleet and continued doing work, and once we saw the overall prices dip back down, we shifted back to the spot marketplace. So it's interesting, but having software to help with that process is really key. We put things into Consonance like, here's my max bid price, or look at these various regions with different pricing and go for the one with the cheapest average price over the last several days.

Okay, so hopefully that gives you a taste of some of the lessons learned from PCAWG. It was a long project, it was very complicated, and there were many people involved; I think the total number of researchers, and Christina can correct me if I'm wrong, was in the neighborhood of 700. And it was all a collaboration, which actually fills my heart with joy: we as scientists can get together and just say, let's do this, let's make this work. But we definitely learned a lot. One of the major lessons I learned, which really impacted the work my team did and continues to do to this day, both at OICR and Santa Cruz, is really thinking about how we bring the algorithms to the data.

And so, Docker. Docker is a wonderful technology; I think it, or something like it, will be around for a long, long time. What it lets you do is package your tools, your configuration, and maybe some of your reference files into a lightweight, highly mobile image. Think of a virtual machine, if anyone's used VMware or VirtualBox on their local machine: it's essentially an image that can be moved around. And it has a lot of really nice tooling around it as well.
Docker has fantastic services around it, like quay.io ("quay" or "key", I don't actually know the right way to pronounce it; I say "quay", so forgive me if that's wrong) and Docker Hub, and a really nice set of tools that make it very powerful. Typically these mobile images are Linux-based, and Docker is supported on all major operating systems. So what's nice is you can develop locally on your laptop, push the image up to Docker Hub, and pull it back down on your VMs running in the cloud. That level of portability is something pretty new in our community, and it's super helpful, super awesome. What's also neat is that it's not just our community using this: Docker is very popular in the generic IT space, which means there's a lot of great tooling out there. This isn't going to stay insular to our little community; it's widely used, widely adopted.

So we're going to go into, oh, you can hardly see that. All right, well, thankfully I have screenshots in the interactive tutorial that we'll go through, but this gives you a glimpse of what a Dockerfile looks like (I've reconstructed a sketch of it below, since the slide is hard to read). The Dockerfile is what's used to construct one of these mobile images. You can see on the first line I'm saying FROM ubuntu, which says: I'm using this existing base image. You always start with a base image; someone has done a lot of the work for you and prepared an Ubuntu 16 image, so you don't have to worry about it. It's a basic image, just like the ones you've used on the Collaboratory or AWS already. You can then include things like: the maintainer is me; I'm going to switch to the root user; I'm going to do some tasks like copying files over and changing their permissions; and then I'm going to specify a command to be executed by default.

Now, if you're looking at this and thinking it kind of looks like a bash shell: who here uses bash? Okay, about a third or so. If you notice, these look like normal commands in a Linux environment, prefixed with COPY or RUN or other directives. And it really is intended to be exactly that: something very simple to learn, but powerful, because it exposes all the commands you know and love from a Linux environment.

Okay, so that's Docker, and it allows you to create these highly mobile images, which is awesome. The Common Workflow Language and the Workflow Description Language, or WDL: I already gave you a little background on these, but they were created to provide a workflow language that's agnostic. You don't want to have to rewrite everything for every different environment, so these are our gateway to that. The downside is that they're essentially documents. If you're looking for a workflow language that lets you dynamically add or shrink nodes in your workflow, or do much more sophisticated work, these are not the languages that will help you. But if you're looking for something to string together a relatively complicated workflow, with branching and merging, that's extremely portable, these are really what you want to go for nowadays. This is the state of the art in how people write workflows that are going to be portable.
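Here's a minimal sketch along the lines of the Dockerfile walkthrough above; the script name and paths are my placeholders, not necessarily the exact slide contents:

```dockerfile
# Sketch of the kind of Dockerfile described on the slide.
# Start from an existing base image someone has prepared for you.
FROM ubuntu:16.04

# Who maintains this image (MAINTAINER is the older form; LABEL is the modern one)
MAINTAINER Brian O'Connor

# Switch to the root user for the setup steps
USER root

# Copy a wrapper script into the image and make it executable
COPY md5sum.sh /usr/local/bin/md5sum.sh
RUN chmod a+x /usr/local/bin/md5sum.sh

# The command run by default when the container starts
CMD ["/usr/local/bin/md5sum.sh"]
```

You'd build and push it with something like `docker build -t quay.io/myorg/dockstore-tool-md5sum:1.0.0 .` followed by `docker push`, though in the Dockstore flow described later, Quay typically builds the image for you straight from GitHub.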
So, just to take from their website: the goal of CWL is to describe analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments. Towards the end of the tutorial today, I'll show you some of the environments you can run these in. It's not only your command line, your laptop, or a VM; there are also commercial and academic environments. Christina touched on some of them: FireCloud, the Cloud Pilots, Seven Bridges. I have a list at the end, and what's nice about those is that they give you a way to take what you've done at small scale and start scaling it out on the cloud. Many of them include really nice GUIs, so you don't even have to write things up in a script: you can import your workflow in CWL, tweak it, run it, and then run it at scale in something like FireCloud.

WDL is similar, but it's really focused on, and known for, a more simplified syntax. I've been told that more traditional, less geeky biologists like WDL; they like its simplicity and straightforwardness. But it's essentially trying to do the exact same thing, and both have multiple places you can run them, multiple workflow engines that support their execution. I'm not going to spend a huge amount of time talking about WDL, because I'm personally more familiar with CWL, but this gives you an idea of what WDL looks like. You can see it's fairly straightforward, fairly simple. Here you have a task defined. This is just md5sum, a checksum; effectively a hello world, not a complicated thing. Here's the command; here are notes about the output; and here are notes about the runtime environment: that the command comes from this Docker-based tool, or container I should say, and that it needs one CPU and at least 512 megs of RAM. That, at its core, is a task.

And then this workflow, oh, yeah, question? You'll see it in a few minutes. Yeah, during the interactive tutorial, I'll show you a command line tool that knows how to look at this and execute it. But also, one of the cool things here is that we have these commercial and academic platforms where you can literally take this bit of code, put it in there, and say, run this on 10,000 samples. So it's that bridge between what we're going to do in the interactive tutorial today, which is small scale on a single VM, and getting to the point where you can take the same bit of code and run it on many, many different samples.

Okay, so that was the task. The task is then actually called in a workflow here. You can see there's just one step: it calls the md5sum task, binds inputs to the task, and the task then produces the output. So that gives you an idea of what a WDL descriptor looks like (I've sketched the full file below).

The final thing I want to talk about, related to WDL and CWL, is Dockstore. Dockstore was created at the end of the PCAWG project and continues to be a collaboration between Santa Cruz, OICR, and other folks in the community. It's actually gaining traction: I was applying for a grant the other day and it said, use something like Dockstore.org, and I thought, oh, I should totally apply to this grant then; they're referencing me. I got the grant, by the way.
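Here's a minimal sketch of that WDL task plus its one-step workflow, based on what's described on the slide; the Docker image path and the names are my placeholders:

```wdl
# Sketch of the md5sum task and one-step workflow described above.
task md5 {
  File inputFile

  command {
    md5sum ${inputFile} > md5sum.txt
  }

  output {
    File value = "md5sum.txt"
  }

  runtime {
    docker: "quay.io/myorg/dockstore-tool-md5sum:1.0.0"  # hypothetical image
    cpu: 1
    memory: "512 MB"
  }
}

workflow checksum {
  File inputFile

  # The single step: bind the workflow input to the task input
  call md5 { input: inputFile = inputFile }
}
```

A runner like Cromwell can execute this directly, with something along the lines of `java -jar cromwell.jar run md5.wdl --inputs inputs.json` (the exact flags depend on your Cromwell version).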
So yeah, it's actually starting to get traction in the community, which is great, because it means we hit the nail on the head in wanting to package up our scientific tools and workflows and have a mechanism to share them. That's really what this is all about. The secret sauce Dockstore provides, beyond the basic services like Quay or Docker Hub that are already out there, is putting the Docker-based tool together with CWL and WDL. It meshes the two, makes it easy to run at small scale, and provides links out to platforms that let you run at large scale. It's focused on the scientific community, but you could use it for really anything you can put in WDL or CWL; it's not actually specific to us. I know the physics community is starting to use this technology stack. It's a very generic thing, but we market it towards the scientific community, towards the bioinformatics community.

So how does it actually work? You'll get to do this process during my tutorial. What's happening here is that we're standing on the shoulders of giants: we want to build on top of services that people like, services that work well. Dockstore is a registry. You put your Dockerfile in GitHub or Bitbucket, one of these services that already exists, and you can link that to Quay to have Quay build the images. What Dockstore does is reference Quay for the Docker image, and reference GitHub for the CWL or WDL descriptor that describes that image, and present them to the community with an easy-to-use command line tool so you can kick the tires, as well as linking out to places that let you execute at scale.

Likewise, in the center of this diagram, you can also use Dockstore to register workflows, which is actually a simpler process. You put your workflow, CWL or WDL, in GitHub or Bitbucket or another place, and you point Dockstore at it; Dockstore again will show versions of the workflow and help people understand how to call it, how to run it, with a simplified command line. It's a fairly straightforward process, and I hope we get through doing both of those together this afternoon.

So yeah, let me go back to this one. This Dockerfile just describes how I've installed a piece of software, md5sum, in a particular Docker image. So now I've packaged up my tool, md5sum. You can then refer to that tool, this is WDL for example, and call it. What WDL, as well as CWL, provides is a way to say: this is the input, and this is how you call the tool. That's something that's otherwise missing. If you go to Docker Hub and try to find a tool, you're likely to find something, but then you have to open up the Docker image and figure out where they installed it, what its parameters are, what its inputs and outputs are. That is what WDL or CWL is doing here: it says this Docker-based tool takes an input file, it produces an output md5sum.txt file, and it makes crystal clear how the tool should be executed.
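For comparison, here's the same md5sum tool sketched as a CWL descriptor; a minimal sketch with a placeholder image path, capturing md5sum's stdout as the output file rather than using a wrapper script:

```cwl
# Minimal CWL sketch of the md5sum tool; the image path is a placeholder.
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [md5sum]

hints:
  DockerRequirement:
    dockerPull: quay.io/myorg/dockstore-tool-md5sum:1.0.0  # hypothetical image

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1        # passed as the first argument to md5sum

# md5sum writes to stdout, so capture it as the output file
stdout: md5sum.txt

outputs:
  output_file:
    type: File
    outputBinding:
      glob: md5sum.txt
```

You can run this locally with the reference runner, `cwltool md5sum.cwl md5sum-job.json`, where the job JSON binds `input_file` to a local file.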
Yeah, so that's essentially what Dockstore is doing: it brings those two things together and says, right, this Docker image that has md5sum in it goes with this CWL or WDL that describes the tool or the workflow, and it gives you very clear instructions on how to run it.

Oh, why not link them together, why not put the descriptor in the Dockerfile itself? Yeah, great point. It's because Dockerfiles were created by Docker, Inc. as a general IT solution, and what we're doing is building on top of what's out there already. I've actually visited Docker, Inc. and asked, can we put these together? And they said, oh, very interesting, have a t-shirt, thank you for stopping by. I still have the t-shirt, too. They're interested in moving in that direction, of stringing containers together and using Docker for workflows. But their primary mission, according to that conversation, is all about microservices and serving up websites and that sort of thing. It's a slightly different mission, and that's why the concept of a workflow language sits on top of Docker rather than being embedded in it. But I think in another year or so you'll start to see something official from Docker about how you string containers together for data processing needs. I would not be surprised at all if that comes out in the near future, and maybe it eclipses CWL and WDL, but I suspect these will be around for a while.

Okay, so we're going to dive into this together, and you're all going to sign up for accounts and be able to put things in Dockstore today. This gives you a little idea of what it looks like from a developer's perspective: I can add, remove, and publish my tools in Dockstore. The other thing you'll get to play with today is running tools and workflows from Dockstore. We provide a command line, and we're going to use it today; under the hood it's actually calling out to cwltool to run CWL and Cromwell to run WDL. So it's really just a convenience wrapper: regardless of which format you're using, you can use the Dockstore command line, and it hides the differences between the two. The other thing I should mention is that one of the nice features of the Dockstore command line is that we've added file provisioning into it, so it can provision files to and from places in the cloud environment; the ICGC storage client, for example, has been integrated as a plug-in.

And then, I've already mentioned this, but we're partnering with companies to provide a convenient way to find something in Dockstore and take it to a platform that lets you scale out to thousands of jobs. Our first commercial integration is with a company called DNAstack, which is actually next door to OICR in Toronto. They allow you to run WDL-based workflows on the cloud through a completely managed, very nice, convenient, functional GUI. So we basically take you directly to their platform to run a particular tool.
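To give a flavor of that command line before the interactive tutorial, here's roughly what launching a registered tool looks like; a sketch, with a placeholder tool path and file names, assuming the Dockstore CLI's `tool launch` subcommand:

```bash
# Sketch of running a Dockstore-registered tool locally.
# The tool path and file names are placeholders.

# Describe the inputs in a small JSON file (CWL-style File binding)
cat > md5sum-job.json <<'EOF'
{
  "input_file":  { "class": "File", "path": "./my_data.txt" },
  "output_file": { "class": "File", "path": "./md5sum.txt" }
}
EOF

# Launch the tool; under the hood this calls cwltool (or Cromwell for WDL),
# and the built-in file provisioning fetches and saves files for you
dockstore tool launch \
    --entry quay.io/myorg/dockstore-tool-md5sum:1.0.0 \
    --json md5sum-job.json
```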