I'll make this very brief. It's great to bring a good friend and colleague of the institute's, Jim Ostell, here, now in his new role as director of NCBI. Many of us at NHGRI have interacted with Jim for many years, in many ways, on many committees and consortia and so forth. But we're delighted he's now at the helm of NCBI. I went to try to get biographical information because I couldn't remember it all, but then I realized he's been at NCBI seemingly forever, since it became NCBI, like 1988 or something crazy like that. And then I said, well, he certainly did something before that, and there's just not much out there, because he's spent most of his professional career at NCBI. I guess you were in the private sector a little bit doing software development after getting your PhD at Harvard, but fundamentally most of your life has been at NCBI, from what I understand. So I think it's cool that you're now the director, and we brought him here. There was interest on council in hearing from NCBI now that it has new leadership; there are lots of areas of interaction. I'm sure Jim's going to present a number of them, and I think we'll have some healthy discussion after that. Jim.

Thank you. Yeah, in fact, I've been at this so long that my graduate career spanned eight years, because my committee was divided on whether I was doing biology or computers, and there was only a degree for one or the other. They couldn't really give me a degree for doing both, so that was a struggle, and I went into business in the meantime until they made up their minds.

So 30 years ago when I started, the notion of data handling and IT was that it was this isolated thing. There was an IT group. You would take your data to a statistician, you would collaborate with them to analyze it, and in a sense there was this notion that computers were in some hidden room. Your data went into that room, the wizards would do things, answers would come back, and you'd add that to your results section. That's really changed. At this point, both in academia and in most companies, there's not an IT department. There's not a data analysis group per se. Instead, in the corporate world, data is basically the lifeblood of the whole company. Every part of the organization works with data, every part supplies data, and it's a much more fluid and open situation.

NCBI was created, like I said, 30 years ago, as a division of the National Library of Medicine. It has a computer room. There are servers in it. Data goes there. You submit your data to the archive, there's some review, and then if you want to analyze it, you either download the data to your own computer room or you use our tools and you look at it. So about two years ago I started a strategic shift at NCBI on the technical level, and since I've become the director, it's now on a policy level too: to try to move out of the back room and out onto the ocean of data. You can think of it as, rather than collecting everything inside our own walls, we're trying to navigate in the outside world. And the obvious way to do this technically is to move onto commercial cloud platforms. Not private cloud platforms, but the big commercial ones, which is really where everything is exploding and where things are changing very, very fast. I would just comment that the Department of Defense, for example, worked with Amazon to build a private internal copy of the Amazon cloud for security reasons.
By the time they got it built, it was three years out of date, and they're already having trouble getting modern software developers and analysts to work there, because they don't have the tools they expect to find on the big commercial sites. So I think we have to go out there. We have to be on top of those platforms.

We're addressing really two use cases. The first use case is to deliver NCBI services to the public, and in this sense it's things like PubMed and GenBank and BLAST. In this case we're much more like a commercial company, like Netflix or Amazon Shopping or something like that, because we're getting three and a half million unique users a day. We run at about 7,000 web hits a second. So this is a serious, big engineering data delivery system, and so we've adopted, for example, the Twitter stack. We're using very high performance service meshes. There's constant monitoring, as I'll talk about a little bit more.

The other use is to provide access to data for others, and that's actually quite different architecturally. One use is access to NCBI data, and of course the advantage of having it on the cloud is you don't have to copy it; it's already there. You don't have to update it, because the most recent version is the version that's there. And it's convenient to use. You don't have to go inside our servers to use it with our software. You can use it basically on your own dime. You don't have to negotiate with us about it. The other use, which is actually pretty exciting and interesting I think, is access to non-NCBI data that is in the cloud environment. We can provide indexing and access permissions without actually owning the data or owning the space where it's stored. It becomes much more of a negotiated relationship with other institutes. If they're choosing to invest in some very large project, we don't have to be separate from that. We can be collaborators, because we're occupying the same space, and we're trying to develop a technology stack which makes that more natural and more usable.

The first prong of that I'm going to briefly talk about is delivering our services to the public. We decided when we were going to do this that we were not going to lift what we're doing now and just put it on the cloud. The decision was that we were going to do this cloud native, from the bottom up. This is a huge sea change for us and for the people who work for us. We've had to reeducate our workforce. We've had to move into different tools and different ways of doing work. But it also means we're not carrying history with us when we go. So all new applications use CI/CD, which is continuous integration, continuous deployment. That's a process where coders can work together and flow their source code into a common repository; a suite of automated tests runs to make sure you didn't break anything in that process; and when things come back to a level where they're working, the code is automatically deployed out onto the cloud. Human beings don't do that. And so it's a much more fluid process. It's a much faster process. It's much more controlled. But it takes serious organization. A lot of our old dogs are having trouble learning these new tricks. But it's their only choice at this point. Plus, everything is web-instrumented to begin with, so nothing rolls out without a full set of monitoring tools on it that allow us to track performance right away, even in the first prototype.
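To make the CI/CD idea concrete, here is a minimal sketch of that flow. The commands, test directory, and deploy script are hypothetical stand-ins; real pipelines are defined in a CI system's own configuration, not a script like this.

```python
# Minimal sketch of a CI/CD flow: merge, test, auto-deploy.
# The commands and deploy script are hypothetical placeholders.
import subprocess
import sys

def run(cmd: str) -> None:
    """Run a shell command; stop the pipeline if it fails."""
    print(f"$ {cmd}")
    if subprocess.run(cmd, shell=True).returncode != 0:
        sys.exit(f"pipeline stopped: '{cmd}' failed")

# 1. Code from many developers flows into a common repository.
run("git pull origin main")
# 2. Automated tests run on every change to catch breakage.
run("pytest tests/")
# 3. If the tests pass, deployment to the cloud is automatic;
#    no human pushes the code out by hand.
run("./deploy.sh --target cloud")
```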
And then from there all the way forward, because the other thing that modern software technology does is you don't guess. You don't build features because you think they're great. You test, you see, you look at user feedback. You're constantly experimenting and changing things. That's part of an agile development approach. So this is just a case where we're trying out four different kinds of buttons for the same thing on different segments of the population while they're actually using it. And then we tabulate which ones work better, and then we put the new one in (there's a toy sketch of this after this passage). And in fact, every major web-enabled service that you use is doing this to you all the time. You may not realize it; occasionally you'll notice. But essentially, you are the guinea pigs. I mean, it's the ultimate sort of crowdsourcing. And when you're working at the rates that we have for things like PubMed, you really need that sort of usage to be able to test on a small fraction. It's something that's harder to do in an academic setting. The other thing is, every time we put in a feature, we automatically monitor its usage. So when we introduce a new search, we're tracking when the different pieces went in, and then we're looking at the usage, and we're determining that, in fact, people do seem to be using this feature more. So it's better, in the sense that the customers are accepting it. And these can be fairly small amounts. At four million people a day, improving something 2% is a lot of people that had something work better for them.

We've actually launched the first version of this. It's called PubMed Labs. It's our experimental site where the cloud-enabled version of PubMed is running. This is just an example of the way the web pages are designed. They're designed so that they automatically reconfigure to your device. So this is the same page on a computer, on a tablet, and on a phone. It's not just a technology implementation; it also has to do with the way you design the page, so that when you're working in a small area it's still functional and convenient, but you get more when you're running on a bigger screen. So there's a lot more attention to design and user interface and that sort of thing. The Labs site is purposely where we will be showing very early versions of things. Even as we begin to replace some of the main resources, there will be Labs versions of other things that are fairly new out of the box.

This is the current architecture. If you use the Labs site, you actually go to the NCBI website. Then we go out to Amazon out the back door and run the new system, and then we actually come back to NCBI again to run the pieces we haven't ported yet. So it's sort of a hybrid system. We're using standard technologies back here, like I said, the Twitter stack, but also things like MongoDB and Solr for indexing. There was also a lot of learning how to use those. With those more modern tools, you don't have a service right out of the box, any more than deciding you're going to use a relational database means you have a functioning resource. It's just the beginning. We are in the process now of switching to something where the front door will actually be on the commercial cloud, and this version is running on Google. Which was another decision we made: we wanted to be cloud vendor agnostic as best we could, without cutting off our nose to spite our face in terms of features.
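As a toy illustration of the button experiment and the automatic usage monitoring described above (the variant names, bucketing scheme, and simulated traffic are all invented; this is not NCBI's actual system):

```python
# Sketch of an A/B-style experiment: bucket users deterministically
# into variants, then tabulate click-through per variant.
import hashlib
from collections import Counter

VARIANTS = ["button_a", "button_b", "button_c", "button_d"]
impressions: Counter = Counter()
clicks: Counter = Counter()

def assign_variant(user_id: str) -> str:
    """Map a user to a stable variant via a hash of their ID."""
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return VARIANTS[h % len(VARIANTS)]

def record(user_id: str, clicked: bool) -> None:
    """Log one impression (and maybe a click) for the user's variant."""
    v = assign_variant(user_id)
    impressions[v] += 1
    clicks[v] += int(clicked)

# Simulate some traffic, then compare click-through rates per variant.
for i in range(10_000):
    record(f"user{i}", clicked=(i % 7 == 0))
for v in VARIANTS:
    n = impressions[v]
    rate = clicks[v] / n if n else 0.0
    print(f"{v}: {rate:.1%} of {n} views")
```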
So the main system, the place you come in, will be on Google, and we'll still be calling NCBI for some of the services that back it up that we haven't moved yet. And then in the final configuration, everything is there and capable of running on either Google or Amazon or both, so that we have some protections. So PubMed's the first one out the door, and as I'll show you, we're now working on BLAST.

The part that's maybe more directly relevant to this audience is providing access to data for others, because this is a group of heavy data users and heavy data generators. We actually started out, like I say, two years ago, when we put BLAST into the Amazon marketplace. The idea is that in the marketplace you don't have to install anything. You can just go to the app store and say, I want that. In this case you don't pay anything for it, but it's all canned, pre-configured, and ready to go. You just say run it and it starts up, and that saves people downloading the binary and keeping it up to date on their machine. You always get the latest version every time you run it. The limitation of this is that you have to pull the database from NCBI, and that can take an hour, even over a fast internet connection. We really put this up originally for people that wanted to do 10,000 BLAST searches or something like that, so we thought it's not that big a deal that they have to wait for the database. Turns out it's a pain. They don't like it, as you might expect. And really, it's not the way you would want this to work.

Instead, we want a more integrated approach, and in this case, right now, we're working on moving BLAST onto Google. There's a whole bunch of re-architecting involved in this, which I won't tell you about. But essentially, you can think of it this way: in the NCBI cloud space, we would have the BLAST service, the latest version of the program. Web users would come in through a web page like you would do now, except it's on the cloud, run the BLAST service, and it would read the BLAST database that we have sitting there to use with the services we offer on the web. Having architected it this way, it also means that if you want to download the BLAST database, you just take that copy. You don't have to go somewhere else; you don't have to have a different version. In addition, on the cloud, whenever we update the BLAST service, we would also update the version that's in the cloud marketplace, in the app store. If you wanted to use it, say, to do heavier work than we would allow you to do free on our dime, you can just go to the cloud marketplace. You can get the latest version of BLAST just the way you could before. But when you now set up your job to run it, it's going to take the BLAST database right from the cloud. So there's no longer this hour wait while it transfers; it's sitting right there. And so in a sense, what you get now is you no longer have to download the executable, and you no longer have to keep the database up to date. You just run it, if you have an account on the cloud, and you don't need to put things on your local servers. And what we're imagining here is that the inevitable direction for scientific computing is that the next generation is going to be on the commercial clouds, not on their local computers. And so this is the world that they will be coming into, and this is the way they'll expect to get things: they go to the app store, they get it, they run it, they're done.
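From the user's side, a run might look something like the sketch below. The database mount path is invented for illustration; the blastn options shown are the standard BLAST+ command-line flags.

```python
# Hypothetical sketch: run BLAST on a cloud VM against a database that
# already sits in the provider's storage, instead of first downloading
# tens of gigabytes from NCBI.
import subprocess

# Pre-staged, always-current copy of the database (invented path).
BLAST_DB = "/mnt/ncbi-blast-db/nt"

subprocess.run(
    [
        "blastn",
        "-db", BLAST_DB,             # read in place; no hour-long transfer
        "-query", "my_reads.fasta",  # the user's own input
        "-out", "hits.tsv",
        "-outfmt", "6",              # standard tabular output
    ],
    check=True,
)
```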
You're only paying for your computers for the time you're actually running the search, and the rest of the time you're not paying. And that makes a huge difference to scientific users, including us. For example, at NCBI right now, with our own compute center, we have to own enough physical computers to handle the highest load that we ever expect to see, even if that only happens one hour a day. So we pay the highest price for the maximum peak, and then those machines sit semi-idle for part of the day. In a cloud environment, what happens is you buy the machine when you need it, and when you stop paying, it goes away. So for a scientific user, it means you're not paying for your big university compute center all the time. If you want to do a big compute, say you want to go back to TOPMed and recompute all the variants in it, you don't have to own enough computers to do that. You can go to the cloud once, fire up 10,000 computers, and get it all done in one day. It's the same price as if you took 10 days with 1,000 computers, or 1,000 days with 10 computers (the arithmetic is sketched after this passage). Might as well get it all done in one day. You put your money down and you're done, and you're not maintaining something in the interim. And for NCBI, we have the same advantage if we need to go back and do some large recompute, or re-annotate all the bacterial genomes or something like that.

Okay, there's also data that isn't NCBI data that we could provide some assistance on. One example is the Reads in the Cloud project that's going on with the NIH Commons cloud pilots. This is a situation with mega sequencing projects, particularly big human genome projects, where the process was, well, we give all the data to NCBI and they store it for us. That was breaking the bank. We're not that big. One of these studies can consume $2 million worth of disk space, and we really just couldn't do it. So there's another model, which is that the group that's funding the project buys the space themselves. We don't buy it, but we assist them in setting it up and making it accessible and usable.

So, the model we proposed: right now in dbGaP, what happens is a submitter comes in and registers their study. They list all the individuals in their study. We do some quality control on those individuals. We make sure they all have consents associated with them. And then, when the study is ready, it's published in dbGaP and people can download it. To request the data, you submit a request. It goes through an approval process, which is actually tied to your identity as an NIH researcher, and it's linked to your institution, which is guaranteeing your behavior. So there's a whole bunch of identity management linkage going on in the background here to ensure that you're going to be a good steward of the data. Once you're accepted to have access to the data, that identity is what allows you to select the studies that you've been approved for. Then you can access them, and then you download them. So it's fairly straightforward once you get through all the identity management approval process. With human sequence reads in the old world, where it was all deposited at NCBI, the data would come in in an original format and go into the short read archive. We'd convert it, ETL-ing it's called, into a common archival format, and then we could regenerate BAMs and other formats that people liked. But there's a very compact, very dense way that this is held. If you want access, you come in through the dbGaP authorized access, and you get this version.
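Here is that burst-pricing arithmetic spelled out, with a made-up hourly rate; actual per-machine-hour prices vary by vendor and instance type.

```python
# Per-hour cloud pricing makes total job cost depend only on
# machine-hours used, not on how the work is spread over time.
HOURLY_RATE = 0.05  # dollars per machine-hour (hypothetical)

def job_cost(machines: int, days: float) -> float:
    """Total cost of running `machines` nodes for `days` days."""
    return machines * days * 24 * HOURLY_RATE

print(job_cost(10_000, 1))   # $12,000 -- whole job done in one day
print(job_cost(1_000, 10))   # $12,000 -- same cost, ten days of waiting
print(job_cost(10, 1_000))   # $12,000 -- same cost, about three years
```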
The burdens of these very large projects started something new called Trusted Partners, which NCI did. There's an Alzheimer's version of this, where essentially requesters came through the dbGaP authorized access, but then we sort of handed them off to that group. But this is also kind of complicated, because to be a trusted partner, you need a contractual relationship with NIH, because in a sense you are taking on responsibility for stewardship and managing the data, and there are lots of legal requirements. So it's a fairly heavyweight solution, but it did get some of these other sites up.

What we proposed with the cloud pilots is something a little different, where you still go through the dbGaP process to register the studies, collect the consents, and link them to individual patient samples. But when it comes time for the big data chunk, the reads, that could instead be acquired a number of different ways on commercial clouds. In fact, there are direct purchases by NIH ICs and indirect purchases through funding a grantee. And there's also a process that CIT, the central computing service at NIH, is starting, which is a cloud marketplace, where essentially they allow institutes to get pre-bid prices on cloud storage. And there's a company called MITRE which is helping the institutes set up that space once they get it. But essentially, some other process allocates a set of space. The PI puts their reads there. And then there's a transaction between the PI and dbGaP, where they essentially provide a manifest, where they just say, here it is, here are the read files, they go to these patient records, these individuals. And then there's a little bit of QA here, where we make sure that actually every read is connected to an individual who has consent language and is actually part of the study. But we don't actually transfer the data. We don't look into the data. It's in the original form that came in with the study.

In this version, when a requester comes in, they do the same process. If all they want is the small stuff, like the VCF files or the phenotype files, they can just download it from dbGaP the usual way. It's actually not a big data problem. But the small number of people that actually want the reads get a reference that takes them to the cloud and allows them to access the data there, without transferring it back down, and to compute in place. So in this version, rather than transferring the original BAM that's in the cloud, an XML description of what it is goes to the short read archive. So it's sort of a placeholder that says it's over there, but this is what it is.

There's a new identity management system. This is something we're collaborating with CIT on; it's a next generation. We had to cobble together identity management to run dbGaP by tying together our own identifications, the eRA identifications, and the internal NIH identifications. And for public access, which is a whole other set of identities, what's happened is CIT has agreed to take that on. We're providing the information that this system needs. And there's a new federated identity system here which can link to your account on the cloud as well. So it can say Amazon user number 10 is the same person as this eRA PI who has been given permission to use this data. And so you don't need another identity, and we're actually tying you to the data through the identities that are on Amazon or Google or wherever they are.
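A toy sketch of that federated identity linkage, under the assumption that the federation layer keeps a mapping from cloud accounts to eRA identities and from eRA identities to approved studies; every identifier here is invented.

```python
# Resolve a cloud account to a dbGaP authorization without issuing
# a separate login. All identifiers below are made up.
AUTHORIZED = {
    # eRA identity -> studies this researcher is approved for
    "era:pi-4817": {"phs000001", "phs000424"},
}

LINKED_ACCOUNTS = {
    # cloud account -> eRA identity, established by the federation layer
    "aws:user-10": "era:pi-4817",
    "gcp:user-22": "era:pi-4817",
}

def may_access(cloud_account: str, study: str) -> bool:
    """True if the cloud account maps to an eRA identity approved for the study."""
    era_id = LINKED_ACCOUNTS.get(cloud_account)
    return era_id is not None and study in AUTHORIZED.get(era_id, set())

# "Amazon user number 10" is the same person as this approved eRA PI.
assert may_access("aws:user-10", "phs000424")
assert not may_access("aws:user-99", "phs000424")  # unlinked account
```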
With that identity, you can still get the stuff that isn't on the cloud the same way you would have gotten it before. But now you also have access directly to this, and you can compute directly on it. And so we're prototyping this with the NIH Cloud Commons groups. This system is expected to be running in its first bare-bones version by the end of March. Data is already being placed in the cloud for those pilots, and the first users of this access will be the people working on the Cloud Commons. We'll be working closely with them, as well as with CIT, to make sure that this actually works and they can get at their data and do the computations they need.

There is one other path, which is that there is now a data lifecycle question here for the institutes: if you're paying millions of dollars for storage, do you really want to pay that forever? And that's really a scientific cost-benefit question. It's not really a question that NCBI can answer; it's a question for the priorities of the institutes that are doing that science. And there can be staging here. This can go from faster storage to slower storage to tape, paying less and less over time. And at some point, you may even decide to stop paying for it. Maybe the last VCF file that was called on this is good enough for the quality of sequencing that was done, and you decide, OK, forget it, we're not going to pay for this at all anymore. Some subset of those that you may want to keep forever could still be transferred back to the long-term archives. But it's probably not going to be everything anymore. And so that's kind of a new era for people to think about, in a data-rich world where you don't keep everything forever.

But what this leads us to, I think, is something that directly supports this federated data commons model. This slide came from Nick Webber, who's working on the cloud pilots. What I just told you does produce a common user authentication scheme. It does create shared APIs that really do access data and allow you to compute on it. It does support the FAIR principles: you can actually find this stuff, you can use it, you can reuse it. These are some technical details. The digital objects will have catalogs, whether through dbGaP or some other source. You can go look up what you want. They've got IDs; you can talk about them. And it will be critical that we can count on data standards here, because since we will only have the metadata for the data that's on the cloud, not be checking the data itself, people really do have to adhere to the standards if they say a file is of a certain type. You can't pretend and fool around anymore, because things won't work if you don't do it. So I think the endpoint is that we started heading toward tomorrow several years ago, and I think the first steps are starting to crest the hill. You're going to see both things emerging in 2018 as real functioning resources in their first format. Thank you.

Thank you, Jim. Very informative. So the world is changing. That seems to be the theme of the day, isn't it? In so many ways. Questions for Jim?

On your last slide, you had All of Us. And my understanding is that All of Us is sort of developing their own thing. Yes, they are. And how do you envision interfacing with that?

Well, I don't know. Like I say, I took a slide from somebody else who's putting this together, to try to show that there actually is progress here.
I think the first version of the authentication system that we're doing is not even as ambitious as what the NIH cloud pilots would like. We're making the argument that it's better to walk before you run. If you can reimplement the existing NIH policies for the people that are in the eRA system, which is actually a huge swath of biomedical researchers in the world, including Europeans and Asians and others, it's better to take that step and have it work, and then build up from there. I think we'll be talking more to the All of Us folks once more exists in reality here. But yeah, exactly. Carol?

Hi, Jim. Thanks for the update. There are a lot of really exciting things going on. One of the things that's happened at NCBI recently, of course, is the stopping of support for representing variants for model organisms, and focusing purely on human. NCBI has always been a really great resource for comparative genomics and comparative biology. Now that those data live over in the European Variation Archive, is there a way that you think this architecture will enable us to reintegrate the model organism data with the human data, so that we can continue to support comparative genomics and basic biology?

Yeah, that would be the hope. I mean, that move was basically a budget move. We just couldn't afford it anymore, and EBI had been wanting to do something in the variation field, so we made a deal with them. And we've done similar things with them around the sequence databases. Essentially, around the sequence databases, we agree on standards between our sites, and in the old model we exchange all the data, so every site has the full set. In a future world, we wouldn't necessarily have to exchange the data, as long as we adhered to standards and we knew where each other's data was. There are some extra wrinkles for the Europeans around cloud computing, because there are directives from the EU not to use American cloud computing companies. Right now the American companies are dominant, and the Chinese, and Europe is concerned about developing its own cloud computing business, which I understand. So the Europeans have some other limits. They've got ELIXIR, which funded a big internal cloud computing center. But as I was mentioning, I think that's really not the future. So they're in a bit of a catch-22. So theoretically, yes, we should be able to link the pieces back together. Right now, there are still some stumbling blocks.

First, thank you. It was informative. So implicit in this model is that individual ICs, or even individual grantees, are going to buy into a chunk of storage and then, later, compute. In two sentences, what is the advantage? Is there an arbitrage, so it's much cheaper than if I go out and buy it on the spot market? What is the advantage to the individual user or individual IC of joining this, I'll call it, model?

Well, it's a strategic question. No, I'm saying that NCBI made a strategic decision. We rubbed our crystal ball and looked at market trends for commercial companies and other big enterprises, and everybody is moving onto commercial clouds. And the reason, just generically, why you would do that is because you're not owning everything anymore. You're renting it, basically. And the advantage is that these big, very powerful computer systems are expensive, and they change fast.
And for us to buy, say, some new piece of hardware, or for a university computing center or some other internal shop, you pay a lot of money. You might not have picked the right thing, but you own it for three years now. So let me just finish why there's a thrust here to do this. If you go to a big commercial cloud provider, they have the best engineers in the world picking these things. They're trying them all out, and so you get the advantage of their flow of technology, which you can't afford to pay for yourself at your own site.

So right now, what happens is there's overhead that gets paid from your grant to your institution. Let's say that overhead goes to your university computing center. That becomes a resource you can use to do your computation. And so the economics of this is not as straightforward as if you were a company and you were just paying for the compute. In terms of money, we did a back-of-the-envelope calculation, and right now, for us to have the same capacity that we need to do PubMed on the cloud is roughly break-even with paying to have it in-house. And their costs are going down, while our costs are going up, because we have to pay people's salaries, we have to pay for space, we have to pay for cooling, we have to pay for electricity, and we have to pay for hardware. Finally, the last point is this burst capacity that I mentioned: if, say, in your research enterprise, one time in the year you want to do a 1,000-computer calculation and get the answer, and that's the big compute you're going to do this year, you don't want to buy 1,000 computers. You want to rent them for a day. It's much more cost-effective that way, and you get hugely more capacity than you could possibly own yourself. So I think those are the forces that are pushing things this way.

I understand. That wasn't my question. OK. Try again. So I drank the Kool-Aid of cloud computing before the NIH did. My question is, why should I join your cloud, and not go to AWS or Google myself and build my own program for my own problem and my own data? You could. But what's the advantage of joining your model? I liked your model, because I think it's actually silly for us all to do this separately. So I like the model. But what's the incentive? How do we incentivize the individual grantee, or the IC, or another private program to join in this club?

I think it has to get going a little further before there's going to be an obvious advantage to it. For example, what do you get by doing this? One thing you get, for the human data, is identity management. And that's actually a big, expensive enterprise. If you are, say, managing human data on a large grant from NHGRI, and some other researcher would like to use it, you can collaborate with them, and they can go through a pre-existing identity management and validation system. You don't have to create accounts for them. The other thing is, I think we're setting up a little bit of a false dichotomy here when we say it's my system and your system. There's a real blurring of the boundaries when we start to do this. So in the example I gave of somebody running BLAST, they're running in their own space. They paid Amazon for that space. It's their space. They happen to have taken our tool, BLAST, out of the app store, directly from us, and they happen to be taking the BLAST database directly from us. It's convenient. But they may be running a pipeline of which BLAST is just one piece, and they got another piece from somebody else's app store.
And there's another piece of data they got from a third party, and then there are some other things that their grad students wrote themselves, and they've stitched it all together. And what's intriguing about this now is that you're just using stuff. You're just using the applications. You're just using the data from NCBI. You're not installing it, you're not downloading it, you're not updating it. And similarly, what we're thinking is we can use other people's applications the same way. We don't have to install them. We don't have to move something over if it's some sort of researchy thing that we wouldn't support as a public utility but that is useful for our internal processing or something. We could certainly do it. And you can also think of this app store as a marketplace, so there could be commercial products in there too. Say some company comes up with a better variant caller, and it's going to cost you, I don't know, 10 bucks a genome to use it. Maybe you would, if it's that much better. And so I think that's what we're looking toward: this is the first step toward beginning to build an ecosystem that would support public archives like us, individual grantees, and commercial products, all sort of in the same situation.

Jim, can I ask you a question? It may partially relate to what Eric was asking. Over the years, or in recent years, there's been discussion about sort of credits that might be issued for computing. And I don't know where any of those discussions stand, but it might be that when you apply for a grant, instead of just getting money to do research, you may also get allocated a certain amount of credits for computing. And if that were the case, we, meaning the NIH, maybe at the individual institute level, would be buying that through some strange currency. That wouldn't involve giving money to the institutions; rather, we would have X number of credits and dole them out, maybe partially on a council recommendation of how many credits to dole out for a given grant. And then, therefore, you wouldn't have a choice, because a certain number of credits would come with the grant. You actually might be requesting them as part of your application. Are we heading towards such a model? I don't know that we've implemented it.

Well, right now, CIT is setting up the cloud marketplace, where they're basically looking for bids from various cloud providers for storage and compute: give us your best price. And then, at least as a first step, NIH institutes can buy capacity at those prices from those vendors. If they like Amazon better than Google, they can pick Amazon, or whatever. But we're trying to get them to compete with each other in a bigger marketplace. And that's partly why we're trying to be flexible by vendor, so that as things change in these marketplaces, and I'm sure they will, you don't get caught out. But the other thing is, like I said, NIH is already paying money to support compute in academic settings. It may be that that could be routed some other way as well. So I think we're kind of feeling our way here. My sense is there's a compelling trend to go this direction, for many reasons. Exactly how it will play out for NIH and academia, I'm not sure. But what I find reassuring is that it is happening. There are real things starting to come out of this, and we're starting to toddle forward.
So your last comments actually started to get to my question, but to what extent is this going to be fluid across providers? It would presumably be bad to lock the entire NIH community into using one provider. And is it going to be straightforward for users to access the data from different providers?

Well, there are probably six layers to the answer to that question. First of all, obviously the vendors would rather you stayed with them, and their pricing schemes are set up to do that. So, for example, there's a price for compute, there's a price for storage, there's a price for access to that storage, and then there's a much higher price for moving the data out of their cloud, called an egress charge. So one thing we've been doing is negotiating with the vendors, saying, well, actually, we're a public resource. We're not selling stuff, and the egress charge actually inhibits us from using your site, because we want people to take the data. And they've been a little responsive to that.

Another part of the question is, if you think about the World Wide Web, you actually are going to lots of different sites. When you're looking at your browser and you click on a hotlink, you go here, you go there, and you don't necessarily know that you've switched vendors when you do that, because you're really just working through the interface. Moving some very large amount of data is different: running two steps of a pipeline on one vendor and then moving petabytes of data to another vendor to run the next three steps is probably not practical right now. You're going to want to run your whole pipeline on one vendor. So what we're trying to do is convince the vendors that it's in their interest to have the complete data set on their platform, because that will induce people to come there and do the compute. The cheaper they can make it for us to put the petabytes of data on there, the more likely they are to get that compute business. And that argument right now comes down to, well, how much compute is it? How much money are we going to make? But the reality is, if you're using different vendors, you don't want to transfer the data if you can avoid it. Certainly, though, you could imagine there's a data set on one vendor and another data set on another vendor. As long as the applications are on both vendors, it's fine, because you're going to do your big computation where the data is, and the little thing that is the answer is easy to transfer. So it's not perfect, but it's like anything: if you have Macs, they try to make sure you stay on Macs, and PCs don't make it easy for you to move to a Mac. It's the same thing.

Any other questions? Jim, thanks very much. Thank you.