From around the globe, it's theCUBE with digital coverage of AWS re:Invent 2020. Sponsored by Intel and AWS.

Welcome back to theCUBE's coverage of AWS re:Invent 2020. I'm Lisa Martin. Joining me next is one of our CUBE alumni. Brett McMillan is back, the director of US Federal for AWS. Brett, it's great to see you, glad that you're safe and well.

Great, it's great to be back. I think last year when we did theCUBE, we were on the convention floor. It feels very different this year here at re:Invent. It's gone virtual. And yet it's still true to how re:Invent's always been. It's a learning conference and we're releasing a lot of new products and services for our customers.

Yes, a lot of content, as you say. The one thing I would say about this re:Invent, one of the things that's different, is that it's so quiet around us. Normally we're talking loudly over tens of thousands of people on the showroom floor, but it's great that AWS is still able to connect in actually an even bigger way with its customers. So during Teresa Carlson's keynote, and I wanna get your opinion on this or some info, she talked about the AWS Open Data Sponsorship Program and that you guys are gonna be hosting the National Institutes of Health, NIH, Sequence Read Archive data. The former biologist in me gets really excited about that. Talk to us about that, 'cause especially during the global health crisis that we're in, that sounds really promising.

Yeah, it very much is. I am so happy that we're working with NIH on this and multiple other initiatives. So the Sequence Read Archive, or SRA, essentially what it is, it's a very large data set of sequenced genomic data. And it's a wide variety of genomic data. It's got not only human genomic data, but all life forms, all branches of life, are in SRA, including viruses. And that's really important here during the pandemic. It's one of the largest and oldest sequenced genomic data sets that are out there. And yet it's very modern.
It has been designed for next generation sequencing. So it's growing, it's modern and it's well used. It's one of the more important ones that I think is out there. One of the reasons this is so important is that we want to find cures for human ailments, disease and death. By studying the genomic code, we can come up with the answers to these, or the scientists can come up with the answers for that. And that's what Amazon is doing: we're putting in the hands of the scientists the tools so that they can help cure heart disease and diabetes and cancer and depression and yes, even viruses that can cause pandemics.

So making this data available to those scientists worldwide is incredibly important. Talk to us about that.

Yeah, it is. And so within NIH, we're working with the NCBI. When you're dealing with NIH, there's lots of acronyms. At NIH, that's the National Center for Biotechnology Information. And so we're working with them to make this available as an open data set. Why this is important is that it's all about increasing the speed of scientific discovery. I personally think that in the fullness of time, the scientists will come up with cures for just about all of the human ailments that are out there. And it's our job at AWS to put into the hands of the scientists the tools they need to make things happen quickly, or in our lifetime. And I'm really excited to be working with NIH on that. When we start talking about it, there's multiple things the scientist needs. One is access to these data sets. And SRA, it's a very large data set. It's 45 petabytes and it's growing. I personally believe that it's gonna double every year, year and a half. So it's a very large data set and it's hard to move that data around. It's so much easier to just go into the cloud, compute against it and do your research there in the cloud. And so it's super important. To give you an idea of 45 petabytes:
If it were all human data, that's equivalent to seven and a half million people or, put another way, 90% of everybody living in New York City. So that's how big this is. But then also what AWS is doing is we're bringing compute. So in the cloud, you can scale up your compute, scale it down. And then kind of the third leg of the stool is giving the scientists easy access to the specialized tool sets they need. And we're doing that in a few different ways. One, the people who design these tool sets design a lot of them on AWS, but then we also make them available through something called AWS Marketplace. So they can just go into Marketplace, get a catalog, go in there and say, I wanna launch this. It launches the software, it launches the infrastructure underneath and it speeds the ability for those scientists to come up with the cures that they need. So SRA is stored in Amazon S3, which is a very popular object store, not just in the scientific community; virtually every industry uses S3. And by making this available as public data sets, we're giving the scientists the ability to speed up their research.

One of the things that jumps out to me too is that, in addition to enabling them to speed up research, it's also facilitating collaboration globally, because now you've got the cloud to drive all of this, which allows researchers in completely different parts of the world to be working together almost in real time. So I can imagine the incredible power that this is gonna provide to that community. So I have to ask you: you talked about this being all life forms, including viruses. With COVID-19, what are some of the things that you think we can expect this to facilitate?

Yeah, so earlier in the year, we took the genetic code, or NIH took the genetic code, and they put it in an SRA-like format and that's now available on AWS.
And here's what's great about it: you can now make it so anybody in the world can go to this open data set and start doing their research. One of our goals here is to go back to a democratization of research. So it used to be that, for example, the very first vaccine that came out was a smallpox vaccine that was done by a rural country doctor using essentially test tubes and a microscope. It's gotten hard to do that because data sets are so large and you need so much compute. By using the power of the cloud we've redemocratized it, and now anybody can do it. So for example, with the SRA data set that was done by NIH, organizations like the University of British Columbia Cloud Innovation Center are doing research, and what they've done is they've scanned the SRA database. Think about it, they scanned 11 million entries for coronavirus sequencing. That's really hard to do in a typical on-premises data center, but it's relatively easy to do on AWS. So by making this available we can have a larger number of scientists working on the problems that we need to have solved.

Well, and as we all know in the US, with Operation Warp Speed, that term alone really signifies how quickly we all need this to be progressing forward. But this is not the first partnership that AWS has had with the NIH. Talk to me about some of the other things that you're doing together.

Yeah, we've been working with NIH for a very long time. Back in 2012, we worked with NIH on what was called the 1000 Genomes data set. This is another really important data set and it's a large number of, again, sequenced human genomes. And we moved that into, again, an open data set on AWS. And what's happened in the last eight years is many scientists have been able to compute against it. And the other wonderful power of the cloud is that over time we continue to bring out tools to make it easier for people to work.
So whether they're computing using our instance types, we call it Elastic Compute Cloud, or EC2, or they're doing some high performance computing using EMR, Elastic MapReduce, they can do that. And then we brought out new things that really take it to the next level, like Amazon SageMaker. And this makes it really easy for the scientists to launch machine learning algorithms on AWS. So we've done the 1000 Genomes data set. There's a number of other areas within NIH that we've been working on. So for example, over at the National Cancer Institute, we've been providing some expert guidance on best practices for how you can architect and work on these COVID related workloads. NIH does things in collaboration with many different universities, over 2,500 academic institutions. And they do that through grants. And so we've been working with the Office of the Director, and they run their grant management applications in eRA on AWS, and that allows it to scale up and work very efficiently. And then we entered with NIH into this program called STRIDES. STRIDES is a program for not only NIH but also all these other institutions that work with NIH to use the power of the cloud, use commercial cloud, for scientific discovery. And we started that back in July of 2018, long before COVID happened. It was so great that we had that up and running because now we're able to help them out through the STRIDES program.

Right, can you imagine if... let's not even go there, I was gonna say. But okay, so the SRA data is available through the AWS Open Data Sponsorship Program. We talked about STRIDES. What are some of the other ways that AWS is supporting STRIDES?

Yeah, so STRIDES is wide ranging, across multiple different institutes. So for example, over at the National Heart, Lung and Blood Institute, NHLBI, I said there's a lot of acronyms, NHLBI, they've been working on harmonizing genomic data.
And so, working with the University of Michigan, they've been analyzing it through a program that they call TOPMed. We've also been working with NIH on establishing best practices, making sure everything's secure. So we've been providing AWS Professional Services that are showing them how to do this. So one portion of STRIDES is getting the right data set and the right compute and the right tools in the hands of the scientists. The other area that we've been working on is making sure the scientists know how to use it. And so we've been developing these cloud learning pathways. We started this quite a while back and it's been so helpful during COVID. So scientists can now go on and do self-paced online courses, which have been really helpful here during the pandemic, and they can learn how to maximize their use of cloud technologies through these pathways that we've developed for them.

Well, that education is imperative. I mean, think about all of the knowledge that they have within their scientific discipline. Being able to leverage technology in a way that's easy is absolutely imperative to the timing. So, let's talk about other data sets that are available. So you've got SRA available. What other data sets are available through this program?

Yeah, we have a wide range of data sets that we're offering as open data sets. And in general, these data sets are improving the human condition or improving the world in which we live. And so I talked about a few things. There's a few more. So for example, there's The Cancer Genome Atlas, which we've been working on with the National Cancer Institute as well as the National Human Genome Research Institute. And that's a very important data set that's being computed against throughout the world. Commonly within the scientific community, that data set is called TCGA. Then we also have some data sets that are focused on certain groups.
So for example, Kids First is a data set that's looking at a lot of the challenges and diseases that kids get, everything from very rare pediatric cancers to heart defects, et cetera. And so we're working with them. But it's not just on the medical side. We have open data sets with, for example, NOAA, the National Oceanic and Atmospheric Administration, to better understand what's happening with climate change and to slow the rate of climate change. Within the Department of the Interior, they have the Landsat database that is looking at pictures of the Earth, satellite pictures of the Earth, so we can better understand the world we live in. Similarly, NASA has a lot of data that we put out there. And over in the Department of Energy, there's data sets that we're researching against, or that the scientists are researching against, to make sure that we have better clean, renewable energy sources. But it's not just government agencies that we work with. When we find a data set that's important, we also work with non-profit organizations. Non-profit organizations are not flush with cash and they're trying to make every dollar work. And so we work with them, organizations like the Child Mind Institute or the Allen Institute for Brain Science. These are largely neuroimaging data, and we've made that available via our open data set program. So there's a wide range of things that we're doing, and what's great about it is that when we do it, you democratize science and you allow many, many more scientists to work on these problems that are so critical for us.

The availability is incredible, but also the breadth and depth of what you just spoke about; it's not just government, for example. We've got about 30 seconds left, so I'm going to ask you to summarize some of the announcements that you think are really, really critical for federal customers to be paying attention to from re:Invent 2020.
Yeah, so one of the things that these federal government customers have been coming to us on is that they've had to have new ways to communicate with their customers, with the public. And so we have a product that we've had for a while called Amazon Connect, and it's been used very extensively throughout government customers and it's used in industry too. We've had a number of announcements this week. Andy Jassy made multiple announcements on enhancements to Amazon Connect, or additional services: everything from helping to verify that that's the right person with Connect Voice ID, to making sure that that customer gets a good customer experience with Connect Wisdom, to making sure that the managers of these call centers can manage the call centers better. And so I'm really excited that we're putting in the hands of both government and industry a cloud based solution to make their connections to the public better.

It's all about connections these days. Brett, I wish we had more time because I know we could unpack so much more with you, but thank you for joining me on theCUBE today, sharing some of the insights, some of the impacts and the availability that AWS is enabling for the scientific and other federal communities. It's incredibly important and we appreciate your time.

Thank you, Lisa.

For Brett McMillan, I'm Lisa Martin. You're watching theCUBE's coverage of AWS re:Invent 2020.
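
Editor's note: the size comparison given in the interview for SRA (45 petabytes, roughly seven and a half million people's worth of human data, or about 90% of New York City) and the growth estimate (doubling every year to year and a half) can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration. It assumes decimal petabytes (10^15 bytes), which the interview does not specify, and the function names are hypothetical, chosen here for illustration.

```python
# Back-of-the-envelope check of the SRA figures quoted in the interview.
# Assumption: 1 petabyte = 10**15 bytes (decimal); the interview does not say.
PETABYTE = 10**15


def bytes_per_person(total_pb: float, people: float) -> float:
    """Average data volume per person if the whole archive were human data."""
    return total_pb * PETABYTE / people


def implied_population(people: float, share: float) -> float:
    """Population implied when `people` is stated to be `share` of a city."""
    return people / share


def projected_size_pb(start_pb: float, years: float, doubling_years: float) -> float:
    """Archive size after `years`, doubling once every `doubling_years`."""
    return start_pb * 2 ** (years / doubling_years)


if __name__ == "__main__":
    per_person_gb = bytes_per_person(45, 7_500_000) / 10**9
    print(f"Data per person: {per_person_gb:.1f} GB")

    city = implied_population(7_500_000, 0.90)
    print(f"Implied New York City population: {city:,.0f}")

    # Three-year outlook under the two doubling periods mentioned.
    for doubling in (1.0, 1.5):
        size = projected_size_pb(45, 3, doubling)
        print(f"Doubling every {doubling} years: {size:.0f} PB after 3 years")
```

Under these assumptions, the quoted figures work out to about 6 GB per person, and the two doubling rates give between 180 and 360 petabytes after three years, which illustrates why computing against the data in place is easier than moving it.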