 Okay, welcome back everyone. Live CUBE coverage here in Las Vegas for Amazon re:MARS. Hot event: machine learning, automation, robotics, and space. Two days of live coverage. We're talking to all the hot technologists. We've got all the action, startups, and a segment on sustainability. Ana Pinheiro Privette, Global Lead, Amazon Sustainability Data Initiative. Thanks for coming on theCUBE. Did I get that right? You did, absolutely. Okay, great. Thank you. I was at the analyst mixer and was blown away by the story going on at Amazon around the Sustainability Data Initiative, because we were joking that everything's a data problem now, which is a cliche. But in this case, you're using data in your program and it's really got a bigger picture. Take a minute to explain what your project is and the scope of it on sustainability. Yeah, absolutely. And thank you for the opportunity to be here. Okay, so I lead this program that we launched several years back, in 2018 more specifically, and it's a tech for good program. And when I say tech for good, what that means is that we're trying to bring our technology and our infrastructure and lend that to the world, specifically to solve the problems related to sustainability. And as you said, sustainability inherently needs data. We need data to understand the baseline of where we are and also to understand the progress that we're making towards our goals, right? But one of the big challenges is that the data that we need is spread everywhere. Some of it is too large for most people to be able to access and analyze. And so what we're trying to tackle is really the data problem in the sustainability space. What we do more specifically is focus on democratizing access to data. So we work with a broader community and we try to understand what are those foundational data sets that most people need to use in the space to solve problems like climate change or food security or think about sustainable development goals, right? 
Like all the broad space. And we basically then work with the data providers, bring the data to the cloud, and make it free and open to everybody in the world. I don't know how deep you want me to go into it. There are many other layers to that as well. So the perspective is, zooming out, you're looking at creating a system where democratizing data means making it free and really available, so that practitioners or citizens, data wranglers, people interested in helping the world, could get access to it and then maybe collaborate with people around the world. Is that right? Absolutely. So one of the advantages of using the cloud for this kind of effort is that the cloud is virtually accessible from anywhere you have internet or bandwidth, right? So when you put data in the cloud in a centralized place next to compute, it really removes the need for everybody to have their own copy, right? The traditional way is that you bring the data next to your compute. And so we have these multiple copies of data, some of them on the petabyte scale. There's obviously the carbon footprint associated with the storage, but there's also the complexity that not everybody is able to actually analyze the data and have that kind of storage. So by putting it in the cloud, now anyone in the world, independent of their compute capabilities, can have access to the same type of data to solve the problems. I remember doing a report on this in 2018 or 2017. I forget what year it was, but it was around public sector, where there was a movement with universities and academia where they were doing some really deep compute and Amazon had big customers. And there was a movement towards an open commons of data, almost like a national data set, like a national park kind of vibe. That seems to be getting momentum. In fact, this kind of sounds like what you're doing, something similar where it's open to everybody. It's kind of like open source meets data. 
Exactly, and the truth is that for the majority of these data, we primarily work with what we call authoritative data providers. So think of NASA, NOAA, the UK Met Office, organizations whose mission is to create the data. So their mandate is actually to make the data public. But in practice, that's not really the case, right? A lot of the data is stored on servers or tapes and is not accessible. So yes, you bring the data to the cloud, and in this model that we use, Amazon never actually touches the data, and that's very intentional so that we preserve the integrity of the data. The data provider owns the data in the cloud, we cover all the costs, but they commit to making it public and free to anybody. And obviously the compute is next to it. So that's a value add. Ana, give me some examples of some successes you've had, some of the challenges and opportunities you've overcome. Take me through some of the activities because this is really needed, right? I mean, sustainability is a top-line conversation. Even here at the conference, re:MARS is talking about solving climate change from space, which is legitimate, and they're talking about all these new things. So it's only going to get bigger, this data. What are some of the things you're working on right now that you could share? Yeah, so for me, honestly, the most exciting part of all of this is when I see the impact that it's creating on customers and the community in general. And those are the stories that really bring home the value of opening access to data. And I would just say the program actually offers, in addition to the data, access to free compute, which is very important as well, right? You put the data in the cloud, it's great, but then if you want to analyze it, there's the cost and we want to offset that. So we have basically an open call for proposals that anybody can apply to, and we subsidize that. 
So what we see by putting the data in the cloud, making it free, and putting the compute within reach is that we see a lot of, for instance, startups. Startups jump on it very easily because they're very nimble. We basically remove all the costs of investing in acquisition and storage of the data. The data is connected directly to the source and they don't have to do anything. So they easily build their applications and workloads on top of it and turn them on and off, as you know. So they don't have to pay for it? They basically just pay for the compute whenever they need it, right? So all the data is covered. So that makes it very feasible for a lot of startups. And then we see everything from academia and nonprofits to governments working extensively on the data. What are some of the coolest things you've seen come out of the woodwork in terms of things built on top of the data? The builders out there are creative. All that heavy lifting's gone. They're being creative. I'm sure there have been some surprises or obvious verticals. Healthcare jumps out at me. I'm not sure if FinTech has a lot of data in there, but healthcare I can see as a big vertical. Also, you know, oil and gas, probably concerned. Yeah, so we see it all over the space, honestly. But for instance, one of the things that is very common is for people to use NOAA data, like weather data, because basically weather impacts almost anything we do, right? So you have this forecast data coming into the cloud, streamed directly from NOAA. And a lot of applications are built on top of that, like forecasting radiation, for instance, for the solar industry, or helping with navigation. But I would say some of the stories I love to mention, because they are very impactful, are when we take data to remote places that traditionally did not have access to any data. 
And for instance, we collaborate with a program, a nonprofit called Digital Earth Africa, which is basically a philanthropically supported program to bring Earth observations to the African continent and make them available to communities and governments. And they work on things like fighting illegal mining and deforestation, from mangroves to deep forest. It's really amazing what they are doing and how they are managing their resources. And the low-cost nature of it makes it a great use case there. Yes, yeah. So it makes it feasible for them to actually do this work. Yeah. You mentioned the NOAA data, which made me think of Saildrone, my favorite use case. Yes. Those Saildrones that go around. We've had them twice on theCUBE at re:Invent over the years. Really good innovation. That vibe is here too. At the show at re:MARS this week, at the robotics showcase, you have startups and growing companies in the ML and AI areas. And you have the convergence, not obvious to many, but here this culture is like, hey, it's all coming together. Physical industrial space is a function of the new IoT landscape. I mean, there's no edge in space, as they say, right? So it's unlimited edge. So this kind of points to the major trend. This innovation is not stopping, but sustainability has limits on Earth. We have issues. We do have issues. And I think one of my hopes is that when we come to the table with the resources and the skills we have, and others do as well, we try to remove some of these big barriers that make it harder for us to move forward as fast as we need to, right? We don't have time to spend on that. You know, I've encountered the figure that 80% of the effort to generate new knowledge is spent on finding the data you need and cleaning it. We don't have time for that, right? So can we remove that undifferentiated heavy lifting and allow people to start at a different place and generate knowledge and insights faster? That's key, that's the key point. 
Having them innovate on top of it. What are some things that you want to see happen over the next year or two? As you look out, hopes, dreams, KPIs, performance metrics, what are you driving to? What's your North Star? What are some of those milestones? Yeah, so we are investing heavily in some areas. We support sustainability broadly, which as you know is all over the space, but there's an area that is becoming more and more critical, which is climate risk. Climate risk, you know, for obvious reasons that we're all experiencing, but also because there are more regulatory pressures on businesses and companies in general to disclose their risks, not only the physical but also the transition risks. And that's a very data-heavy and compute-heavy space, right? And so we are very focused on trying to bring the right data and the right services to support that kind of activity. What kind of breakthroughs are you looking for? So I think again, it goes back to this concept that there's all that effort that needs to be done equally by so many people, so we're all repeating the effort. So I'll put a plug here, actually, for a project we are supporting, which is called OS-Climate. I don't know if you're familiar with it, but it's a Linux Foundation effort to create an open source platform for climate risk. And so they brought S&P Global, Airbus, Allianz, all these big companies together, and we are one of the funding partners, to basically do that baseline work: what data is needed, what are the basic tools, let's put it there and do the pre-competitive work so then you can build the competitive part on top of it. It's kind of like a data clean room. It kind of is, right? But we need to do those things, right? Are they worried about competitive data or is it more anonymized out? How do you...? It has both, actually. So we are primarily contributing with the open data part, but there's a lot of proprietary data that needs to be behind the wall. 
So yeah, absolutely. You're on the cutting edge of data engineering, because web and ad tech technologies used to be where all that data sharing was done, for commercial reasons. You know, the best minds in our industry, to quote a CUBE alumnus, are working on how to place ads better. Jeff Hammerbacher, founder of Cloudera, said that on theCUBE. And he was, like, embarrassed. But the best minds are working on how to make ads more efficient. But that tech is coming to problem solving. And you're dealing with data exchange, data analysis from different sources, third parties. This is a hard problem. Well, it is a hard problem, and my perspective is that the hardest problem with sustainability is that it goes across all kinds of domains. We traditionally have been very comfortable working in our little swim lanes where we don't need to deal with interoperability and extracting knowledge. But sustainability touches the economic side, the social, and the environmental. It's all connected. And you cannot just work in one little space and then go and test the impact in the other one. So it's going to force us to work in a different way. It's big data, complex data from different domains, and we need to somehow make sense of all of it. And there's the potential of AI and ML and things like that that can really help us, right? To go beyond the modeling approaches we've used so far. And trust is a huge factor in all this, trust. Absolutely. And just going back to what I said before, that's one of the main reasons why, when we bring data to the cloud, we don't touch it. We want to make sure that anybody can trust that the data is NOAA data or NASA data, and not Amazon data. Yes. Like we always say on theCUBE, you should own your data plane. Don't give it up. Well, that's cool. Great to hear the update. 
Are there any other projects that you're working on that you think might be cool for people that are watching, that you want to plug or point out? Because this is an area people are leaning into and learning more about. Younger talent is coming in, I think, whether it's university students or people with side hustles who want to play with data. So we have plenty of data. We have over 100 data sets, petabytes and petabytes of data, all free. You don't even need an AWS account to access the data and take it out if you want to. But I would say a few things that are exciting are happening at re:MARS. One is that we actually got integrated into ADX, the AWS Data Exchange. And what that means is that now you can find the open data, free data from ASDI, in the same search capability and service as the paid data, right, licensed data. So hopefully it will make it easier. If you want to play with data, we actually have something great. We just announced a hackathon this week in partnership with UNESCO, focused on sustainable development goals, with $100K in prizes and so much data for you to use. You've got the world as your oyster. Go check that out. Is there a URL or website at Amazon, or a project they can join? How do people get in touch with you? Yeah, so Amazon SDI, for Amazon Sustainability Data Initiative, so amazonsdi.com, and you'll find all the data, a lot of examples of customer stories that are using the data for impactful solutions, and much more. And there's a new kind of hustle going on out there. I've seen entrepreneurs do this very successfully: they pick a narrow domain and they own it. Something really obscure that could be off the big players' reservation. And they just become fluent in the data and it's a big white space for them. There are market opportunities, and at the minimum you're playing with data. So this is becoming kind of like a long tail, domain expertise data opportunity. It seems to be really hot. 
So yeah, go play around with the data, check it out, for a good cause too. And it's free. It's all free. Almost free. It's not always free. Is it always free? Well, a friend of mine said it's only free if your time is worth nothing. Yeah, exactly. Well, Ana, great to have you on theCUBE. Thanks for sharing the story. Sustainability is super important. Thanks for coming on. Thank you for the opportunity. Okay, CUBE coverage here in Las Vegas, I'm John Furrier. We'll be back with more of day one after this short break.
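
Editor's aside on the point made in the interview that the open data can be reached without an AWS account: the sketch below shows one way that might look, using only the Python standard library and the plain S3 REST listing call with no credentials. The bucket name `example-open-data-bucket` and the `data/` prefix are hypothetical placeholders, not names from the interview, and the sketch assumes the data provider has enabled anonymous listing on the bucket, which not every public bucket does.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# A trimmed example of the response shape, used here so the parser
# can be exercised without any network access.
SAMPLE_XML = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>example-open-data-bucket</Name>"
    "<Contents><Key>data/a.nc</Key><Size>1024</Size></Contents>"
    "<Contents><Key>data/b.nc</Key><Size>2048</Size></Contents>"
    "</ListBucketResult>"
)


def build_list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL for a public bucket."""
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a listing response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_public_objects(bucket, prefix="", max_keys=10):
    """Fetch and parse a listing anonymously (needs network access)."""
    with urlopen(build_list_url(bucket, prefix, max_keys)) as response:
        return parse_listing(response.read())
```

Calling `list_public_objects("example-open-data-bucket", "data/")` with a real public bucket name substituted in would return the first few object keys and their sizes. No request is signed, which is the point being illustrated; the compute used to analyze the data afterwards is still paid for separately, as discussed in the interview.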