Welcome back everyone to theCUBE's live coverage here at Open Source Summit 2023 in Vancouver. I'm John Furrier with my co-host Rob Strecce. We're bringing you all the action, and the theme today is sustainability. We talked about energy. Open source is bringing the value of software, with the new infrastructure transitions, to help the world be a better place and to save the world. Climate change is huge. I've got a great guest here, Eric Erlinson, who's the data science team lead and emerging technologist at Red Hat. Red Hat doing its part. Eric, great to see you. Thanks for being on theCUBE.

Thanks a lot for having me.

This is an area that is exploding, from a political standpoint and from a human, global standpoint: climate change. The technology behind it is going to be the key to success. Open source software is now the industry, the software industry. So: open source, climate change. What's your role? What do you do at Red Hat? What are the initiatives that you're working on?

Well, as the data science team lead at Emerging Tech, my role is to understand strategically important technologies for Red Hat, but also, because we're an open solutions company, the open communities that surround all these ecosystems. Specifically in terms of sustainability, I've been trying to assist with building out an open data mesh platform to serve all of the actual data federation needed to give us the right answers in terms of aligning all of our economic activity with the environmental impacts.

I look at it as the operating system of climate change. Because it's nerdy; there's a lot of technology. Again, we've been riffing on the whole tech aspect of it, but there's a URL, os-climate.org; you can check that out if you're watching. OS, like operating system, dash climate dot org. Data is a big part of this. This is a data opportunity: get it in, get the right data, and use scale and AI, for instance, to solve the problem. But first you've got to set it all up.
You have the infrastructure and the architecture to do that. Take us through the challenges and the opportunities.

Sure. The first challenge is that the regulatory climate is evolving fast; every year, almost every quarter, there are increased requirements for companies to report their impacts. In order to do that, you have to be able to align financial institution investments with the results of these reports. So imagine you want to evaluate the physical risk to all the companies in your investment portfolio. To do this, you have to understand the actual locations of the assets of each company: where are their factories, their warehouses? Now imagine you have a warehouse next to a river. If our climate modeling suggests that this river is going to begin flooding more frequently, that is basically a physical risk to the value of your portfolio. But now consider, from just that sentence, what are all the different kinds of data sources you need to create that one simple answer among many? You have to understand lists of actual physical assets; that's a huge piece of data right there. You have to understand their lat-long locations, and then you have to actually align that with the outputs of complex climate models. So there are three or four entirely different data sets you need to bring together in one place to provide these kinds of answers, and they're all coming from different places. In order to meet this kind of federated data need, we're leveraging the latest data mesh architectures, because data meshes are by their very nature federated in terms of governance, data ownership, and, frankly, physical location.

Does it matter which data mesh you're using? Are you leveraging Presto or Trino, or all of the above, or allowing people to kind of bring their own?

That's a great question. Obviously, in order to do this, we're making specific technology choices.
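The warehouse-by-the-river scenario above boils down to joining several data sets: an asset list, lat-long locations, and climate-model output. Here is a minimal sketch of that join in plain Python, with entirely made-up names, values, and grid resolution (this is not OS-Climate's actual schema, just an illustration of the shape of the problem):

```python
# Hypothetical sketch: flag portfolio value exposed to projected flooding by
# joining asset locations with coarse climate-model grid output.
from dataclasses import dataclass

@dataclass
class Asset:
    company: str
    kind: str
    lat: float
    lon: float
    value_musd: float  # asset value, millions USD (illustrative)

# Data set 1: physical assets with lat-long locations (made up)
assets = [
    Asset("AcmeCo", "warehouse", 49.25, -123.10, 40.0),
    Asset("AcmeCo", "factory",   51.05, -114.07, 120.0),
    Asset("BetaCorp", "warehouse", 45.42, -75.69, 25.0),
]

# Data set 2: climate-model output, here a coarse grid of projected
# flood-frequency multipliers vs. historical baseline, keyed by grid cell
flood_risk = {
    (49, -123): 2.5,  # river cell: flooding projected 2.5x more often
    (51, -114): 1.0,
    (45, -76):  1.8,
}

def cell(lat: float, lon: float) -> tuple:
    """Map a lat-long point onto the climate model's (coarse) grid cell."""
    return (round(lat), round(lon))

def value_at_risk(portfolio, threshold=1.5):
    """Sum the value of assets in cells whose projected flood frequency
    exceeds `threshold` times the historical baseline."""
    return sum(a.value_musd for a in portfolio
               if flood_risk.get(cell(a.lat, a.lon), 1.0) > threshold)

print(value_at_risk(assets))  # → 65.0 (the warehouse-by-the-river assets)
```

In the real platform these data sets live with different owners; the point of the federated data mesh is to run this kind of join across them in place, rather than in one in-memory script.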
In our case we are using Trino, because it is actually designed to provide this kind of federation. But because the platform is being designed as an open architecture, people are free to make their own individual technology choices. For example, take Trino: in many ways Trino behaves similarly to something like Spark SQL or Spark Structured Streaming. So I can imagine that somebody who wishes to deploy their own data mesh in this space might choose to use Spark SQL. They could obviously use Presto. In fact, in the talk I gave yesterday, I devoted a slide to all the possible open substitutions that you can make in this space.

One of the interesting things, and I think you brought it up, is getting the data, bringing all the data together. One of the things that I've seen, and that really eats at me, is people greenwashing things. They come out with their carbon footprint tool and say, hey, we're going to tell you how much carbon you're really utilizing when you're on our infrastructure, or something like that. How is what you guys are doing helping to bring openness and transparency to that?

There are a couple of different ways. We are architecting the system to use data-as-code principles. So all the different pipelines that data owners and providers use to bring data into our system will be encoded using things like dbt pipelines running on Airflow, and other architectures we're exploring. People from the community can review our open source code for correctness. But then, when it comes to what you talked about, the individual reported pieces of data, they can also challenge that or correct it. If somebody says, hey, look, this number is wrong, the people who provided it can correct it, or, with the right tooling, others can offer corrections.
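The data-as-code idea above, where reported figures and community corrections are both reviewable artifacts, can be sketched in a few lines. This is a toy stand-in, not OS-Climate's implementation; the keys, figures, and provenance strings are all invented for illustration:

```python
# Toy "data as code" sketch: raw reported figures, with reviewed corrections
# applied on top, each override keeping a provenance note. In a real system
# a correction would land via a reviewed pull request, not a literal here.
reported = {
    ("AcmeCo", 2022, "scope1_tCO2e"): 1_200_000,
    ("BetaCorp", 2022, "scope1_tCO2e"): 90_000,
}

# A correction someone filed after challenging a reported number (made up)
corrections = [
    {"key": ("AcmeCo", 2022, "scope1_tCO2e"),
     "value": 1_450_000,
     "reason": "restated figure from the company's amended filing"},
]

def resolve(reported, corrections):
    """Apply reviewed corrections over raw reported values, keeping a
    provenance note for every figure, overridden or not."""
    data = {k: {"value": v, "provenance": "as reported"}
            for k, v in reported.items()}
    for c in corrections:
        data[c["key"]] = {"value": c["value"], "provenance": c["reason"]}
    return data

resolved = resolve(reported, corrections)
print(resolved[("AcmeCo", 2022, "scope1_tCO2e")]["value"])  # → 1450000
```

Because both the raw data and the correction live in version control, anyone in the community can audit exactly which number won and why.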
Yeah, I think that's the interesting part. Again, us coming into this and seeing it from the outside in, looking at it and going, okay, this is super important for people to understand: not just the consumption part but the supply part, and where the power is coming from. Accounting for all of this, is it carbon neutral or non-carbon neutral? We were just talking to the LF Energy folks before this and having some discussions with them. How does this tie in? I saw that you guys were at COP27 in Egypt, or at least the Linux Foundation was. How does this tie into, I would assume, the SDG7 initiatives?

Yeah, well, we're working closely with a lot of these regulatory bodies. Obviously, many of them are interested in what we're doing, because we're offering a possible open community around allowing the entire financial world to use our data to get better answers to these questions. One example of how this plays out: as you know, there are a lot of initiatives that give you estimates for how much carbon your compute workloads are actually emitting. The first versions of these really just gave you global averages, so in a sense they were much better than nothing, but they were also very crude. Now, the problem is, if you want to go deeper, there's almost a step function of complexity. If you want to take the next step, all of a sudden you have to know the physical locations where your workloads are running, you have to understand where those data centers are getting their power from, and you have to understand, from hour to hour, what the actual power mix is: how much is coming from coal, how much is coming from wind or solar. Imagine all that data. It's on a per-hour basis at least; you have physical locations, workload locations, which change if workloads migrate; and then you're mapping that to the actual power providers and what their power mix is. So it's a huge data federation.
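The hourly accounting just described, workload location per hour joined with that grid's hourly power mix, can be shown as a back-of-envelope calculation. All the intensities, mixes, and region names below are illustrative placeholders, not real grid data:

```python
# Sketch of hourly workload carbon accounting: for each hour, multiply the
# workload's energy draw by the carbon intensity of that hour's power mix
# in the region where it ran. Figures are illustrative, not real grid data.
CARBON_INTENSITY = {"coal": 820, "gas": 490, "wind": 11, "solar": 41}  # gCO2e/kWh

# Hourly grid mix by (region, hour): fraction of generation per source
grid_mix = {
    ("us-east", 0): {"coal": 0.5, "gas": 0.3, "wind": 0.2},
    ("us-east", 1): {"coal": 0.3, "gas": 0.3, "wind": 0.4},
    ("eu-west", 0): {"gas": 0.2, "wind": 0.5, "solar": 0.3},
}

# Where the workload ran each hour and the energy it drew (kWh);
# a migration between regions just changes the region for that hour.
workload_hours = [
    {"hour": 0, "region": "us-east", "kwh": 2.0},
    {"hour": 1, "region": "us-east", "kwh": 2.0},
]

def emissions_g(workload_hours, grid_mix):
    """Total gCO2e: sum over hours of energy x mix-weighted intensity."""
    total = 0.0
    for h in workload_hours:
        mix = grid_mix[(h["region"], h["hour"])]
        intensity = sum(frac * CARBON_INTENSITY[src]
                        for src, frac in mix.items())
        total += h["kwh"] * intensity
    return total

print(round(emissions_g(workload_hours, grid_mix)))  # → 1913
```

Even this toy version needs three federated inputs (workload placement, grid mix, source intensities), which is the step function of complexity the answer describes.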
I like the portfolio alignment tool as an incentive for companies to get value out of thinking about how they use energy. I went to the website, where OS stands for open source. I think operating system; I'm an OS guy. Because we need an OS for climate change. Maybe Red Hat could sell OSes. So on the website you have three pictures: the vision and goal, which is a beautiful earth; the problem, a bunch of smokestacks with smoke coming out of them; and the solution, windmills. Okay, I get it: factory, pollution, sustainability. What's the picture for technology? Because in our world we've got data centers. Crypto's been criticized for energy consumption, more GPUs pumping away mining Bitcoin, a huge energy and carbon impact from crypto. Now we've got AI, these large language models like OpenAI's, and I hear they're kind of cranking out a lot of carbon. What's your view on that? Can you share your nerdy, geeky side of it? Because I geek out on that; I wonder where the pollution is going to come in. I've been saying that AI is going to create more code pollution, because it's auto-coding a lot of stuff, a lot of misinformation. So in a way that's content pollution, but there's also a carbon impact with AI. What's your geek-out view on crypto, and now AI: is it similar to crypto?

I love your question. I think you have to take the sort of long technological-arc view. The way you make progress in the large language model space, for research, is you increase the size of the models. For instance, GPT-4 is estimated to have on the order of a trillion parameters, and to operate it at all requires absolutely the largest, most expensive GPU workstations and blades available. I have not seen the carbon impact reports from those, but I expect they're quite large, and, like you said, I think there's probably a useful comparison with crypto, except these are all above board.
But you agree that's probably pretty heavy?

It's pretty heavy. Now, having said that, just in the last two months there's been an absolute explosion of progress with open source models and research papers, where what they're doing is figuring out better algorithms to get similar results from smaller models. So you're going to actually see people obtaining the same kind of intelligence from these smaller models that you can get from something like GPT-4. People will continue to use tools like GPT-4 to explore the frontiers of what's possible with these neural networks. And I think there's probably an interplay here, in that companies using these models have to report their carbon impact. And so, you know.

This could be an impact for open source. Because the smaller models that have just been released have actually energized the open source community. OpenAI wants its large language model to rule the whole world, but now you're starting to see a lot more open source solutions quickly hitting the market.

Yes, and open source does this to all technologies. It tends to democratize them and commoditize them. When it's successful, it has the effect of undercutting large proprietary vendors, and I'm aware that these vendors are also aware of this. But I think, if you're a company like OpenAI, your requirement to report your carbon impact will actually incentivize you to apply these latest results to your own operations too, and produce the same model results from smaller models that output less carbon.

It's interesting. People are complaining about the effects of AI when they should be focusing their energy on the carbon footprint, Rob.
Right, yeah. And I think it's interesting, because with AI and where it's going, from an open source perspective, do you see them contributing back into OS-Climate? How is AI really helping, or is it not right now, with what you're working on?

There is a role for AI. An awful lot of what we do is, I would say, more traditional data engineering and ETL. There's a huge role, however. We talked about reporting earlier: companies submit their reports in unstructured formats. There is no standard form a company can fill out. So imagine you have tens of thousands of PDFs every year, filed by companies from all over the world, reporting all of their impacts, but every single one of them using its own custom reporting format. Enter large language models such as ChatGPT. For the last two years or more, we've been experimenting with what I'll call pre-chatbot-class AIs to take these reports and find the answers. You can literally pose questions to these things; you have a list of questions, like, how much carbon is this report claiming? And it will go through the document and try to give you an answer. What's nice is that the increased capability of these latest large language models promises to be a huge asset for taking all that unstructured, hard-to-work-with data and actually compiling it into a format that's usable for computing answers to our questions.

And that solves a lot of problems around this undifferentiated heavy lifting. The old way was: merge all the reports. Someone has to do that grunt work. I mean, who wants to do that? That's the whole point of it.

You almost couldn't afford to. Only the very largest financial institutions could actually afford to do the ETL on this data, and that left the rest of the entire financial universe basically unable to field their own solutions.
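The extraction task described above, turning free-form report text into structured figures, is exactly what the LLM question-answering approach targets. As a crude, self-contained stand-in, here is a regex pass over a made-up report snippet; a real pipeline would pose the question to a language model rather than pattern-match, precisely because every filer uses a different format:

```python
# Toy stand-in for LLM-based report extraction: pull "X million tonnes"
# style emissions figures out of unstructured text. The snippet is invented.
import re

report_text = """
In fiscal 2022 our operations emitted approximately 1.2 million tonnes
of CO2 equivalent, down from 1.4 million tonnes the prior year.
"""

def extract_emissions_mt(text):
    """Find 'X million tonnes' figures and return them as floats (Mt)."""
    return [float(m) for m in
            re.findall(r"([\d.]+)\s+million\s+tonnes", text)]

print(extract_emissions_mt(report_text))  # → [1.2, 1.4]
```

The fragility is the point: a hand-written pattern breaks the moment a filer writes "1,200 kt CO2e" instead, which is why a model that can answer "how much carbon is this report claiming?" across arbitrary phrasings is such an asset here.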
And so OS-Climate's goal is to provide a truly global community resource, so all financial institutions can make all their financial activity and their investment portfolios climate aware.

So basically the bottom line is that the entire climate change initiative is a data problem.

That's right.

At the end of the day, you've got to have the data and you've got to know how to apply it. That's where AI could come in and help with things like getting those reports, maybe moving the needle a little bit. What's your take on AI writing code? Because I mentioned code pollution before. I introduced that concept at KubeCon, in a way almost tongue in cheek, but it's actually got traction, where you're like, you're right, it's bad code. It's like pollution.

That is true.

So it's going to take, you know, code to watch code. I think that'll get solved. But ultimately, what is the downside of AI? What's the upside and downside of all this new open source?

Yeah, well, I think the upsides are in some sense familiar, because every day we read the reports of the amazing things people are prototyping with these models. When you watch them doing the right thing, as you all know, it is truly like watching what we used to see in science fiction. And I think part of the downside is that a lot of people watch them in action and don't realize that each one is really nothing but a very fancy next-word predictor. Because they operate like this, they will, with equal confidence, give you something that's correct, but also just make up answers that are incorrect. And, as you say, there is tons of progress; any statement I make about the error rates basically becomes untrue within a matter of months.

It is magic. It's truly next-gen material. This is definitely next gen, and it's early. I'm totally bullish on AI.
I think it's going to be the kind of tornado that's going to, you know, topple unstable old ways. If you're not set up for this wave, this tornado, you're going to get wiped away.

I truly hate to use the word disruptive, but I think in a real sense, when it all shakes out, there are going to be disruptions. It's going to change the way we humans create our own content, in the same way that handheld calculators changed the way we used to do arithmetic.

Yeah. I like the phrase disruptive enabler, because it's an opportunity to enable new things. Like your example, the reports. That's a good example.

I do know, given the current state of the art, the way I like to frame this whole issue is that these models are very capable of acting as an assistant. But because they're capable of making mistakes, the humans operating them need to interact with them as a manager or an editor: taking their content, reviewing it for correctness, and either fixing it yourself or, these days, asking the models to fix their own outputs in some cases. That's the way I see it for the near to mid term, for sure.

Eric, thank you for coming on theCUBE. We went a little long there, a little AI sidetrack. Congratulations on the work that you're doing, and kudos to Red Hat for stepping up and being part of this project. os-climate.org; I'm sure they love the OS being operating system, because, you know, Linux is Linux, but os-climate.org is the website. A lot of great tools on there; it's a great project. Check it out. It's theCUBE, bringing you all the action here from Vancouver. We'll be right back with more coverage after the short break.