Welcome to Amsterdam and KubeCon + CloudNativeCon Europe 2023. Join John Furrier, Savannah Peterson, Rob Strechay, and theCUBE team as theCUBE covers the largest conference on Kubernetes, cloud native, and open source technologies, together with developers, engineers, and IT leaders from around the globe. Live coverage of KubeCon + CloudNativeCon 2023 is made possible by the support of Red Hat, the CNCF, and its ecosystem partners.

Welcome back, everyone. It's theCUBE's live coverage of KubeCon + CloudNativeCon Europe. It's like open source TV here. I'm John Furrier, host of theCUBE. We've got Rob Strechay here, theCUBE analyst, and of course a Cube alumni back again: Richard Hartmann, Director of Community at Grafana Labs, a legend. We were just talking about open source TV. I feel like whenever you come on, we get an update on open source. Great to see you again. Thanks for coming on.

Thanks for having me.

So we were just talking before we came on camera about the dynamics of open source projects. Yesterday we were talking about AI. You had some great insights around the impact of licensing, and you've done some research on this. A lot of good stuff's happening, and new numbers are coming out: 60% new users here at KubeCon, new blood coming in. That part is doing great. How are you doing?

Busy, but good. It's a blast.

Okay, let's go first into, well, I want to get to the whole open source ecosystem and some of the dynamics there. I think a really important story we're seeing, and it's kind of related to open source, kind of on the fringe, but we're bringing it in for conversation, is looking at the impact of AI. Obviously, Stack Overflow just banned ChatGPT because they don't want to be flooded with auto-generated answers, and we're seeing a lot of licensing issues. You've been researching this, and there are implications around AI auto-coding. We were calling it code pollution.
That's my word, and a little bit over the top, but it makes the point: if you flood the world with code and it's not good, it's only going to clog things up.

I agree on the pollution aspect, at least the potential for pollution. The thing is, I'm not a lawyer, but I like reading legal texts for weird reasons. At least under US law, as it currently stands, or as our best guess at the current state of the law goes, you cannot copyright automatically generated things, or the written word which is automatically generated. So you can't really put a lot of protection on this. But what can absolutely happen is that something which has been generated automatically can already be under copyright, because those large language models which underlie code generation are actually trained on code. Take an Apache 2 licensed project and feed something in which comes from GPL or AGPL code: those are not compatible licenses. Or they are, but not for the intents and purposes of the people who want to keep using Apache 2. And that is a risk, because if you slide something in, knowingly or unknowingly, and then someone comes around and says, okay, this is actually under a different license, you have a problem. And even if it's just Apache 2 code which is being copied from, you still might have requirements to name the original authors, retain copyright headers, things like these. As of today, there is nothing which actually does all of this, checks all of this, and gives you the references, where I learned from and where I pulled from, and puts them in an appendix or whatever. And licenses are not made for this either, because I don't want to have a header which is this large.
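The incompatibility risk Richard describes, copyleft snippets landing in a permissively licensed project, is the kind of thing a provenance-aware tool could at least flag mechanically. A minimal sketch of that idea in Python; the compatibility table is deliberately simplified and illustrative only, not legal guidance, and the function name is invented:

```python
# Sketch of a license-compatibility gate for generated code snippets.
# The table below is illustrative only; real compatibility questions
# need legal review, and these sets are intentionally simplified.

# Which snippet licenses a project under a given license can absorb
# (hypothetical, simplified mapping keyed by project license).
COMPATIBLE_WITH = {
    "Apache-2.0": {"Apache-2.0", "MIT", "BSD-3-Clause"},
    "GPL-3.0": {"GPL-3.0", "Apache-2.0", "MIT", "BSD-3-Clause"},
}

def check_snippet(snippet_license: str, project_license: str = "Apache-2.0") -> bool:
    """Return True if a snippet under snippet_license may go into the project."""
    return snippet_license in COMPATIBLE_WITH.get(project_license, set())

# The problem case from the conversation: GPL code in an Apache 2 project.
assert check_snippet("MIT") is True
assert check_snippet("GPL-3.0") is False
```

The asymmetry in the table mirrors the point made here: Apache 2 code can generally flow into a GPL 3 project, but GPL code cannot flow into an Apache 2 one.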
So maybe at some point we come to a system where, I don't know, I have a reference file, and from that reference file I pull in where this was coming from, so I still give some sort of attribution and reflect this in the licensing requirements instead of having a header, again, of this size.

And Rob, you know, we've seen the waves of innovation, and every single time there's an inflection point, there's always the licensing dynamic. I'll go back, I'm kind of old now, so I've been around the block a few times, but I remember when, you go back to the late '90s and early 2000s when open source was booming, if you were a founder you kind of didn't know what dual licensing and Apache and GPL were, and you had those rules of the road for open source. But you'd build something, and then someone wants to, say, buy the company, and you pay all the lawyers, and it's like, wait a minute, you used code under the MIT license in here, and it would unwind innovation. Here we have this new dynamic.

And I think I had the same worries about copyright and what happens when it's copyrighted. I was recently at a company where we had our Apache 2 license and then we created a community license. So if that community-licensed software which was out there got put up into ChatGPT, it could impact somebody who was using it differently from our Apache 2 code. And I think that brings up what we were talking about, which is that the humans may not be generating all the code, but they're going to have to be smart enough to actually go through the code and understand where it came from. I like the whole reference-file type of concept, and I think that's completely missing from all the generative AI that's out there now. Do you see that being more like private versions, like Grafana starting to use its own version of an LLM or something like that to help you, or how do you see that working?
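The reference-file idea discussed above can be pictured as a small out-of-band attribution manifest: provenance lives next to the code rather than in an ever-growing header. A hypothetical sketch in Python; the manifest fields, file path, and repository name are all invented for illustration:

```python
# Hypothetical out-of-band attribution manifest: each generated snippet
# records where it was pulled from, its license, and its authors, so the
# source file itself does not need a giant license header.
manifest = {
    "file": "pkg/handler.go",
    "snippets": [
        {
            "lines": "10-42",
            "source": "github.com/example/upstream",
            "license": "Apache-2.0",
            "authors": ["Original Author"],
        },
    ],
}

def attribution_notice(manifest: dict) -> str:
    """Render the human-readable attribution many licenses require."""
    out = [f"Attributions for {manifest['file']}:"]
    for s in manifest["snippets"]:
        out.append(
            f"  lines {s['lines']}: from {s['source']} "
            f"({s['license']}), authors: {', '.join(s['authors'])}"
        )
    return "\n".join(out)

print(attribution_notice(manifest))
```

A tool consuming such a file could regenerate the notices an Apache 2 redistribution requires without bloating every source header, which is the trade-off Richard raises.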
That's actually something which is being discussed. I honestly don't think we're the only company having those kinds of thoughts, where we say, okay, we know where we have complete permission to do everything with the code, and as such we can train on it. But that is also a problem. When you look at large language models, all of those players make huge noise about how they make public, what was it, the smaller datasets which you actually use as fine-tuning datasets, and they make huge noise about how it's open source and everything, which is not the truth. If you talk to the experts in the field, the actual thing which matters is the much larger body of work on which the initial training was done, and that is actually where most of the substance is coming from. With the fine-tunings, basically the errors are being weeded out and optimizations are made so you have fewer wrong answers. So you need a substantial amount of data to train on, and that is non-trivial: unless you're one of the hyperscalers, you don't have this amount of code.

And just for the people watching, the SBOM is being discussed in the supply chain world. How do you figure out what's in that base? It's so massive, there has to be copyrighted material in there, and then is it in the public domain at this point? So what you're saying is that the generated code is essentially public domain by default, or not, or not licensed?
No. Public domain would mean I can do whatever I want, and this is the inverse: I don't know what I can even do with it.

That's right, and that's much, much worse, because I don't know. This is my point about back in the day, when you had to go to the lawyer who says, okay, you've got to rewrite that code, because you can't use it if you're going to sell the company or do a licensing deal. Let's talk about case law. Right now there's really no precedent. What's your opinion? There will be case law; who's going to be first?

I have a bet, but I'll share it over a beer.

Yeah. Well, we've heard rumblings in the hallway here that people who are exhibiting have gone on and put stuff into ChatGPT. We know the Samsung example we've been talking about on theCUBE, where they actually had proprietary code go out into ChatGPT. We've heard of customer lists, people writing memos, internal things: hey, rewrite my internal all-hands meeting. I mean, that's out there now. This is the danger.

It very much is. And even if I don't have the right, as per corporate agreement, licensing, whatever, to put this in there, it's still in there, and at least as per the licensing agreement with, for example, ChatGPT, they can use all of this to train their model. So they literally have the right to go through all of this data, all of this sometimes very, very valuable data, and in theory just get the stuff out which they want. And in the end, if I do this, I'm the person in the middle, because I didn't have the right to do this, but I still gave someone else that data under a license agreement or usage agreement where I gave them permission to do it. So if anything happens, I sit in the middle, and I'm basically the one who gets it from both sides.

Let me ask you guys both a question, because I think this is a great conversation. We're riffing in real time here about an unknown future, but you can draw on the past, maybe, here in open source, because what we're really talking about here is a whole other level of new
open source, not-yet-known license backlash or blowback. So if things are going to become more open with data, what might happen? What do you think the scenario is? Does it get shut down? Do people go to their corners and become more fragmented? Because the scenarios are interesting here; there's a lot that could happen based on which way that next bit flips. So if we say, okay, let's see who gets sued first and sets precedent, there'll be some breakage, but move fast, progress. Or do we just stop everything? What do you guys see as a potential future? Rich, we'll start with you.

So I think the cat is out of the bag. There's no way to put this genie back in the bottle; it's here to stay. And we're going to see a lot of really, really good usage, but we're also going to see problems. I like the concept of code pollution, because if you want to build a detector for any radiation today and you need a steel enclosure, the only way to get steel which is not contaminated by above-ground atomic tests is to go back to ships which sank before we started using atomic bombs. Well, "using" is maybe the wrong term, but you get my meaning. There is no pure steel otherwise, because you need to blow so much air into your iron during the steelmaking process that you contaminate it with the residues of what's still in the air. So the only way, again, is to send divers, cut up old ships, and get that steel out. And we have the same inflection point for humanity's library of Alexandria, the internet: anything from after the middle of last year has some taint to it. We cannot be certain that it has not just been fully automatically generated and is void of any actual substance or any human comprehension, basically a stochastic parrot parroting back stuff which someone else wrote.

So they strip-mined, ChatGPT, OpenAI, they strip-mined the web, essentially. But then you're charging for it? That's an ethical question; the web was free. Or free-ish, I'm saying. That's my hyperbole, but in all seriousness,
they're getting the data and reformatting it, so I sense the blowing-air-into-it, at least a part of it, in how they've trained their model.

And I think what I'm very interested in is that the technology has so much promise for certain things, you know, where Grafana plays in particular: anomaly detection, looking at analytics, and looking for things in there that some of the analysts can't see or can't think to look for. I'm also a big fan of the prompt, being able to ask questions, very search-oriented but in more natural language, and I think it has advantages bringing newer people into certain parts of the industry, like security, as well. But I think it's going to be private data that those models are trained on, because I also think that people's applications and their deployments of things like Kubernetes, and the apps that they build on top, are kind of unique unto themselves in many cases.

Yes, but there are actually two very different, and very important, points here. First, generally speaking, all of those models, whatever they are, tend to be relatively good at giving you a bouquet of possible answers, and then you as a human choose. Like when you look at Stable Diffusion or something: you get a lot of pictures which are worthless and a few which are kind of nice, and you can iterate on this, improve on this, and at the end you select one of them. That's very much a case where I have a very wide valid answer space for my question. If I ask, write me code which determines from those X-rays whether that person has cancer, there is a very, very constrained space for correct answers, and people are completely conflating the two uses of the tool. So when you say, explain this to me and let me think about it, that is a super valid use of this technology. When you say, write me that thing, that's maybe not the best. And we had someone who interviewed at Grafana, and they were using all of the AI for the coding interview, and we let
them, because, like, bring the tools you can use. And they tied themselves into a knot and they didn't make it. What was the code like? I wasn't part of the interview, I just heard the story. It didn't go too well.

Well, this comes back down to what I call the crutch factor. If you're leaning on AI too much and not augmenting the brain, it's going to cause problems for a lot of younger coders. This is back to them not being aware of the new licensing dynamics, just like in our generation we were kind of feeling our way through the new licenses, more derivative-works friendly, some contributing back to the community, that were emerging just to keep the rising tide going. So we're now at that new inflection point. How do we manage it? What's the use case? What's a possible scenario for an outcome that could be good?

I mean, there are lots of things which are good. For example, I've seen cases where people who maybe don't have the best written expression but still need to write business letters, or complaints to authorities, or something, can put in basically the basics of what they want to achieve and get a nicely structured letter which passes the sniff test of a professional, signaling that this is also a professional person. Because this is some form of gatekeeping: depending on your background, maybe you're just not as well versed in all of this, and you're being gatekept based on that. Or assistive technologies, automated captioning, image descriptions on social media and such, where a lot of people don't take the time to actually do it. Undifferentiated heavy lifting, augmentation and support of the humans, not replacement of the humans: those are really great examples and really great applications of this technology.

Awesome. Well, great, we beat that dead horse there. AI is going to, well, we don't know; it's like a weather storm coming in, you don't know what's going to happen until it happens. But we do know data will be involved, open source issues will be there, licensing
issues, productivity issues, and potentially backlash on code pollution and security. It's going to be crazy. Final point here, I want to close with a real quick KubeCon update. Richard, from your perspective, how do we feel about KubeCon this year? 60% new blood coming in, first-timers; roughly 20% two or three years in; and folks like us, like 15%, make up the rest. What's your summary of KubeCon this year? What's your assessment?

So initially this was planned for, or anticipated at, roughly 5K people, and we are 10,500, more than double the planned capacity, which is, I think, a little bit the theme of this KubeCon. We weren't certain if this would be a good KubeCon, if people would be coming back or not. It really was very uncertain, to be honest. And in particular in the last few months there was an absolute flood of people who wanted in. So I think we are basically back in a steady state.

We're back to a steady state and growing. Just for clarification, I know it was announced in the keynote that they said this was the largest open source event. I know you're just laughing, but I want to clarify the record, and the CNCF has also acknowledged this on Twitter: FOSDEM in Brussels had more and is the largest.

We don't know about 2023, because we don't have tickets or anything for FOSDEM, so we don't have hard numbers. But we can count people: how many are in the room; how many MAC addresses we see, back before there were privacy extensions, which kind of dates this information, for anyone who is in the networking scene; how many users the access controllers see on their access points; all of those things. And then you basically correlate that with how full the rooms were and how many people were in the hallways, and that's where we have this number of roughly 12K from. That's a few years old.

And you feel comfortable that's an accurate number?

I am pretty confident, yes.

So for the record, FOSDEM is the largest open source conference in Europe.

But I also expect it to be overtaken by Chris, Chris Aniszczyk, or, if I wear the governing board hat, by us, in 2024 in Paris.

It's a rising tide, it's good for everybody. It's all tongue-in-cheek, inside open source baseball, and it's all fun to talk about.

Honestly, both Chris and I are a little bit OCD, and when he was on stage it was like, nope.

Well, congratulations. Great to have you on. You're a legend, you've got the data, and I love your perspective. I love how we have this open source concept going; we'll continue to do open source TV. Rob Strechay and I and Savannah and the team, thanks for coming on.

Okay, that's a wrap on this segment. Come back, we have another segment coming up next. We've got a couple more; it's day three, kind of our slide out of Europe, a great, successful show for theCUBE. I'm John Furrier. We'll be right back.