Hello everyone. Because this is a Linux Foundation event, I thought I should do something fun and go through all the GitHub repos on gen AI that I could find. It turned out to be an impossible task. When I went to GitHub and entered "LLM", I immediately got 26,000 results. I was like, okay, let's try "GPT", and it's even worse: 100,000 results. So I had to narrow it down. In the end, I decided to search for repos with at least 500 stars, and ended up with 800 repos.
For those who are curious, these are some of the GitHub accounts with the most repos on my list. Of course, they are the usual suspects, like OpenAI and Microsoft, and it's not a comprehensive list. I feel like if I kept on searching I could probably find more, but at some point you have to stop with those obsessions.
So now that I have the data, the next thing I thought about is, okay, how do I run some analysis? The first thing I thought I should do is try to categorize these repos. How should I go about it? I talked to a bunch of people, and in the end we decided to break it down into different layers of the stack. At the top layer, we have a lot of applications: chatbots, image editing, all of these really cool applications you can use on the phone, on desktop, or in the browser. Just below it is what we call the application development layer, or AI engineering, where you develop the applications for end users. Below that is model development, where you actually create the models, something like Llama or Mistral. And of course there's always infrastructure; that's where everything happens. For most of the layers you don't really have to know a lot about machine learning.
The one exception is model development, where you actually deal with the model weights and change the model. In the rest of the layers you don't really change model weights. For infrastructure, some of the categories are compute management, for example SkyPilot, Hexville, or Ray, plus inference servers and monitoring. For model development, you have things like modeling and fine-tuning, and this is where a lot of exciting things are happening. There's also inference optimization, like GGML, TensorRT, or Triton. At the AI engineering level, you can see AI engineering frameworks like LangChain, GPT4All, or LlamaIndex. And we'll see a lot more of those as we go through.
Oh, I think this is the older slide. When we break down the number of repos by category, we see that most of the repos we found are in AI engineering and model development. But what really surprised me is the number of open source applications out there. Whatever you want to do, you can pretty much find a repo for your exact need, totally open source.
If we break it down further, we can show the subcategories, and I thought I would go into detail on what some of these subcategories are doing. One thing that really jumped out to me is the number of repos related to inference: inference optimization and inference servers. These two categories are very closely related because they both solve for latency and cost. If you want to reduce latency for end users, you can do it at the model level: you might quantize a model to make it smaller and faster, or you can try new, exciting techniques for faster decoding.
For example, Medusa is a framework that lets you use multiple decoding heads so that you can do faster inference, faster token generation, or you can use something like lookahead decoding. With inference servers, on the other hand, you deal more with things like dynamic, sequential, or continuous batching, or some newer techniques: for example, you might host multiple LoRA adapters concurrently, or handle the packaging and loading of models. A lot of the tools in either category might also try to do the other, because they're solving for the same problem.
Another category that I find interesting is prompt engineering. I recently went through a list of case studies, 22 enterprise AI use cases, and something a lot of these case studies mention is how challenging it is to do prompt engineering and manage prompts. So when I found these repos, I was like, okay, what are these tools that help you tackle prompt engineering actually doing? I tried to break them down, and a lot of them try to give you less ambiguity. Language models, or AI models in general, are probabilistic by nature, so it's very hard to guarantee what the output is going to be. Of course you can set things like temperature equal to zero, but an input that changes slightly can still change the output very significantly. And for a lot of applications that depend on a certain output format, it's very hard to guarantee that format. OpenAI recently, at its Dev Day, talked about function calling to make it a lot easier to get the expected output structure from the models. And a lot of the prompt engineering tools deal with structured output. Another problem that prompt engineering tools are solving for is memory management.
In the real world, you might chat with a chatbot for a while, so how can you keep updating the chatbot's memory so that it knows what you're talking about? We see a lot of applications like that, not just personal assistants but also in gaming and in education. It's a very exciting area.
So let's get to open source, right? Open source is not just open source code. To borrow a quote we heard a lot around OpenAI recently: open source is nothing without its people. So I was wondering, who are the people building this amazing work that today's AI depends upon? The first thing I thought about is, okay, let's break it down by account. On GitHub, for each account, we can see whether it is an organization or an individual user. So let's see what percentage of these repos are hosted by individual accounts versus organization accounts. And it's actually pretty surprising: a lot of the most popular repos on gen AI are actually hosted by individual accounts. Of course, if you go through the different layers of the stack, you see that the lower the level of the stack, the harder it is to build this by yourself. Originally you might ask, can I do this alone? But I realized that when you do open source, you might start it, but you never really build it alone, because the whole community can build it with you. So that's good to see.
Here's more of a breakdown by subcategory. You almost never see tools like compute management, vector databases, or data management built by an individual. For those categories, you need a whole team to do that with you. Another thing I was curious about was, okay, comparing the repos hosted by individual users and by organizations, what is the star distribution going to look like?
And it was surprising to me when I saw that the repos hosted by users on average actually have more stars and more forks than the repos hosted by organizations. Of course, we all know that the count of stars is not a perfect metric for impact, but I think this is interesting. Then I tried to break it down by category, and it really depends on which category you are in. For most of the categories, it's actually the repos hosted by organizations that have more stars and forks, except for applications. That seems to make sense, because you see a lot of users spend a weekend building some really good application, it gets very, very popular, and then they can start a company on top of it.
One thing I was worried about going through this is that sometimes you see people who are one-repo contributors, right? People really committed to their own repos: they start a repo, that's the one thing they work on, and they don't want to contribute to others. But that was not the case. Going through the list of developers, I was able to map out about 10,000 developers who contribute across the 800 repos. Among them, I found that a quarter contribute to at least two repos, and you can see a lot of them contribute to many, like 18 or 15 or 10 repos. When I tried to plot this, I found that I couldn't plot the distribution of the number of repos each person contributes to, because there's one outlier: one person who contributes to almost 300 repos. I was just like, who is this person? So I looked that person up, and that person has actually contributed to, I think, over 2,000 repos overall, not just in gen AI. A lot of the contributions are smaller commits, for example fixing typos here and there. But I mean, it's meaningful.
And I found it very educational to go through all of this person's contributions and see, oh, there are a lot of cool, interesting tools that person found that I didn't know about before. It just made me realize how easy it is for us to get started: find a repo and then make some changes to that work.
Okay, so that is pretty much my talk. I have the full list of all the 800 repos here if you want to check it out. I know there are tons that I missed, so if there's anything you want to add, please do. I hope that on this list you can find some repos that make your work or your life in the future a bit easier. And I hope it can also motivate you to maybe make a start, if you haven't already. I know this is a Linux Foundation event, so you probably already contribute to open source. But I just want to say that it's very easy to get started and contribute to open source. And we see that a lot of the very impactful repos today were actually started by individuals and not organizations. So I hope that, if you haven't already, you also start building something and share it with the community. Thank you very much.
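For readers who want to try the user-versus-organization comparison described in the talk, here is a minimal sketch of how it might be computed. It is not the speaker's actual analysis code. The records in `REPOS` are made-up sample data, and the function name `summarize_by_owner_type` is hypothetical. The only thing taken from GitHub is the owner `type` field, which the REST API reports as "User" or "Organization".

```python
from collections import defaultdict
from statistics import mean

# Made-up sample records for illustration only; the list in the talk has about 800 repos.
REPOS = [
    {"name": "repo-a", "owner_type": "User", "category": "applications", "stars": 12000, "forks": 900},
    {"name": "repo-b", "owner_type": "User", "category": "ai-engineering", "stars": 800, "forks": 100},
    {"name": "repo-c", "owner_type": "Organization", "category": "infrastructure", "stars": 5000, "forks": 700},
    {"name": "repo-d", "owner_type": "Organization", "category": "model-development", "stars": 3000, "forks": 300},
]


def summarize_by_owner_type(repos):
    """Group repos by owner type and report share, mean stars, and mean forks."""
    groups = defaultdict(list)
    for repo in repos:
        groups[repo["owner_type"]].append(repo)

    total = len(repos)
    return {
        owner_type: {
            "share": len(items) / total,
            "mean_stars": mean(r["stars"] for r in items),
            "mean_forks": mean(r["forks"] for r in items),
        }
        for owner_type, items in groups.items()
    }


if __name__ == "__main__":
    for owner_type, stats in summarize_by_owner_type(REPOS).items():
        print(owner_type, stats)
```

Filtering `REPOS` by `category` before calling the function would give the per-category view mentioned in the talk. A mean is sensitive to a few very popular repos, so a median may be worth checking as well.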