Well, thank you, everybody, for coming today to hear me out. I'm your speaker, Matt White, and I lead the Generative AI Commons at the Linux Foundation. My background is in open source software and standards and applied AI research, and my passion is in applications of generative and autonomous AI at the intersection of simulations, specifically around multi-sensory learning and autonomous agents. I'm happy to be here with you today to talk about the important role of open source and open science in artificial intelligence, as well as to introduce you to our initiative at the LF AI & Data Foundation, the Generative AI Commons.
So I'd like to start today's presentation with a summary of some of the open source trends related to generative AI. A recent O'Reilly survey noted that 17% of businesses are using open source generative AI, and the Linux Foundation's open source generative AI survey revealed that 50% of businesses are already using generative AI in a production setting, with the majority preferring open source over closed source solutions by a ratio of four to one. Companies are actively implementing open source generative AI solutions in their organizations. Many of the trends that we are witnessing in applications of open source generative AI mirror what is taking place in the overall industry with closed solutions. We are witnessing a surge in corporate investment in open source solutions. This trend underscores the growing recognition of the strategic advantage offered by open source's flexibility and innovation potential. Accompanying this is the increasingly common practice of fine-tuning generative models with domain-specific data, along with RAG-based (retrieval-augmented generation) implementations using open source vector databases like Milvus and Cassandra. A notable development is the ensemble-of-experts approach, which leverages multiple domain-specific models within a modular architecture to enhance performance and robustness.
This is complemented by emergent platforms that automate and orchestrate generative model life cycles, referred to as GenOps, which streamline the development, deployment, monitoring and maintenance of generative models and generative applications. And we are also seeing the emergence of managed model services that blend open source availability and flexibility with the reliability of generative model API providers. Principally this is originating with the hyperscalers, and this hybrid approach facilitates accessibility to advanced AI capabilities and provides managed options for organizations without extensive AI experience. The open source movement in AI is further fueled by an eclectic mix of grassroots AI research groups innovating in open science research, like EleutherAI, LMSYS, RWKV and Open Finance Foundation, complemented by vibrant open source AI developer communities centered around AI application and agent frameworks like LangChain, LlamaIndex and AutoGen. These collectives are at the forefront of innovating in the generative AI ecosystem, demonstrating the practical impact and collaborative spirit of open source.
Shifting to research and community trends, there's a clear movement towards developing smaller, more capable models. This effort addresses the need for efficiency and scalability, allowing for powerful AI on smaller devices, which is pivotal for AI at the edge. In terms of model architectures, innovations such as RWKV (Receptance Weighted Key Value) have been developed to challenge the status quo of the transformer and attention architectures. This comes alongside efficiency optimizations like QLoRA for fine-tuning, and grouped-query attention and sliding window attention, which seek to reduce the computational overhead of generative models during inference. The diversification of language support in large language models is also significant, with a marked increase in non-English LLMs.
This expansion enables AI to serve a global user base more effectively. Lastly, we must acknowledge the emphasis on privacy preservation through techniques like federated learning, which aligns with growing concerns around data privacy and security. These trends collectively depict an open source AI ecosystem that is rapidly evolving, marked by both technical innovation and a commitment to ethical and sustainable AI development.
So now I'll touch on some of the challenges that the generative AI community faces, especially those around open source. The rapid commercialization of generative AI, and the growing public interest in and adoption of it, have spawned some significant challenges. Adding to the issues, AI doomsayers, the media and some commercial organizations have been sowing seeds of fear, uncertainty and doubt. These, along with other issues, have created a difficult climate for open source generative AI. Some of the key areas are as follows.
AI legislation. There are newly developing laws that may adversely affect open source AI. Open source AI may be burdened and encumbered by invasive oversight and prohibitive legislation. This may prevent nonprofits, independent researchers and open source developers from making advances in AI research and applications.
There are licensing concerns. The trend towards restrictive licensing in AI is contrary to the principles of open source, potentially hindering innovation and limiting the use of AI models. Companies and organizations that continue to misrepresent their projects as open source, and those that perpetuate the same, are creating a confusing climate for consumers, governments, researchers and enterprises.
There's fragmentation. Open source AI communities and projects tend to operate in isolation from one another, which results in a fragmented landscape. New organizations are being stood up, whether foundations, alliances or committees, often with little cross-collaboration and significant overlap.
Durability and longevity.
Many AI projects originate from academic research and do not have the structures for ongoing maintenance or a strong contributing community with neutral governance. This creates enough uncertainty that organizations resist adopting valuable projects and dedicating resources to growing and enhancing them, ultimately curtailing broader AI adoption.
Accessibility and transparency. A prevalent issue in AI is the lack of accessibility and transparency of black box models. They currently have the edge in performance and ease of use, which makes them appetizing to organizations who wish to adopt generative AI, but they ultimately diminish trust in AI technologies.
There's a lack of standards. There's a significant void of universally accepted standards within AI, whether they be standard benchmarks, defined metrics, or interfaces and metadata formats. This can create integration hurdles for developers and organizations adopting generative AI.
And open science. Open science is moving towards closed science in AI. More organizations are not releasing their research, their data sets or their models, and are restricting usage of what they do release, if they release anything at all.
And there's AI education and disinformation. Misunderstanding of generative AI by the general public is a prevalent and constant issue, exacerbated by misinformation perpetuated by social media and sensationalism in the mainstream media and public discourse. Also, the emerging workforce lacks the necessary knowledge and skills to adapt to a world where they must work with AI.
So open source generative AI isn't without its own issues, and although most of them are inherited from the stochastic nature of the current state of the art in deep learning, open source models do introduce some concerns that we need to be cognizant of. The conventional concerns around bias and unintended consequences, accuracy, and hallucinations, of course, persist.
So do concerns about copyrighted and inappropriate materials included in training data and the ability of models to memorize them. But there are also some concerns about the scalability and performance of open source generative models, about whether organizations have the skills to deploy and support generative AI systems, and about misuse by bad actors for creating deepfakes, marginalizing populations, and running disinformation attacks at scale.
So as I mentioned, AI legislation is a growing area of interest and potential concern for open source AI. Over 69 countries currently have some form of AI policies in place, and the EU AI Act was just passed. The White House has issued an executive order on the use of AI, and the Bletchley Declaration was recently signed in the UK. This could be a talk unto itself, but there are both potential benefits and challenges to the advancement of open source AI due to legislation. Some of the benefits could be industry standardization, clear universal ethical guidelines, increased funding for nonprofit research, and clarification of IP and data usage for training of generative models. Some of the challenges could be compliance costs and other barriers to entry for smaller organizations, further fragmentation of the industry, and reduced innovation, all of which could have a detrimental impact on open source AI. And this is an area that we not only need to watch but actively need to engage in, in order to guide AI legislation in the right direction.
So now I'd like to touch on the topics of open source and open science. The convergence of the open source movement with AI research through the research community has resulted in some misconceptions, and terms like open data and open source have been used loosely and misrepresented, particularly when it comes to generative models. However, these terms are very clearly defined by the organizations responsible for defining them and are not ambiguous.
So I'd like to take a moment to go over what each of these open terms is and what it covers.
Open access. So open access pertains to the unrestricted availability of research outputs, particularly scholarly publications. In AI, this ensures that the latest findings, methodologies and reviews are accessible to all, fostering a culture of transparency and collaboration. Open access accelerates the dissemination of AI knowledge and removes paywalls that might hinder the spread of information. And an example of this is arXiv.
So moving to open data. This involves sharing data sets freely so that they can be used, modified, and redistributed by anyone. Open data is particularly significant in AI for training models, validating results and ensuring reproducibility. It promotes a collaborative environment where data, often the most crucial aspect of AI, is shared for the collective advancement of the field. Open data also includes open content, as data can be in any format, not strictly text.
So open source refers to the practice of making the source code of software available to the public with a license that allows free use, modification, and distribution without any form of restrictions, except to preserve the terms of the original license. In AI, open source software fosters innovation: developers and researchers can build upon existing work, adapt models for specific needs, and contribute back to the community. It should be noted that open source is only open source when it employs an OSI-approved license. And projects are either open source or they are not. There's no middle ground.
Open science. So open science combines these elements, applying them specifically to scientific research. It encompasses not only open access to publications, but also the sharing of data, methodologies, and software used in research.
In the context of AI, open science ensures that the entire research process, from data collection to model development, is transparent and accessible by anyone for any purpose.
And it's important to recognize that open access, open data, open source, and open science are integral components of a larger movement known as open knowledge. This movement advocates for the free and unrestricted sharing of knowledge in all of its forms. Open knowledge encompasses a broad spectrum of domains, extending beyond AI and technology to include additional fields like scientific research, education, arts, and humanities. And it's underpinned by the belief that knowledge should be freely available to everyone, fostering an environment of transparency, collaboration, and equitable access.
It's also crucial to distinguish these from source-available projects, which are often mistaken for open source. While source-available projects provide access to the source code, they do not grant the same freedoms as open source, typically imposing restrictions on use, modification, and redistribution. Often these restrictions are well-intentioned, but they violate the principles of open source. This distinction is vital in the context of AI, where the full benefits of openness are realized only when there are no undue restrictions on how resources are used and shared.
So what exactly is open source AI? Well, if you were in that other session, you'd know. That term is generally undefined at the moment and a topic of much debate. The Open Source Initiative is currently working on defining the term open source AI. Their current draft definition is as follows.
To be open source, an AI system needs to make its components available under licenses that individually grant the freedoms to study how the system works and inspect its components, use the system for any purpose without having to ask for permission, modify the system to change its recommendations, predictions, and decisions to adapt to your needs, and share the system with or without modifications for any purpose. However, this is an ongoing activity and one that may take some time to ratify. And I guess if you were in the last session, I believe it's October of next year.
So in the meantime, regardless of the clinical definition of open source AI, open source principles applied to AI create tremendous value. And some of these benefits include democratization. So open source fundamentally lowers barriers to entry, allowing for a diverse range of contributors to participate in AI development. This inclusivity accelerates innovation and enriches the AI community with a broad spectrum of perspectives and skills.
Neutral hosting, provided by entities such as the Linux Foundation, ensures that projects are managed and developed in an unbiased environment. This neutral governance is critical for maintaining trust and collaboration within the open source community.
And there's reduced costs. This is a significant advantage of open source. Organizations can leverage a wealth of shared resources without the overhead of licensing fees. This cost efficiency is especially beneficial for startups and academic institutions.
And on innovation, open source serves as a catalyst for rapid technological advancement. The communal nature of open source AI projects allows for swift iteration, leading to groundbreaking developments and the continuous evolution of AI technologies.
Community engagement in open source fosters a rich ecosystem of support and shared expertise, which is invaluable for both established businesses and emerging ventures.
And interoperability.
It's a key advantage, with open source AI projects often designed to operate seamlessly with a wide array of systems and applications that complete the AI ecosystem. This compatibility is essential for integrating AI into diverse technological infrastructure and promoting adoption.
So throughout history, the principles of open science have been instrumental in driving scientific progress and innovation. By fostering a culture of transparency, collaboration and accessibility, open science has enabled researchers across the globe to build upon each other's work, accelerating the pace of discovery and problem solving. The value of open science lies in its ability to democratize knowledge, making scientific information and resources available to a wider audience beyond the confines of elite institutions or geographical boundaries. This accessibility has been crucial in leveling the playing field, allowing diverse perspectives and expertise to contribute to scientific discourse. Historically, major scientific breakthroughs have often occurred when information and ideas were exchanged freely, from the sharing of early scientific manuscripts to the collaborative efforts of the Human Genome Project. Open science practices have paved the way for collective achievements that might not have been possible in a more closed research environment. Furthermore, open science enhances the reliability and reproducibility of scientific research, as open access to data and methods allows other researchers to validate and extend findings. This transparency is key to building public trust in science and ensuring that research is conducted ethically and responsibly. In essence, open science has been a driving force behind the progress of human knowledge, breaking down barriers to information and fostering a global community of inquiry and innovation.
So there are numerous benefits to research, education, and the public when open science is practiced correctly.
So for instance, enhanced collaboration is achieved through open science, as it fosters a multidisciplinary, multi-institutional, and global collaborative environment, which in turn accelerates innovation by enabling researchers to build upon each other's work. Increased transparency in research is facilitated by open access to data sets, algorithms, and methodologies, making the research process more transparent and building trust. Improved reproducibility is a direct benefit of open science, where the ready availability of detailed papers describing methodologies and results, along with data and code, allows for the replication of studies, validating results and allowing for further refinements. Accelerated innovation is spurred by collective access to and enhancement of existing research, synthesizing new ideas and applications and allowing for a more rapid pace of advancement as knowledge is more broadly shared across communities and geographical boundaries. Ethical development of technologies is promoted by the openness that invites greater scrutiny and discussion, leading to development approaches that are more responsible and socially aware. Educational opportunities are expanded through open science. Accessible knowledge lays the bedrock for education and academic research, facilitating the entry of students and newcomers into scientific fields with access to state-of-the-art research. Public engagement with research is also enhanced, which encourages greater public involvement, ensuring that scientific and technological development is aligned with social needs and responsive to the concerns of the public.
So the concept of open science in AI may be new to many. To practice open science in AI means that you must provide all of the components when your research is released, not just some of them. And those components must embrace the principles of openness in how they are shared and used.
The required components are as follows.
Data sets. So well-documented data sets ensure that models can be reproduced and optimized, that the data sets can be used responsibly, and that the source, lineage and implications are well understood.
There's model architecture. So this is the model code and design. By sharing model architecture artifacts, we enable researchers and developers to reproduce, customize and improve upon existing models.
There's preprocessing code. So open sourcing the preprocessing code ensures that everyone can understand how the raw data was transformed to make it model ready.
There's training code. Sharing the training code is about allowing others to see exactly how the model was trained and which hyperparameters were used, and it allows for easy reproducibility.
There's evaluation metrics and benchmarks. This is all about sharing the metrics used to evaluate the models and how the models stack up against industry benchmarks. This transparency in performance measurement helps in building trust and facilitates comparison with other models, as well as highlighting the risks and deficiencies of model behavior.
And there's model weights and parameters. By providing access to the trained model weights and parameters, researchers and developers are not required to start from scratch and undergo expensive pre-training, but rather can build upon pre-existing work, accelerating progress in the field and enabling fine-tuning and model enhancements with low barriers to entry.
Supporting libraries and tools. So this encompasses the dependencies and additional tools and code used to develop the final model, to ensure that researchers can fully replicate all aspects of model development and testing, and that developers can adopt models for their applications with minimal friction.
There's the research paper, and there's the model and data cards as well. Models should be provided with exhaustive documentation, including research papers and model and data cards.
These provide insights into the model's capabilities, limitations, and recommended use cases. Research papers should not be designed to be light reading, but holistic, verbose and exhaustive. Model and data cards are intended to summarize these details, principally for use by developers.
And then there's inference code. Researchers should share the code used for running inference with the model, which is key to others deploying it in real-world scenarios.
And all of these components are necessary to practice open science in AI. The omission of any of these defeats the purpose of conducting science in the open. So open science and open source need each other, and the very nature of open science requires that code, data, and documentation be provided with open licenses that do not restrict their use and applications. This table highlights each of the components of open science AI and suggested licenses. And I won't walk through it line by line, but code must use an OSI-approved license to be considered open source. Data sets can use any license that does not restrict their use or limit modifications, such as CDLA 2.0. And documentation should use permissive licenses like CC BY 4.0. And now that you know a little bit more about open science and its value and how it's applied to AI, it's easy to see that, as a superset of open source, open data, and open access, it can accomplish many of the desired outcomes that they're talking about in the room beside us.
Next we will talk about the Generative AI Commons. So I'd like to introduce you to the Generative AI Commons. We launched this initiative in mid-September, and a little over 10 weeks later we have over 100 active participants, and more than 40 organizations are involved in the initiative. We are a growing community of people from different backgrounds, from academics and AI ethicists to AI researchers, application developers, business leaders, and open source advocates.
What we all have in common is that we would like to advance open source generative AI and see open science embraced in AI research and development. We are also guided by a strong desire to foster principles of responsible and trustworthy AI, and we believe that a shared understanding of the value of ethical practices will ensure that our work promotes safety, inclusiveness, and equality.
So our mission, simply stated, is that we're dedicated to fostering the democratization, advancement, and adoption of efficient, secure, reliable, and ethical generative AI open source innovations through neutral governance, open and transparent collaboration, and education. Our mission statement holistically covers all of our core principles and our ethos, and serves as a North Star to guide us through our work.
So our scope is significant, yet everything we do is through the lens of responsible AI. We embrace principles of open science, open source software, and open standards, as well as open data sets, as the cornerstones of our work. By facilitating a rich variety of generative AI projects and advocating for permissive licensing, we're not just fostering innovation, we're ensuring it's done with inclusivity at the forefront. Our commitment to establishing rigorous benchmarks and interface standards is about more than just good practice, it's about promoting an AI future that is responsible, trustworthy, and accessible to everyone. And in the spirit of neutrality, we've established a forum that is open and that advocates for legislative support in the realm of open science. Our collaboration with industry leaders and researchers is not just about advancing generative AI, it's about doing so in a way that champions transparency and open governance. And by pooling resources and knowledge, we're building a foundation that is as resilient as it is innovative. AI literacy is not a luxury but a necessity.
We're actively breaking down barriers to knowledge with educational programs and workshops tailored to various skill levels and backgrounds. It's about democratizing knowledge and ensuring that everyone, from seasoned professionals to curious learners, has the tools they need to engage with AI responsibly. Through our events we're not just sharing knowledge, we're fostering a community that collaborates on open source generative AI projects that are within easy reach of those who need them, to create a rich ecosystem ripe with innovation. In essence, the Generative AI Commons is where openness meets action. It's where we come together to ensure that the benefits of AI are shared, that knowledge is spread, and that the future of AI remains firmly in the hands of the many and not just the few.
The Generative AI Commons is structured around four core work streams, each integral to the mission of open source generative AI.
The models and data work stream is the bedrock of our operations, focused on hosting generative models, with a preference for those that are ethical and trustworthy. We are dedicated to providing data sets and advanced processing tools that empower AI development. Our commitment extends to hosting diverse and effective benchmarks that set industry standards and drive progress. Moreover, we assist nonprofits in securing the resources they need to prepare data sets and train novel models.
Our frameworks work stream lays the foundation for open and responsible AI. We are developing the Model Openness Framework to ensure transparency and accountability in generative AI models through the principles of open science, and to encourage the use of open licenses for all components. The responsible AI framework guides AI development towards ethical practices. We are also at the forefront of establishing generative AI reference architectures, paving the way for enterprises to adopt trustworthy generative models.
Additionally, we develop practices and guidelines to harmonize efforts across the field.
The applications work stream emphasizes the practical implementation of AI. We host a suite of AI application frameworks and tools to streamline the development process and complete the generative AI ecosystem. Our hosting of database platforms and model operations tools provides the necessary infrastructure to support AI systems. Furthermore, we ensure that interfaces and metadata formats are standardized and published, promoting interoperability across different platforms and tools.
The final work stream, education and outreach, focuses on empowering individuals and organizations through knowledge. We are developing generative AI training and certification to cultivate expertise in the field. We provide thought leadership to inspire and guide the AI community. Through academic and member outreach, we extend our educational efforts to a broader audience, including open source program offices. And importantly, we advocate for open science, ensuring that principles of accessibility and transparency are upheld.
Together, these work streams form a cohesive strategy that propels us towards a future where generative AI is open, equitable and responsibly integrated into organizations and into society at large. Through our collaborative efforts, we are not just organizing, we are catalyzing a movement towards a more ethical and open AI landscape.
So here are some of the Generative AI Commons' initial deliverables that will be released over the course of the new year. Chief among them is the Model Openness Framework, which evaluates generative AI projects against the criteria of open science and open source. We will be publishing a state of open source generative AI report, which includes an assessment of the current market, industry challenges, and insights gleaned from the Linux Foundation's recent open source generative AI survey.
We are working towards a converged responsible AI framework that incorporates principles of responsible AI and trustworthy AI, and that is highly actionable and can be broadly adopted. And we will be growing the number of hosted generative AI projects, whether they be models, data sets, or applications, to grow a rich ecosystem of open source generative AI projects that are guided by a strong community with neutral governance that embraces the principles of responsible AI.
So we are still very early in our journey. Our work stream groups are developing high-value outcomes that will benefit all of open source generative AI through cooperation and collaboration. And that is why I turn our attention to the most crucial part of today's presentation, your participation. The Generative AI Commons is not just a concept, it thrives on the active involvement of each and every one of you. This is our collective call to action. It's your opportunity to join a vibrant community dedicated to shaping the future of open source generative AI. Whether you want to engage in discussions, share your insights, or listen to the pulse of the latest generative AI developments, join our conversation. Your voice is essential and your perspective is valuable. If you're working on a project that can benefit from the collective wisdom and resources of the Linux Foundation, consider hosting it with us. It's not just about finding a platform for your project, it's about amplifying your impact through collaboration. And if you're ready to take a step to move from participant to pillar, consider becoming a member. LF AI & Data membership is the gateway to advancing the goals of the Commons and contributing to a future where generative AI is developed responsibly and equitably. And please feel free to scan the QR code; all the links are accessible from our website.
So if there's a fundamental takeaway from today's presentation, it's the collective responsibility we share in forging an open future for AI. This encompasses not only generative models, but also spans autonomous systems, discriminative algorithms, and future AI technologies yet to be conceived. Our unified efforts are pivotal in ensuring that AI is developed and utilized within a framework of openness and accessibility, guaranteeing that advancements in AI are leveraged for the greater good and the benefit of all sectors of society equitably. And thank you for your attention today and your time. And I apologize, I've been losing my voice for the last few days. If you know somebody, come hang out with us.
Basically, yes, so with the open membership, there's a different scale of memberships, right? So for nonprofits and academics, there are no fees involved. But if you want to join the Commons and join the conversation, that doesn't require membership, if that makes it clear.
Yeah, yes, yes, you can participate. So we distinguish between members and participants. So for participants, there are no fees involved; you can join the Commons and join the conversation and contribute. The membership is in LF AI & Data. So if you wanted to become a member, that's a little more complex, but if you just want to engage in the conversation and get involved, yes, you can do so. And you can hit all the links from our website as well.
It's a separate membership. So membership is being a member of the Linux Foundation, right, the LF AI & Data Foundation, which is the organization that we're in. And the Generative AI Commons is open to participants, right, people that want to engage, people that want to contribute to our overall objectives.
Any other questions? I tried to talk fast so people can jump to the other session if they want to go and contribute to the definition of open source AI. Actually, I don't know if it's open source AI or open AI. I know. It's a bad word.
All right, thank you.
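Editor's sketch: the talk lists the components a release needs in order to practice open science in AI, and pairs them with license families (an OSI-approved license for code, an open data license such as CDLA 2.0 for data sets, and a permissive license like CC BY 4.0 for documentation). The Python below is a minimal, hypothetical illustration of that checklist. It is not the Model Openness Framework itself. The names `REQUIRED_COMPONENTS`, `ACCEPTABLE` and `review_release` are invented for this example, the license allow-lists are small illustrative subsets written as SPDX-style identifiers, and the grouping of model weights and evaluation results under "data" is an assumption the talk does not state.

```python
from typing import Dict

# Illustrative allow-lists only (assumption): real lists are much longer.
ACCEPTABLE = {
    "code": {"Apache-2.0", "MIT", "BSD-3-Clause"},   # OSI-approved licenses
    "data": {"CDLA-Permissive-2.0", "CC-BY-4.0"},    # open data licenses
    "docs": {"CC-BY-4.0"},                           # permissive documentation license
}

# Components named in the talk, mapped to a license family.
# Placing weights and evaluation results under "data" is an assumption.
REQUIRED_COMPONENTS = {
    "datasets": "data",
    "model_architecture": "code",
    "preprocessing_code": "code",
    "training_code": "code",
    "evaluation_results": "data",
    "model_weights": "data",
    "supporting_libraries": "code",
    "research_paper": "docs",
    "model_card": "docs",
    "data_card": "docs",
    "inference_code": "code",
}


def review_release(release: Dict[str, str]) -> Dict[str, str]:
    """Map each missing or restrictively licensed component to a short problem note.

    `release` maps component name -> license identifier. An empty result
    means every listed component is present under an allow-listed license.
    """
    problems = {}
    for component, family in REQUIRED_COMPONENTS.items():
        license_id = release.get(component)
        if license_id is None:
            problems[component] = "missing"
        elif license_id not in ACCEPTABLE[family]:
            problems[component] = "license not in the illustrative %s allow-list" % family
    return problems


if __name__ == "__main__":
    # A hypothetical release that ships weights and inference code only.
    weights_only = {"model_weights": "Custom-NonCommercial", "inference_code": "Apache-2.0"}
    for name, note in review_release(weights_only).items():
        print(name, "->", note)
```

In this sketch a release that publishes only weights and inference code is flagged for every other component, which mirrors the talk's point that omitting any component defeats the purpose of conducting science in the open.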