Hello. Hello. Is that audible enough? Too audible? I can make myself louder. Sorry, let me... Am I heard? Okay. Hi. My name is Sean Blagsvedt. This is the talk on open-source AI recipes. We've been doing a bunch of work with various governments, not necessarily in America, and we have a belief that if we can turn all of this work into truly open-source assets, that is good for the world; hopefully by the end of this I'll have convinced you that's true. So today's talk is going to cover the normal stuff I feel like we're all into right now: the terrifying, amazing future; a little story of farm bots in the agricultural space; some lessons from open source that we think apply here; what we think the implied strategy and architecture is; and finally a call for us to share. That's the flow, and we'll get into it. Before we start, everybody wants to know who the hell is this person talking to them. My name is Sean. I grew up in California. I spent many years at Microsoft as a PM in Office and Windows. I moved out to India in 2004 as the third person to start the Microsoft Research lab there, and I filed a patent on machine translation in instant messaging back in 2004; I've long been into chat, versus SMS, as the metaphor, and SMS was dominant in India for a really long time. I then ran a job site focused on informal-sector labor for 11 years that became the largest one in India, with about 10 million users, did some other stints at Dara and Marco Polo, and did a bit of work in the Internet policy space at the White House. So that's my background. In terms of where we are now: we have this belief that obviously every organization and every thinking job is going to use AI. These things are explosive. I don't have to tell you this.
But there's the oft-repeated refrain, which is that everybody that uses AI will replace everybody that doesn't. And the stat that really blew my mind, which I heard this morning on my favorite podcast, AI Explained (subscribe if you haven't), was that by the end of next year, we think there's going to be 100 times more compute in the world than there is today. Six months from now, there'll be 10 times more compute than there is today. So I feel like we're at the beginning of the acceleration toward the singularity, and that is a weird moment to be in in technology. Because essentially our belief is that the job of most thinking people becomes that of a prompt writer or an API stitcher: can we control the AIs to create useful artifacts on our behalf? And probably the way we differentiate ourselves is to hook that up to some other API that's not obvious. Maybe it has our personal data, maybe public data, maybe another API out there. And if that's the job for most thinking people, it implies a bunch of questions that everybody's going to have to ask: how do I find what's working in my field? Can I fork that? How do I keep up with all of the other innovation happening in every other organization and by every other worker? So this discovery piece matters a lot. There's a constant pace of change, because every industry is collapsing into this one. If you look at the folks trying to figure out where to drill for oil next, they're building a fine-tuned LLM: just as we do next-word prediction, they're doing next-drill-site prediction. If you look at pharmaceuticals, it's a generative AI problem. If you look at finance, it's a generative AI problem. And so there's this huge collapsing happening here.
And then you end up with tech risk: I want to take advantage of the latest stuff from OpenAI, but I'm afraid of Microsoft, and oh, the open-source ecosystem also has a bunch of amazing things, and Gemini has a bunch of amazing things. So there's this question of how you keep up and build a platform that is future-proof, such that every innovation that comes out, as all of these fields collapse, benefits you and the stuff that you're trying to build. And finally, as you're constantly swapping out these components, how do you do analysis and evaluation to say: is this change worth making? I think this is going to be the problem of most organizations and most people as these fields collapse and there's this constant inheritance of code and AI that we're all building on top of. So, to make this concrete, I'll tell you a story that started about 15 months ago, of a collaboration we've been doing with an organization called Digital Green. Digital Green has been operating as an NGO in India and Africa for the last 20 years; it was started by a colleague of mine out of Microsoft Research India back in 2005. And they had an incredible insight, which is: if you film farmers explaining the best practices they've figured out in one place (you literally just film that, package it as a small video, and take it to adjacent farmers who speak the same language and are growing the same crops ten miles away), it is the most effective way they've ever discovered to convince a farmer to change their behavior. More effective than any government expert coming in and saying you should do this, this, and this. So they've done this for the last 20 years. They've got 100 million YouTube hits. They have 10,000 videos. And they've scaled, right?
They've received five large Gates Foundation grants. They have 400 people. They're a big organization; they work with 10 different states in India, the government of Nigeria, the government of Ethiopia. And so we came together late last year and said: hey, you've got thousands of these fact sheets, these videos, this accumulated wisdom. Can we make this truly accessible to extension agents? Extension agents, you can think of them as mentors for farmers. And the goal was: could we build a WhatsApp bot that, when those farmers ask a question, gives them vetted data that's editable by those experts? So it's not a hallucination. It's not what GPT-4 thinks is the right way to grow chili, which is mostly based on US documents and constantly confuses the word chili with the word capsicum; in Indic languages those are two totally different words, but in the biased lexicon of GPT-4 they're actually the same thing. So you have to tease those apart. You really want to build on their data, the stuff that they've collected. Can we do that with simplified links and citations? Can we do that in most Indic languages, so that people can actually speak to it in their local language and get an answer back? And then, more recently, can it also diagnose crop issues, so people can take a picture and ultimately determine, hey, what's wrong with my plant and how do I fix it? So we demoed this with Digital Green at the UN last April as part of the science panel there. And to give you a sense of how it works: you ask a question in text. We run it through ASR, as you'll see in a second. It does a RAG lookup. You get an answer. We've also hooked in speech transcription and then translation in multiple languages. And you get an answer back, with citations. That's it.
This is the RAG demo hooked up to WhatsApp. And then more recently GPT-4V came out and Gemini came out, so, boom, you can do some plant diagnosis as well. If you want to go play with these directly, you can scan those; I'll show you all the links at the end. And the impact of this was pretty good. We've had 1,500 users. There are 35,000 messages exchanged. There are large-scale reinvestments happening, and this is being expanded with a program called The Star at a national level across India. The anecdotal feedback was really fun too: people felt more confident because they could ask questions in their own language, related to their livelihoods, in a way that allowed them to save face. It's embarrassing sometimes to ask questions of your colleagues or others, so the fact that they could do it with a bot was really nice. And so this is great. Now, I don't know how many of you have built these LLM bots, these RAG bots, but think about how this should work. Digital Green had a bunch of documents, and on the other side you want WhatsApp. So you'd put all of those documents into a vector database for RAG, run some speech recognition on the incoming question, query the knowledge base, and summarize the results. This is the RAG pattern. Then you do text-to-speech on the summary and send it back. Great. But if you've built these things in practice, shit goes wrong. And the stuff that goes wrong is, well, speech recognition doesn't actually work for most low-resource languages: Bhojpuri, Twi, Chichewa. So how do you solve that problem?
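If you've never built one of these, the whole loop described above (documents into a vector store, retrieve against the question, summarize with citations) can be sketched in a few dozen lines. This is a toy sketch, not Digital Green's or Gooey.AI's actual code: a bag-of-words cosine stands in for a real embedding model and vector DB, and the "summarizer" just concatenates hits where the real pipeline would call an LLM.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    # Stand-in for a real vector DB.
    def __init__(self):
        self.docs = []

    def add(self, doc_id: str, text: str):
        self.docs.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 2):
        q = embed(query)
        return sorted(self.docs, key=lambda d: cosine(q, d[2]), reverse=True)[:k]

def answer(query: str, store: VectorStore) -> str:
    # The real pipeline has an LLM summarize the retrieved chunks;
    # here we just concatenate them and append the citations.
    hits = store.search(query)
    context = " ".join(text for _, text, _ in hits)
    cites = ", ".join(doc_id for doc_id, _, _ in hits)
    return f"{context} [sources: {cites}]"

store = VectorStore()
store.add("chili-guide", "Chili seedlings should be transplanted after 30 days.")
store.add("wheat-guide", "Wheat is sown in November in most of north India.")
reply = answer("when should I transplant chili seedlings", store)
```

Swapping `embed` for a real embedding model and `VectorStore` for a hosted DB keeps the same shape; the rest of the talk is about everything that breaks once you do.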
Well, to do that, we had to build out our own A100 cluster and run open-source fine-tuned models with enough GPU (which six months ago cost ten thousand dollars a month; now it's only twenty-five hundred) so you can get real-time interactions and an actual response back in under five seconds. All right. And then, ideally, you also want to be able to swap in the best private models. So when Whisper v3 comes out, that one's pretty good at some of these languages, and Google's Chirp models are also good. But you're always stuck with: how do I run all the open-source models, but also the best of these, to actually get this to work? Then you have to use different models for long audio versus short audio, of course. Video transcripts, given that we've got a database of a thousand videos, actually suck inside of vector DBs, so you have to build a whole synthetic data pipeline around that. Vector DBs like short, clean text. And look at what we ingest: guidance from the Ministry of Agriculture, in English, written three years ago, from multiple countries. Those aren't well-formatted markdown; they're dirty, dirty PDFs with complicated tables, written by not necessarily the best user-experience designers. Getting the AI to actually understand the content of those is hard. From our perspective, we used every OCR AI tool we could, then ran a synthetic data pipeline on top of that, then put all of it into Google Sheets so that the local teams could edit and correct that data as well. And you have to do that, otherwise you're going to get garbage. Then: oh, users don't ask questions the way the documents phrase the answers, so again you need a synthetic data pipeline there. And users dislike those tables.
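To make the "synthetic data pipeline" idea concrete, here's a minimal sketch of one step: split a long video transcript into retrieval-sized chunks, then have an LLM invent the question each chunk answers, so the index stores farmer-style questions rather than raw transcript prose. The `ask_llm` callable and the prompt wording are assumptions for illustration; any provider, or the stub shown, can be plugged in.

```python
from typing import Callable

def chunk_transcript(transcript: str, max_words: int = 60) -> list:
    # Split a long video transcript into retrieval-sized chunks,
    # since vector DBs work best on short, focused text.
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def synthesize_qa(chunks, ask_llm: Callable[[str], str]) -> list:
    # For each chunk, have an LLM write the question a farmer might
    # actually type, keeping the chunk itself as the grounded answer.
    pairs = []
    for chunk in chunks:
        prompt = ("Write one question a farmer might ask that this "
                  "passage answers:\n" + chunk)
        pairs.append({"question": ask_llm(prompt), "answer": chunk})
    return pairs

# Usage with a stub LLM (a real one would be any provider's API call):
stub_llm = lambda prompt: "When should I transplant my seedlings?"
chunks = chunk_transcript(" ".join(["word"] * 130))
qa_pairs = synthesize_qa(chunks, stub_llm)
```

The generated pairs are what get reviewed in Google Sheets by local teams before they ever reach the index.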
They have every weird document you could possibly imagine, so you need support for ingesting all of those. And then, again: how do normal people update this? We needed a way to pull directly from the knowledge-level experts. They're used to Google Docs, so we needed to pull that in. Vector DBs are terrible at follow-up questions, so you have to have your own summarization piece running in front of all of these parts. They can't handle keywords and weird phrases, so we had to build our own keyword parsers that get mixed with the DB. Then, oh, by the way, speed matters tremendously. If you take two minutes to give somebody an answer, they don't use the product. To make that happen, we do just-in-time Vespa DB indexes for every document that runs in our system, plus we support streaming to all of the popular apps. And then, obviously, figuring out where the hell a bug is is really hard, because there are so many levels here, so you need observability. You need a feedback system built in, with a thumbs-up/down piece. And you need meta-analysis to say: great, what types of questions are we answering well? The ones around pests, can we answer those well? What about the ones around chili in this particular region, do we have a failure there? You have to have this sense of where you're doing well versus not, and you need that built in as well. And then finally, at the end of all this, you need a sense of: well, I just made a change, is it any better? So you need an evaluation framework. That's a long way of saying there's a lot of work. This is a year and a half of work. And the way it actually works is obviously way more complicated than that initial diagram.
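The "keyword parsers mixed with the DB" amount to hybrid ranking: blend the vector DB's similarity score with plain token overlap so that exact crop names, pesticide codes, and other keywords the embeddings blur together still win. A minimal sketch, with illustrative document names and made-up vector scores:

```python
def hybrid_rank(query: str, docs: dict, vector_scores: dict, alpha: float = 0.5):
    # vector_scores: doc_id -> similarity as reported by the vector DB.
    # Token overlap is blended in so exact keywords still match.
    def keyword_score(text: str) -> float:
        q = set(query.lower().split())
        t = set(text.lower().split())
        return len(q & t) / len(q) if q else 0.0

    scored = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_score(text)
        for doc_id, text in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

docs = {
    "fertilizer-guide": "apply DAP fertilizer before sowing",
    "soil-guide": "general guidance on soil preparation",
}
# Pretend the vector DB scored both docs about the same (it blurred "DAP"):
ranking = hybrid_rank("DAP dose", docs, {"fertilizer-guide": 0.50, "soil-guide": 0.52})
```

Here the exact keyword match on "DAP" pulls the right document back to the top even though the embeddings ranked it slightly lower.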
And so the question that we had, though, is: if we're going to make our own work truly reusable and future-proof, how do we do that? And so there are lessons from the open-source world that we've tried to incorporate and that I think are really relevant. If I think about the interventions that drove innovation, it goes back to the Royal Society and the publishing of papers: the fact that the scientific community has had peer-reviewed papers for a really long time. And I feel like open source borrows from that tradition, saying: hey, if I can build on top of your work, this inheritance that is our actual knowledge base as a society keeps getting bigger and bigger. And then things like view-source and JSFiddle feel to us like extensions of that: small innovations that allow me to see things like your prompts, to see the code that made this interesting thing. Can I pull back the curtain, learn from it, make a small tweak, and make that happen? We want to encourage reuse. We want this to be fun. We want to actually include everybody: not just the people that attend SCaLE and, you know, white dudes from America, but the broader set of folks who don't speak English well and exist everywhere across the world. And then finally, there's this belief that we must keep abstracting, because in abstraction we get productivity enhancements, which again is our shared inheritance. So all of that has led us to Gooey.AI. Gooey.AI essentially aims to be a sort of Pinterest-plus-GitHub of these AI recipes, and I'll show you what I mean. Our purpose is to ask: how do we enhance the innovation infrastructure?
And the first part of that is this: if every significant field is trying to make a better LLM every day, a better ElevenLabs-style voice or lip-sync model every day, how do you get the advantage of all of them? What does it mean to attempt to commoditize them? At our level, we have an abstraction where every LLM is hot-swappable and comparable. Every vector DB is hot-swappable and comparable. Every speech recognition piece too. And I'll show you a little demo of what that means. If I go here and make this a little bigger: this is just one of our recipes, the LLM one, which is basically the simplest. What does it do? It just takes a prompt, which was: in terms of coolness, what are the top five car brands sold in America? And it's really interesting, because I'm taking the same prompt and asking Llama, Gemini, and GPT-4 Turbo, and they have real embedded opinions about what the coolest car is. The amazing thing is that those opinions are going to affect every document you make in Word and PowerPoint for the next 10 years. So if you're a marketer in the audience, you care about what opinions and biases these models have, because those are real. I feel like we've spent a lot of time, very justifiably, thinking about the biases around race and identity that all of these LLMs have. Well, they have biases about everything, based on the training data they've been given. So this lets you see it. This is the simplest recipe that exists on our site. And then you can go and see examples of how everybody else is running it. You can see literally how Farmer Chat turns conversations into structured data; that's one of the examples that's there.
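"Hot-swappable and comparable" here just means every model sits behind the same "prompt in, text out" signature, so one prompt can fan out to all of them side by side. A sketch of that abstraction, with stub providers standing in for real OpenAI, Gemini, and Llama adapters (the answers shown are placeholders, not actual model output):

```python
from typing import Callable, Dict

# Every provider hides behind the same "prompt in, text out" signature;
# real adapters would wrap the OpenAI, Gemini, and Llama APIs like this.
LLM = Callable[[str], str]

def compare(prompt: str, providers: Dict[str, LLM]) -> Dict[str, str]:
    # Fan one prompt out to every registered model so their answers,
    # and their embedded opinions, can be read side by side.
    return {name: llm(prompt) for name, llm in providers.items()}

# Stub models standing in for real API adapters:
providers = {
    "llama": lambda p: "1. Jeep 2. Ford ...",
    "gemini": lambda p: "1. Tesla 2. Toyota ...",
    "gpt-4-turbo": lambda p: "1. Ford 2. Chevrolet ...",
}
answers = compare(
    "In terms of coolness, what are the top five car brands sold in America?",
    providers,
)
```

Because every model conforms to the same signature, swapping in next month's release is a one-line change to the registry, which is the whole point of the abstraction.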
And then everything you see here is exposed as a REST endpoint. Every change you make is exposed as a REST endpoint. So anything you see here, you can say: great, I want to play with this and pull it into my own app. You can. So, back to here. And we do the same thing for speech recognition models, and for all of these other parts. So, to keep going: the second part is that we want to abstract these things into AI recipes. In this sense, OK, there's the basic LLM, but then there's a bunch of normal things you'd want to do: hook it up to search, hook it up to incorporate any document, hook it up so you can pull in any YouTube URL. And again, if you want to take a look, those are just recipes. So here is a simple recipe that takes a search query, gets all of the related terms for that query, and then builds the answer to them. We did this with a client: we added it to 10,000 of their pages (this is the number-15 wiki site in America), and showed a 7% boost in session time when you actually answer the zeitgeist questions people have about any particular topic. So that's just a recipe that does that for somebody, and it's one of the primitive pieces that we've got. There's another one where you can search any document with the LLM of your choice; that one wraps LangChain. But then on top of this, what we're trying to do is say: there's a social-network-and-sharing part that matters a lot. So if you come here, to our explore page, you can see all of these recipes that we and others have built.
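Calling a recipe over REST generally looks like POSTing a JSON payload that overrides the recipe's saved settings. The URL, field names, and model identifier below are hypothetical placeholders for illustration, not the documented API; the real ones live in the API tab of whichever recipe you forked.

```python
import json

# Hypothetical endpoint and field names, for illustration only.
url = "https://api.example.ai/v2/copilot/run"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    "Content-Type": "application/json",
}
payload = {
    "input_prompt": "How do I treat powdery mildew on chili?",
    "selected_model": "gpt_4_turbo",  # hot-swap the LLM per request
}
body = json.dumps(payload)

# To actually send it (requires network access and a real endpoint):
#   import urllib.request
#   req = urllib.request.Request(url, data=body.encode(), headers=headers)
#   print(urllib.request.urlopen(req).read())
```

The point is that the settings you tweak in the web UI and the fields you POST are the same thing, so a recipe you fork in the browser is immediately callable from your own app.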
And then if you want to see, great, what's this QR code generator thing? Again, it's just a little recipe: essentially taking some ControlNet pieces and putting them together to turn a URL into an image. And I can see all the examples of that. If I like the work that Alex has done, I can see all of his examples, and I can see the prompts behind them. And this strategy has been pretty good thus far. We have 300,000 people that have run these things on our site. We've got, at this point, I think we just passed 2 million workflow runs. We rank in the top three for AI animation, for AI QR codes, for AI lip sync. This is the sharing infrastructure: people make these recipes and get them out, other people can immediately fork them, and you get this innovation flywheel across it. So the other part, going back to our Farmer Chat example, is that finally we get to inspectable bots. This is Farmer Chat, and there's a website around it that's up here; I'll make it a little bigger so you can see it. It describes the project, and it's in five different languages. Here's the little talk that Rikin Gandhi, the CEO of Digital Green, gave at the UN. And you can talk with the bot here if you want to ask questions; there's a little UI for that. But for me, the part that's most interesting is if you go down and say "tweak the workflow": this is the actual workflow that drives that product. This is the instruction set on top of it. This is the particular AI model they're using, and you can swap it. This is the particular set of documents they've trained on, which is their RAG database. This is the speech provider and the voice they use when they give back an audio response.
And then, if you want to run a synthetic data pipeline on any of these things, you can, and this is the embedding model we used for that particular RAG piece. Most people will not care about this level of settings and will just change the high-level instruction prompt and the documents. But if you want to change those things, you can. And then you can literally just come over to the integrations tab and say: hey, I want to connect this to my own WhatsApp number, or connect it to my own API, or embed it as a widget, whatever that is. And so the part that's exciting for us is that this happens. This is an example, literally, of us: we did this first in Telangana, which is a state in India. Then we extended it into Ethiopia with Digital Green, and then into Kenya. And literally, in each place we go, we're just taking in a call-center document of all the questions farmers have asked, plus the Ministry of Agriculture guidelines. And suddenly there are good ASR (speech recognition) models that work for Swahili, and that can speak back with Swahili accents. This farmer has used WhatsApp maybe 10 times, and suddenly he's taking a picture of something wrong with his coffee crop, asking questions over a very shitty internet connection via audio and WhatsApp, and getting back answers that are relevant to his livelihood. So this is an exciting thing. And it's one of these things like: oh yeah, GPT-4V shipped, what, four months ago?
And it's not just productivity gains at the level of programmers that we're beginning to see; this is going to affect everyone, because suddenly the original promise of Google, the world's information made accessible to everyone, suddenly gets to be true, especially in the domains that people actually work in. So this is exciting. And we weren't the only ones to notice: The Guardian wrote a piece about this. Then, to see how this is spreading: Opportunity is another NGO, a very large one; they get 100 million, I think, I don't know the actual number, a lot of money from USAID. And they said: we have similar problems in terms of enhancing farmer productivity in Malawi; can we make this work there? And we said: sure, it's just a matter of swapping the documents. And so this is what I mean by these shared AI recipes actually turning into assets that spread, for folks trying to do interventions in most AI-related industries, which is almost all of them. And we're doing the same bit for another client, one of the biggest HVAC PE firms in America. What they're trying to do is gobble up lots of furnace-repair shops, and their goal is: hey, we've got a thousand manuals about every AC and every furnace ever sold in America; can we make the oracle of wisdom that knows everything about them? So we've worked with them for the last eight months to use the same recipes and the same core technology and build that. The NPS on it right now is around 50; not bad. And then we're doing other things too, like incorporating vision so you can fill out forms just by taking pictures of what you're doing.
So the last bit of this, and I think this is the mental change many of our jobs will go through: if you want to take advantage of all of the latest things, which are coming out every week, you need an evaluation infrastructure to know whether a change is worth making. So the last leg of this is an eval framework that's built in, to raise all boats. If I look back at Ulangizi, the bot from the folks at Opportunity who took the Farmer Chat piece, we've got a built-in analytics framework that tells you not just how many messages came in, but how many pieces of positive versus negative feedback came in, how many questions we think we answered appropriately versus inappropriately, and what the categorization of those questions was. So again, that's built in. And then the part that's probably most exciting at a higher level is some work we're doing with the EkStep Foundation, People+AI, and MSR India, which is basically to say: great, for low-resource languages, we want to set up the comparison infrastructure. We're using the same infrastructure we use to compare bots. In this particular case, you come with a set of audio files; these are audio files in Kannada, which is a local language of India, and here's the translation of them into English, done by a human being. We call these our golden questions and answers. So you can come with your own Excel sheet of golden questions and answers, and then we compare the top speech recognition and translation models in this pipeline to see how close each actually gets to what we consider the golden answer.
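One simple way to score speech models against golden transcripts is word error rate: the edit distance between the model's words and the human's, divided by the reference length. This is a minimal sketch of that comparison loop, with stub "models" standing in for real ASR calls; a production harness would add translation scoring and per-language breakdowns.

```python
from typing import Callable, Dict, List, Tuple

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: edit distance between word sequences, divided
    # by the reference length. 0.0 means a perfect match.
    r, h = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / max(len(r), 1)

def rank_models(golden: List[Tuple[str, str]],
                models: Dict[str, Callable[[str], str]]) -> List[str]:
    # golden: (audio_id, human transcript) pairs; each model maps an
    # audio_id to its transcript. Lowest mean WER ranks first.
    means = {
        name: sum(wer(ref, transcribe(aid)) for aid, ref in golden) / len(golden)
        for name, transcribe in models.items()
    }
    return sorted(means, key=means.get)

golden = [("a1", "the crop needs water")]
models = {
    "model-a": lambda aid: "the crop needs water",      # stub: perfect
    "model-b": lambda aid: "a crow needs water maybe",  # stub: noisy
}
leaderboard = rank_models(golden, models)
```

Rerunning this leaderboard every time a new model ships is exactly the "is this change worth making" question, answered on your own data set rather than on a public benchmark.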
And so what this allows us to do is say: great, with each new release that comes out, we can re-evaluate on a very use-case-specific data set and then say, this one's the best for you. And this is a different way to think about evaluation. It almost doesn't matter whether Claude 2 or Claude 3 is better than GPT-4 at organic chemistry. For most organizations and people, you want to know: if I make this change over here, is it better on my data set? Is it better on the stuff I care about? And I don't care whether that's a model issue or a prompt issue or a vector DB issue or any of the other pieces we described that make up this pipeline; I just want to know if it's better in the end. And so this is the evaluation framework that lets you do that: you bring your own golden questions and answers, and we run the evaluation across the board. So those were the main points, and I feel like I'll end here. As we go forward, we think this should be an entirely open-source ecosystem, such that anything we run in the cloud, any of these different models, you could also run entirely on your own infrastructure. And if that's true, then I feel like the open-source community needs a real answer to what AI is right now, and this can begin to be that. We need a superset: a superset that allows us to take advantage of all of the latest innovations from OpenAI and everybody else, but also all of the open-source innovations, and that ideally allows us to run them anywhere. So essentially what we want to do is take everything we have built with Gooey.AI in the cloud, which has orchestration and all these paid APIs, and say: great, we can also run that inside your own data center. You can also make modifications to it.
You can also hook in your own fine-tuned models, or somebody else's. And that's the challenge in front of us. So if you're interested in any of those things, we need help. And that's the fast presentation; sorry for talking so fast. I'm happy to take questions. Yes. And if you scan this, it goes to my Dara page and you can just message me if you're interested. If you go to gooey.ai/blog, you can see this whole open-source vision there too. Okay. Thank you.

First, excellent presentation. My background: right now I'm working in data, thinking about all the things that vision AI can do, especially translating vision, things like that. But a long time ago I was in the Peace Corps in Sub-Saharan Africa, doing farmer field extension, the same thing: teaching people with little printouts. And so my question is this. In this use case, people are trusting these bots for their livelihood. Let's say the bot tells you, hey, you have powdery mildew, and that advice applies in India, but in Malawi, or somewhere else like Senegal, it's different. How do you localize that (that's a data-set problem) but also make sure, I can't remember the right word for it, that you're not passing on wrong data, or a person isn't giving back wrong data?

This is where you have to have a local implementation partner. Right. We are not in charge of that. If you go back to this original goal of Farmer Chat, which I think is really important: it's vetted data. Because, as I said, even if ChatGPT is trying to do its darnedest, it's not actually localized by default. It has a distinct bias toward where it got most of its documents.
Most of its documents came from US sources, and hence it will not give advice that's actually appropriate for a particular farmer, let alone get to the next level, where if the rainfall was this, and you're using this fertilizer, and you're growing this crop, the guidance actually needs to be entirely different. And so in some ways this is one of the hardest technical problems, because ideally you would incorporate all of this data (research, real-time satellite data, sensor data) before giving a piece of advice. My take is the only way to do that is to have a local partner on the ground that acts as the usability piece and does the data vetting. Because you're right: you can't give people wrong advice when it's about their livelihood, and the history of development is exactly doing that. People starve. We've told people to castrate their cows. The history of Western nations trying to do these interventions in developing nations is frankly terrible. And so again, our philosophy is: we are a middleware platform provider, and we're trying to give all of the infrastructure to know whether the LLM is doing the right thing. But we have to have an engaged partner that's looking at that categorization, looking at where the thumbs-up-versus-down is, and doing that in a trained sense. We haven't rolled this out so that every farmer can just go ahead and use it; it's working with local partners to train extension agents, the mentors, who then go ahead and make those things happen. But I fully hear you. We are not the ones talking with government ministries; our partners are. Digital Green is a large NGO that does that. Opportunity is a large NGO that does that. And they maintain those relationships, and they have a depth of experience there.
Some of these government agencies, like an agriculture ministry, have a bias. Like in India a few years ago, there were the riots about the "black laws." Basically the question is who owns the land and who can grow what; these are political considerations. So these government agencies have an influence on that, and may actually not be doing it for the good of the people.

I mean, now you're getting into the politics that are at play there. The Indian government recently passed a regulation that says every LLM that's going to be released has to be approved by them. That is their decision. The Indian government owns India; they get to make the rules that exist there. And so from our perspective, we are trying to provide the platform that will allow the analysis, so you know what type of answers you're giving. But again, it's going to be an implementation partner that gets into the politics of what you are recommending versus what you're not, and that's up to them to decide.

I just see the problem that it could basically lead to decisions in the wrong direction.

Yeah, but let me be clear: that happens today, in terms of all sorts of loops of, oh, this fertilizer company is creating the results and is really encouraging you to use this particular fertilizer even if it's not the right one. And what I fully imagine is that it'll probably be the companies selling agricultural inputs that push this the most. They'll give reasonable advice, but it'll probably be biased to encourage you to buy their own stuff. Is that bad? Again, that's going to be a market-based reaction to see. They have a powerful incentive to go ahead and make those things happen.
But again, I feel like that is beyond our scope; they're going to do what they're going to do in their own markets, and that's going to play out. We live in that capitalist society. So I know that may not be the greatest answer, but as a tech piece, they can use it if they want to. That is the point of open source, in many ways. Go ahead.

This may be stretching a little bit, but the partner organizations that are there are going to have competence issues, technology issues, corruption issues.

It depends. Right, yeah, it depends. Some of them, in some places, may be non-corrupt.

Yes, yes. But what is your experience, without naming any names, with the quality of the partner organizations and their ability to perform?

I mean, we've had really good partner organizations. Bill Gates last March wrote this piece saying, hey, capacity building looks like LLM bots in the future. A bunch of the best-funded, best-intentioned organizations globally have paid attention to that, and their own sophistication in this AI space over the last nine months has gotten pretty good. Even in Malawi, you would be surprised: there are smart program managers inside the Malawi government who are looking at these things and saying this is the direction we want to go. So the competency piece matters there. What we've tried to do with things like our copilot recipe is demystify what's going on, to say: listen, the parts you need to care about are this instruction prompt, your data, your knowledge corpus, and then these evaluation questions. And then we, in conjunction with that partner, can iterate to actually make something useful.
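The three parts of a copilot recipe named here, an instruction prompt, a knowledge corpus, and evaluation questions, can be sketched as a tiny data structure plus an evaluation loop. The structure, the keyword-based scoring, and the toy answer function are illustrative assumptions, not Gooey.AI's actual recipe format.

```python
# Sketch of a copilot "recipe" and the iterate-until-acceptable loop
# described in the talk. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Recipe:
    instruction_prompt: str
    knowledge_corpus: list[str]                                   # partner-provided documents
    eval_questions: dict[str, str] = field(default_factory=dict)  # question -> expected keyword

def evaluate(recipe: Recipe, answer_fn) -> float:
    """Fraction of evaluation questions whose answer contains the
    expected keyword; the partner iterates on prompt and corpus
    until this score is acceptable."""
    if not recipe.eval_questions:
        return 0.0
    passed = sum(
        1 for q, expected in recipe.eval_questions.items()
        if expected.lower() in answer_fn(recipe, q).lower()
    )
    return passed / len(recipe.eval_questions)

def echo_corpus(recipe: Recipe, question: str) -> str:
    """Toy stand-in for the real LLM call: return corpus lines that
    share a word with the question."""
    words = question.lower().split()
    return " ".join(d for d in recipe.knowledge_corpus
                    if any(w in d.lower() for w in words))

r = Recipe(
    instruction_prompt="Answer only from the vetted corpus below.",
    knowledge_corpus=["Treat powdery mildew with sulfur dust."],
    eval_questions={"how do I treat powdery mildew?": "sulfur"},
)
print(evaluate(r, echo_corpus))
```

The design point is that the evaluation questions live inside the recipe, so a non-technical partner can change the corpus and immediately see whether the answers still pass.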
I meant your actual experience with these organizations: in general, what kind of issues did you find with real partner organizations in small countries with very poor economies?

So data matters a lot. Can you actually get the correct advice out of the data and the documents they have provided? That is no small feat. So just getting to the point where, and I'll make it very clear...

Okay, you answered not the question I asked. The question I'm asking is: what was your experience? Were they very competent? What kind of problems did they actually have when they were trying to release, at the level of the organization or at the level of the farmers themselves?

At the level of the farmers, the things that come up: they really want this photo use case; it comes up organically among lots of folks. The other issue, and it's kind of a subtle one: we were interviewing a set of ten Bhojpuri farmers living about 50 miles outside of Patna, which is the capital city of Bihar. Given where they actually live, there's no point in their lives where they're not being overheard, because it's a very crowded, familial environment. Hence they did not want to use any speech recognition, because they didn't want to be overheard, but they were fine doing a sort of mixed texting, a little bit of English, a little bit of Bhojpuri, to get answers to their questions. And so that was a subtle surprise to me, as somebody who has spent a fair number of years in the tech-for-development space, because you would assume people want to be able to speak in their native tongue, their mother language. So that was one. The other organic case that we see a lot is, again, this photo use case: something's wrong, nobody can tell me what, please diagnose this. That's come up organically in about 30% of usage.
The other one is that women tend to use this a little bit more than men, and it tends to be used a little bit more at night. I can forward you the usability findings if you're really interested; those are public, so I'll get that up. And that's out of about 200 people in three different states at this point.

Unfortunately I came in a little bit late, so maybe I missed this, but I was wondering: is Gooey.AI a nonprofit? Is there a business model?

Yeah, we are a for-profit right now, building this whole set of stuff. Our business model is: everybody gets a thousand credits to come in, and every time you make an API call you burn credits, and we're usually spending credits somewhere else; our OpenAI bill is $5,000 a month, right? So that's it: you're basically buying credits, we're selling them, and there's a little margin there that keeps the whole thing running. That's the primary business model. We've done some consulting pieces, but we're moving away from that and just focusing on the main credit-based pipeline. So if you look here, every time you run a recipe you spend credits, which allows us to purchase those credits across private models and run things on our own GPU clusters.

I see. So the recipes can only be used if they're hosted on Gooey?

Today. So this is the open-source transition that we want to make. We want to be able to say: great, you can find some cool thing on the site, and then say, I don't want to run it there, or I want to put in my own keys and run it on my own site. Basically, being able to pull down the orchestration layer so you can run it locally: that's the place where we want to go.

I see. That's very exciting. Thank you.

Any further questions?

So, again, I'm geeking out on this. My experience is from 10, 15 years ago; that's when 3G was first starting to hit rural farm villages.
Now it looks like 4G and 5G are starting to hit, in India at least, not in Africa.

Well, it depends on where you're at.

But now we have everything cloud-based. What do you think the next level is? Because this is super cool, but what's coming next? Is it edge compute on people's phones or devices? What do you think is next in ICT for development, or even just for these use cases?

I mean, right now, Starlink aside, you have to have a local economy wealthy enough to support a tower. If you don't have that, then you don't have connectivity. And so in the poorest nations around the world there are just no working towers, because there's no power there, and you have to have gasoline running the tower to give you those infrastructure pieces. This is one of the reasons I love WhatsApp. WhatsApp is the greatest low-bandwidth tool the world has ever made; it works really well offline and still allows you to communicate. They've whacked the hell out of that: if you have the tiniest 2G connection, it will synchronize all your messages at the lowest bandwidth possible. So that ubiquity, I think, begins to happen more. And then if you talk about the Indian use case: okay, 60 percent of India is probably now online. That still leaves half a billion people who aren't. Per-household penetration is probably getting up to 80 percent, but it's still a shared-phone environment. I think that gets better. Every adult, every kid in the world wants a phone eventually; we're making billions of these, and an Android handset built three years ago still works pretty well. So when you think about where this goes, I think it's local-language interfaces via WhatsApp, for the foreseeable next two years. I think that's going to grow like hotcakes. That's my current take. Questions?
Last chance. All right. Thank you.

Can you hear me? Yeah. There we go. Okay. We will get started here in just a moment; give us one second and grab your seats. All right. Welcome, everyone. My name is Rishi Verma. This is Kyung-Sik Yoon and John Engelke, and we all work at JPL, just up the street here in Pasadena. If you're from out of town, I definitely recommend making a trip up to the mountains to see the view. It's just beautiful this time of year, a lot of greenery.

Our goal here today as the presenters of this talk is to do one simple thing, which is to try to convince you all to think about the software best practices and software standards you may develop in a slightly different way. We want you to think about how to develop those standards, iterate on them, and infuse them into your own organizations using the open-source way. So I will start this talk by explaining what that means, then I'll pass it to Kyung-Sik Yoon, who will talk about a contribution he's made to our open-source project, and then to John Engelke, who will also present a contribution he's made to this overall effort.

All right, let's get started. First of all: best practices and standards for software development. What are they? I think we're all roughly familiar; there are some examples of this. For example, information-sharing best practices: does my software have a README that explains what it does, and is that README more than one sentence? Do I have a documentation site explaining how to use my software for new users, admin users, and developer users? Do I have a changelog? Do I have a list of communication channels, messaging, mailing lists, et cetera? Do I have a website? These are all best practices that I think we've seen in top open-source projects, Linux included. There are also governance-related standards and best practices.
For example, the way I govern my project: is that documented somewhere, in terms of the roles and the advancement of those roles? The way issue tickets are labeled and triaged among contributors: is there a process or documentation, something new users or developers who want to contribute to my software can look at, like a contributing guide? These are all best practices that we've seen in a lot of popular open-source projects. Especially for government-sponsored efforts, this is as important if not more so, since these are taxpayer-funded software projects. We also have software-lifecycle-related best practices and standards, which my colleagues will talk about in more depth. These are: is my software being automatically scanned for sensitive information and security issues? Do I have CI/CD pipelines established? Do I have testing, unit tests, available for my software? Are there standardized templates for repositories? Dependency management, scanning for dependency vulnerabilities, version updates, things like that.

So again, best practices and standards: we all know about them, and we know we should be doing them. Especially for government-sponsored software this can sometimes be a challenge, because it's not often in the requirements list of the software that needs to be developed; these things are usually just done by developers on their own, based on their experience. So the question, and maybe the challenge, which is the subject of our talk, is: how can these best practices that we are aware of and want to implement actually be infused into government-sponsored software projects in a consistent way? And what that means is that the standard or best practice applies to multiple projects in a similar way, not just to the all-star project A while other projects fall behind.
Do it in a low-cost way that doesn't cost each individual project a lot of money, and in a way that can be easily infused at scale, to many, many projects at once: not just one project adopting a particular best practice, or developing and adopting it itself, but many projects adopting it concurrently. So how do we meet this challenge?

This is our effort at JPL, called Software Lifecycle Improvement and Modernization, SLIM for short. We have a website, and you can check it out on the web. But what is SLIM exactly? In a nutshell, I'll share it in terms of an analogy. What's one way to do best-practice dissemination and sharing on your own? Make a wiki page on your local intranet, or a Google Doc or something, and say: these are the lessons learned, these are the best practices, and I want to share them among my team. That's great, and if that works for you, that's perfect. But for SLIM, what we've done is actually create a platform for sharing the best practices and standards being developed within our organization with the JPL community, and actually with the public at large as well. The goal is to bring together a platform and share these best practices with the entire community. We are community-based: that means we have many projects represented, providing us best-practice guides but also helping to receive them. We are open source, and I'll explain in a second exactly what that means, but the key thing is that we've developed a platform for sharing, disseminating, and infusing best practices.

Okay, a little more depth on what that means, and on our strategy, because I think that's the most interesting aspect. It's not necessarily just the best practices; you can find those all over the internet, and you can search for good examples. The strategy is the key highlight I want to mention here: how we do this.
So let's go back to the challenge I mentioned: consistent implementation. How do we achieve that? Through community interaction and community development. In our open-source channels we create tickets, and we run polls, et cetera, to ask the many projects within JPL: what are the best-practice standards they need among their projects, how can we help, and which are the most important, key ones? We use open-source tools to gather that input, and we develop those standards collaboratively. We use pull requests, we use issue tickets, people comment, and we iterate on them through versioning, as actual deliverables we can send out to many projects. That gives us consistency: many projects get eyes on these practices and standards as they're developed, and they get a voice to help make a consistent standard that will be useful for them.

Low cost of adoption: we develop these best practices as code. It's not good enough for us to just write a guide that says "thou shalt compile your code using tool XYZ." Our goal is to go further and say: we want you to develop your best practice not just as a written guide but as a piece of code. That could be a template, a script that sets something up, a piece of automation. So that's our goal in this effort: to get the contributions as code, standards as code.

And that allows us to meet our last challenge, which is to scale to a lot of projects. Now that it's code, we can treat it like code: we can send out automated pull requests to hundreds of project repositories and share these best practices at scale, as pull requests that projects can consider and potentially infuse. So that's the secret sauce, and the idea that we have here.
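The automated pull-request fan-out just described can be sketched as a dry run that, for each target repository, constructs the GitHub CLI commands an automated rollout would execute: clone, copy in the best-practice file, open a pull request. The repository names, file paths, and PR wording below are made-up illustrations, not SLIM's actual automation; printing the plan instead of executing it keeps the sketch safe to run.

```python
# Dry-run sketch of "standards as code" fan-out across many repos.
# Repo names and template paths are hypothetical.

BEST_PRACTICE_FILE = "CODE_OF_CONDUCT.md"
TARGET_REPOS = ["nasa-jpl/example-repo-a", "nasa-jpl/example-repo-b"]

def plan_rollout(repos, practice_file):
    """Return, per repo, the shell commands an automated rollout would run."""
    plans = []
    for repo in repos:
        local_dir = repo.split("/")[1]
        plans.append([
            f"gh repo clone {repo}",
            f"cp templates/{practice_file} {local_dir}/",
            f"gh pr create --repo {repo} "
            f"--title 'SLIM best practice: add {practice_file}' "
            f"--body 'Automated best-practice suggestion; close if not applicable.'",
        ])
    return plans

for plan in plan_rollout(TARGET_REPOS, BEST_PRACTICE_FILE):
    print("\n".join(plan))
```

A real rollout would execute these commands (and commit/push a branch first), but the shape is the same: one templated pull request per repository, which the receiving project is free to merge, discuss, or decline.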
And what's our delivery mechanism for this dissemination, once we've developed these best practices? It's two things. One is our website, which I mentioned earlier; you can check it out on the web. That's where you can view the best-practice guides we're developing, and also submit best-practice guides yourself. We also have a plan for what we intend to develop over the next couple of months and the year. We say: hey, these are the best practices we're targeting, and developers and other teams can take a look and say, oh wow, okay, you're developing this in three months, we need that best practice and we don't have the time to do it ourselves, so we'll collaborate with you in three months to iterate, develop, and infuse it. So we also have a roadmap for how to do this. We also, as I mentioned, have automation, and I'll talk more about the effects of that. But basically, again, we take a best-practice idea, a standard, develop it as a piece of code, and then we can ship it out to a lot of projects at once.

So what are some simple examples? My colleagues will talk about some of the fun stuff we're doing, but here are some simple ones: we've got README templates, we've got bug-ticket templates, just standard things that we've disseminated. This is simple stuff, but it's really powerful. If you have a standard way in which bug reports are collected across hundreds of projects, that's kind of a game changer in some sense, because you make large effects with very small changes. GitHub security, for example: just saying, when you're using GitHub, enable these specific features and plugins to scan your code. We have other examples on our website: software metrics, testing standards, governance, et cetera. Okay, so how do we monitor the impact and the infusion? Again, we're using automation.
What you're looking at here is one of what we call our leaderboards. In the columns you see a list of best practices we've developed: for example, issue-ticket templates; your repositories need to have pull-request templates, a code of conduct, a contributing guide, et cetera. As we develop more standards and best practices, the number of columns increases. Each row is one of the open-source repositories that NASA JPL sponsors and develops, and you can see where each stands with respect to these best practices. So we've developed a framework that scans our community of projects.

Oh, I didn't mention that, right: the other aspect of the community development, and I think it's also relevant to how government-sponsored projects can develop community, is that we have a community-oriented model where we sit down with projects at JPL and say: here's this effort, we're trying to develop these consistent standards, and we want you to join. There are two conditions to joining as a community member. You can imagine, in your own organizations, you've got many different software teams; so we sit down with them and say, hey, we're trying to get this collective effort going, are you interested? And if you are, we ask only two things. One: consider using our guides, because we can push these best practices out to your repositories, so have somebody look at the pull requests and at the infusion potential. And two: consider sharing your best practices back with the community, by being involved in our open discussions, pull requests, or templates. So anyway, these are some of the repositories and projects involved in that process for us, and this shows the results of that scan. And that's how we monitor infusion across many different projects at JPL.
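A minimal sketch of the leaderboard scan described above: check each repository for the files a best practice requires and tabulate one row per repository. The required-file list is an example drawn from the practices named in the talk; a real scanner like SLIM's would query the GitHub API rather than local checkouts, so treat the local-directory approach here as an assumption for illustration.

```python
# Sketch of a best-practice leaderboard scan over checked-out repos.
# File list is illustrative; a production scanner would use the GitHub API.
import os

REQUIRED_FILES = ["README.md", "CODE_OF_CONDUCT.md", "CONTRIBUTING.md", "LICENSE"]

def scan_repo(path: str) -> dict[str, bool]:
    """One leaderboard row: best-practice file -> present in this repo?"""
    return {f: os.path.exists(os.path.join(path, f)) for f in REQUIRED_FILES}

def leaderboard(repo_paths: list[str]) -> dict[str, dict[str, bool]]:
    """Map each repo name to its row; columns grow as practices are added."""
    return {os.path.basename(p): scan_repo(p) for p in repo_paths}
```

Because the practices are files ("standards as code"), infusion is directly measurable: a merged pull request flips a cell from False to True on the next scan.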
What's the impact, the outcome of that? Here's a summary. Our project is relatively new, but we've been around for about a year and a half now, and we've specified about 13 different software standards and best practices, which we've disseminated on our website. We're still developing quite a few; we've got a lot backlogged. If this interests you, definitely check out our website, check out our GitHub, get involved with us. Getting this more traction in other communities would be really, really great. But yeah, we've got a lot backlogged, so we could definitely use help.

Then, the number of infused best practices. Again, because we're treating them as code, we can monitor for infusion: we can scan repositories and see whether the best practices and standards we recommend are actually active. Of the 13 best practices we've developed, we've sent them out to about 211 repositories at JPL so far, and 671 is the number of instances of a best practice being successfully infused into a particular repository. So if anybody here has tried to get a project to adopt a best practice, you'll look at that and ask, how did you do that? Because I'm still trying to convince the guy sitting in the cube next door to do X, Y, Z. Number of pending best practices: again, we make pull requests for these things, so we have about 252 still being reviewed, and we're slowly nudging those projects. And if they're not infused, we can reach out to those projects and ask: what's going on, what aspect of this is not useful to you? They can comment in those pull requests, and we can take that feedback and potentially make another version that might be workable. And on the types of best practices and standards: most are in the governance and software-lifecycle areas. Okay, so that concludes my portion, introducing this effort.
I'm going to pass it over to Kyung-Sik Yoon, who's going to talk about one best-practice effort currently in development: continuous-testing best practices for SLIM.

Yeah, thank you for joining; thank you for your time. My name is Kyung-Sik Yoon; I go by KS. I'm a machine learning and data science technologist at JPL. I mainly work on machine learning and data science problems for deep space exploration and Earth science applications. In software engineering terms, maybe I should say I'm more involved in backend algorithm development and implementation. And here I wanted to share my story about contributing to SLIM, especially in the context of the continuous-testing best practice.

So where do we start? I have a science background, and in the science mindset, if there are a hundred problems, you try to solve everything, but you cannot solve everything, and if you solve just one thing, that's great: that's a great discovery, from the science perspective. But from the engineering perspective, if you have a hundred problems, you have to solve every single one to deliver your final product. It's a totally different mindset. And in that regard, I have a lot of interest in developing new algorithms and implementing new sets of functions, but I didn't like continuous testing at all, or any sort of testing where you have to write an additional set of code to make sure everything works as intended for your source code. So that's the motivation for this.

What I would like to suggest is that you can benefit from this SLIM standardization effort, and also that you can go look at the website and the GitHub repository and start making contributions. So I'd like to simply lay out the process of contributing.
You can go to the GitHub repository, go to the issues page, look through the open issues, and see whether there's any topic you're interested in. Or, if you have a specific topic that's not covered yet, for me it was continuous testing, you can post a new issue ticket and start contributing. So that's the first step: identification of the problem. The next step is to fork the repository: you take the main repository into your account, and then you can start making changes and contributions, filling in the missing parts, the community's needs, and your specific need. The next step is to submit a pull request: you get community feedback, engage with the community, improve your contribution, and finally it gets merged into the main branch. So these are the simple steps for making contributions; it can be a simple suggestion, but it can be a big benefit for many.

Through these several months of improving the continuous-testing part of SLIM, we ended up recommending three things. The first is a continuous-testing plan template. We spent some time making a standardized template for generating a continuous-testing plan. You can take a look, make changes, or make suggestions; it's an ongoing process, and contributions would be greatly appreciated. The next one, I think, is very interesting: automated test-code generation, using the recent developments in large language models. We spent some time experimenting with existing large language models and Robot Framework to automate test-code generation. That can be really helpful; it can save a lot of time writing test code. And the third is continuous-testing process automation, execution automation.
You don't have to manually execute your testing code; with this approach it runs continuously and automatically whenever you make changes to the code or open a pull request. Going back to my original pain: writing unit tests, or any sort of tests, can be time-consuming and tedious, especially when testing large software applications with many components. So the main question is: can we generate a test script automatically from the source code?

Let's dive into this. There are several studies, and thanks to the recent fast advancement in large language models and deep learning, there are several models out there, local and open-source as well as cloud-based, and there has been a lot of improvement in generating unit tests and other testing code. Here you can see manual testing, that is, human-generated tests, with 100% code coverage and 100% compilable; that's the baseline ground truth. Compared to that, when you look at the "compilable" column, the auto-generated code as-is is not fully compilable; it's not perfect. But with some human touch you can make it 80 to 90% compilable, and I think that's already very helpful for writing test code. Also, when you look at the test-methods and test-classes columns, the auto-generated test code can cover almost 80% of the code. So that's very helpful again: it's not perfect, but it can potentially save a lot of time writing tedious, mundane test code, and you can focus on creating actual things, and on working the edge cases to make it 100%.

Here's a simple example: a simple calculator on the left, and in the middle and on the right, sorry, it's really small, two pieces of test code; one was generated by AI and one was written by hand. Which one do you think was generated by AI? Number one? Number two? Yeah, number two was generated by AI.
And the point is that it's really hard to distinguish; they both work similarly, and it's getting there. The quality is really good, especially for simple applications. So, our suggestion: we spent some time testing various large language models and other tools, and our recommendation is to use a large language model in combination with Robot Framework. Robot Framework is an open-source tool for automated continuous testing with a pseudo-code-style input syntax; it runs tests and generates test results from that input. Our idea was not to directly generate the final test code with a large language model, but, as an intermediary step, to generate Robot Framework pseudo-code with the model. The generated pseudo-code can be easily inspected by human developers; you spend some time on a final human touch, and then you run your continuous testing. So that's the process: leveraging large language models in combination with Robot Framework.

The process starts from a prompt. This is an example prompt for a functional test: "Generate a Robot Framework script to perform MFA login, using TOTP time-based one-time password generation with the provided secret; generate the script only." And then we can see the example running. Interestingly, we've been using mainly local, open-source large language models, because our applications are export-controlled, and many companies have proprietary information that cannot be shared with cloud-based large language models such as ChatGPT. So I think it's quite important to use local open-source models; you can potentially fine-tune them to your specific company's or institution's needs. And what we have found is that the performance, the generation speed, is there, and it's improving a lot. Here is a real-time generation example, running on my laptop.
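The LLM-to-Robot-Framework intermediary step just described can be sketched in a few functions: build the generation prompt, send it to a local model, and save the result for human review before running it. The endpoint URL, request shape, and helper names below are assumptions for illustration (any local OpenAI-compatible completion server could stand in); this is not the SLIM team's actual tooling.

```python
# Sketch of the prompt -> local LLM -> reviewable .robot file pipeline.
# Endpoint and payload shape are hypothetical; adapt to your local server.
import json
import urllib.request

PROMPT_TEMPLATE = (
    "Generate a Robot Framework script to perform {task}. "
    "Generate the script only, with no explanation."
)

def build_prompt(task: str) -> str:
    """Build the generation prompt for one functional test."""
    return PROMPT_TEMPLATE.format(task=task)

def query_local_llm(prompt: str, url: str = "http://localhost:8000/v1/completions") -> str:
    """Send the prompt to an assumed local completion endpoint;
    no data leaves the machine, which matters for export-controlled code."""
    body = json.dumps({"prompt": prompt, "max_tokens": 512}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

def save_robot_script(text: str, path: str = "generated_test.robot") -> str:
    """Write the generated pseudo-code so a developer can review it,
    then execute it with the Robot Framework CLI:  robot generated_test.robot"""
    with open(path, "w") as f:
        f.write(text)
    return path

if __name__ == "__main__":
    print(build_prompt("MFA login using TOTP one-time passwords from the provided secret"))
```

The key design choice mirrors the talk: the model emits reviewable Robot Framework pseudo-code, not final test code, so the human touch happens at the cheapest point in the loop.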
So it's pretty fast: it generates the script quickly and efficiently, and you can then execute the code and generate the test results as well. So with some example prompts for Robot Framework pseudo-code generation, we can tackle auto-generated test code and continuous-testing script generation quite easily. We also have another page that covers a list of tools and frameworks, with our recommendations for running different kinds of tests. Again, this is not complete, and maybe not perfect, so do take a look; you can benefit from the examples across different applications and use cases, where different tools apply. And if you have a wonderful tool to recommend, please do, so we can improve the list as we go.

Finally, lessons learned. We have contributed three parts so far: one, the continuous-testing plan template; two, automated test-code generation; three, test-code execution automation. Please take a look at the website; it's ongoing. As I mentioned, we are at the pull-request stage, getting feedback and making additional changes to the continuous-testing material. So please join us in our effort to improve it. Thank you, and I'll hand this over to John.

Thank you, KS. That was wonderful information. My name is John Engelke, and I am a data software generation engineer and a lead developer on an astronomy product called Citizens, which reduces data collected publicly by both amateur and professional astronomers to help identify exoplanets. I've also worked on some of the Mars imaging products, and that's the genesis of how I came into the SLIM project. Basically, we wanted to wrangle some of the software we had been putting together, which has actually existed for quite a number of years, and you can imagine how important that is when you're dealing with a project like Mars 2020.
So let's see, I'm talking about continuous integration. It's a little bit amorphous, and everybody has their own idea of what continuous integration is. A lot of times when people talk about it, they contrast continuous integration with continuous delivery and continuous deployment. We wanted to create a clear separation here and very strictly identify what was necessary just to perform the continuous integration side. So what is continuous integration in a nutshell? It's basically taking work product (engineering product, code, scripting, et cetera), combining it together in a single place, and demonstrating that the code compiles and works together. To understand the scope of the need at the lab, we started with our community engagement process. We had to first talk to the development community on lab, which is actually rather large. JPL is a very decentralized place, and there are a lot of people working on different aspects of single projects as well as on different initiatives. But what they have in common is that they have to be reliable and reproducible. There are a lot of developer contributions coming in, and we had to find a best practice that would actually meet everybody's requirements. At the lab there are a lot of moving pieces; we've got a lot of deployments going on all the time, and we need to be able to trace what's going on and follow up on any kind of issue. So the contribution model we adopted was community-based: go to groups, bring people in, meet with different stakeholders in the organization and with different users and developers, and see exactly what needs to be done. The point here is that an engineer who's solving a scientific problem should not need to spend all of their time figuring out how to deliver the solution to that problem.
They should be able to focus on solving the actual engineering or scientific problem so that we can produce a product that's meaningful. The idea was that we'd come up with a reproducible system that we could make work for everybody. Our proposed best practices included creating a reference architecture, some tooling recommendations, and starter kits, which I'll get to in a moment. We did go back and forth in the user-feedback realm with different groups on lab. The evolution of all of this was that eventually we found that the needs we had on lab were needs we could share with the community at large, outside the lab. And so it evolved in such a way that we could present it as an open-source product. At the 20,000-foot level, what we're trying to do here is share a subset of the practices, ones that are very open-source oriented, with the community at large, practices that also comport with the best practices used at JPL. I mentioned a moment ago the CI reference architecture, so that's one of the first things, right? We have to actually define CI, and what CI is doing is assembling the different pieces. It's the assembly point of different efforts like testing, code scanning, secret detection, compiling, delivering, releasing.
And so there are a number of different stages in a good CI reference architecture, but they can be broken down in sequential order using a pipeline that flows directly from a developer or group of developers, through a code versioning system like CVS or Git, onto a server that combines different commits together, tests things, makes sure they compile, packages them, publishes them to an artifact repository, and then possibly orchestrates them into place. Ultimately you're worried about a few different things: being able to have your code work when it comes from different people with different machines; making sure that it compiles accurately and that you can track that; and being able to deliver it in a way that's completely reproducible and locked down. That's important for the lab, because we don't want uncertainty when we're dealing with multi-million-dollar projects operating on different planets. One part of the SLIM initiative is to identify tooling recommendations, because although this might seem very basic, the whole concept of CI is sort of like technical debt to a lot of developers. It's a little bit fuzzy, so the tooling used in practice is not always absolutely clear. We wanted to identify the different tools that users are used to using, or may want to use: reliable industry and (mainly) open-source products that can be leveraged or repurposed for different CI needs. We have our CI reference architecture posted online if you'd like to take a look; it's on the SLIM website on GitHub under the NASA-AMMOS project. I want to step back for a second here and quickly mention the shift-left philosophy and DevSecOps.
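The sequential, fail-fast pipeline described above can be sketched in a few lines. The stage names are illustrative; in practice a CI server such as Jenkins or GitHub Actions plays the role of this runner:

```python
from typing import Callable

# A stage is a name plus a step that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order, stopping at the first failure (fail fast)."""
    completed = []
    for name, step in stages:
        if not step():
            print(f"pipeline failed at stage: {name}")
            break
        completed.append(name)
    return completed


stages: list[Stage] = [
    ("checkout", lambda: True),  # pull commits from version control
    ("compile",  lambda: True),  # make sure everything builds together
    ("test",     lambda: True),  # run the test suite
    ("package",  lambda: True),  # produce a versioned artifact
    ("publish",  lambda: True),  # push to an artifact repository
]
print(run_pipeline(stages))
```

The value of the real thing is that every stage runs the same way on every commit, so a failure pinpoints both the stage and the change that broke it.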
So one of the value-added benefits of a solid continuous integration system is not only that it can be re-leveraged for different products, always works the same, and is reliable, but also that you can take some of the things that used to happen late in older, waterfall-style product management philosophies and shift them back toward the development side. We can start doing things like code scanning, static and dynamic code analysis, secret scanning, linting, and different types of testing, bundled together in the CI process, so things happen closer to the developer. It's basically taking the concept of infrastructure as code a little farther and almost making it practice as code; I'll come back to that in a moment. I'm going to back up a second: I wanted to mention a Java starter kit. We are actually using that in the lab right now, and we have this concept of starter kits. We have a starter kit that's open source right now, the Python starter kit, but the Java starter kit is a great example of how we've been able to leverage CI practices on lab so that we could have a predictable toolchain and meet the reliability needs of the actual product. The toolchain we're using is broken down here: we're able to do builds and delivery using a couple of different platforms, such as GitHub Actions or a Jenkins server; we can dockerize our applications and do testing and compiles in a predictable dockerized environment; and then we can publish up to an artifact repository like Nexus or Artifactory, where it can be retrieved by other applications that might need it as a dependency. You might guess that this is a picture of Jezero Crater. So, the Mars 2020 project itself; I worked on the telemetry side, where the images were coming back.
The Mars 2020 project itself involved a very large team of users, developers, and engineers working on different aspects of similar software. It was a very big project, and a lot of the dependencies at the root of that project had to be shared, so we wanted to make sure we had a way of managing and tracking things. With a big project like that, you can see why you want to understand the origins of any software issue, where things came from, in case something pops up. One of the pain points we ran into was that software was being developed fast, there were a lot of snapshots, people were moving very quickly, and we wanted to be able to track things. So we embraced this concept of CI-friendly versioning, where we versioned all of our products at build time with the actual product name, a semantic version, a number that related to the build server (so you could go back and track, on either GitHub or Jenkins, the exact build that happened), and then actually embedded the Git commit hash into the version itself. So there's a complete chain of custody for the software, end to end: you know exactly where things start and exactly where they end. If any kind of issue pops up, you can just walk backwards through those systems and find out exactly what happened, and that's where it became extremely useful to users. I'm going to talk a little bit about the Python starter kit, which is actually the more public starter kit we have right now.
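A minimal sketch of that versioning scheme follows. The exact format JPL uses isn't shown in the talk, so the layout and example values below are assumptions in the spirit of semantic-versioning build metadata:

```python
def ci_version(product: str, semver: str, build: int, commit: str) -> str:
    """Compose product name + semantic version + CI build number + short
    Git commit hash into one traceable version string, so any artifact can
    be walked back through the CI server to the exact commit."""
    return f"{product}-{semver}+b{build}.g{commit[:7]}"


# Hypothetical example values:
print(ci_version("telemetry-ingest", "2.4.1", 137,
                 "9fceb02ab1c4e1d2a7d6f3b8c0e5a4d3f2b1c0e5"))
# -> telemetry-ingest-2.4.1+b137.g9fceb02
```

Embedding the build number and commit hash in the version string is what gives the "chain of custody" the speaker describes: the artifact name alone identifies the build record and the source revision.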
If you'd like to come to the SLIM website, you'll be able to help contribute to our Java product very shortly, but you can also help contribute to our Python starter kit, which is a full-featured starter kit. The way we're working on the open-source starter kit initiative is that we're trying to give people a starting point where they can take an application, quickly clone it, create something usable, and be up and running almost immediately. You can templatize this application, drop it into a repo of your own name, and just start running with it. We're trying to take infrastructure as code to the next level: we've got testing as code, documentation as code. What does infrastructure as code mean? Conceptually, it means you're elevating the scripting that's used to do builds or deployments to the level of the developer, so it's on the same level as development. You're empowering the developers so they can help work on deployment issues; you don't have a complete disconnect between the DevOps side and the development side. You've got everybody working together, everybody can see the same thing, so there are no silos. The same goes for documentation: Rishi has done quite a bit of work preparing the documentation side of things, and with documentation as code we can immediately get it put into place. It's not an afterthought; it's something that happens very quickly. The SLIM Starter Kit offers all of these things, and it is in many ways a gateway to shift-left, value-added features. This is an example of one of our templates for a product: a README template. When you copy the product, you come in here, go through it, and fill in the blanks. You've got instant documentation.
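The fill-in-the-blanks README idea can be sketched with the standard library's `string.Template`. The field names here are invented for illustration; the real SLIM template has its own placeholders:

```python
from string import Template

# A tiny fill-in-the-blanks README template (placeholder names invented).
README_TEMPLATE = Template("""\
# $project

$description

## Quick start

    pip install $package
""")

readme = README_TEMPLATE.substitute(
    project="demo-app",
    description="One-sentence summary of what the project does.",
    package="demo-app",
)
print(readme)
```

The point is the workflow, not the mechanism: the moment a project is cloned from the template, documentation exists as a first-class artifact rather than accumulating as technical debt.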
And a developer who knows the deep details of what's going on in the application can do this very quickly. Whereas if you don't have this initially, when you start building the application, things like this just build up. You start accumulating more and more technical debt, and it becomes something a lot of developers don't want to follow up on. Everybody's worried about solving the problems; nobody wants to come back and make sure that the next person, or the other team members, can dig in as deeply as they can. It just becomes technical debt that's a little difficult to follow. I don't seem to be clicking through. One example of a value-added shift-left feature is software composition analysis. Software composition analysis is provided in GitHub by Dependabot, and it really means evaluating the dependencies used in your application to make sure they're secure. We created scripts so that when you launch a new application from this template, you basically have this built in; it's one of the features we have built in, along with the others I just mentioned. Everything that we're doing on the open-source public side in SLIM, in the SLIM starter kit for Python, is basically completely traceable end to end. We're using GitHub Actions to do our CI builds, so everything's transparent and visible, and we're releasing directly up to the Python Package Index. We try to fail fast if things don't build, and we also address security alerts. These were all concerns identified during our conversations about needs. Lessons learned: listening to team feedback early is very important. And, exactly as this says, tooling should augment, not rewrite, existing workflows.
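To make the software-composition-analysis idea concrete, here is a toy version of what such a check does. The advisory data below is invented; real tools like Dependabot consult curated vulnerability databases:

```python
# Toy software composition analysis: compare an application's pinned
# dependencies against a known-vulnerable-versions list.

def parse_requirements(text: str) -> dict[str, str]:
    """Parse a requirements.txt-style listing of name==version pins."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            deps[name.lower()] = version
    return deps


def flag_vulnerable(deps: dict[str, str],
                    advisories: dict[str, set[str]]) -> list[str]:
    """Return the pins that match a known-vulnerable version."""
    return [f"{name}=={ver}" for name, ver in deps.items()
            if ver in advisories.get(name, set())]


reqs = "requests==2.19.0\nflask==2.3.2\n"
advisories = {"requests": {"2.19.0"}}  # invented advisory entry
print(flag_vulnerable(parse_requirements(reqs), advisories))
```

In the starter kit this kind of check runs automatically in CI, which is exactly the shift-left pattern: the dependency problem surfaces at the developer's pull request, not at release time.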
A lot of times when someone comes in to develop a template or a process like this, they come in with their own conception of how it should work, their own ideas for how things should flow, but that doesn't really work for the boots on the ground who are actually doing the development. So it's important to take everybody's feedback into account, because then you can iteratively develop something that actually reflects what the users you're providing services to need. Another lesson here is that testing frequently and listening to feedback often are very important. When you make a product like this, a templatized package that somebody can use, it helps to keep testing and retesting it and making sure it meets the requirements of the user base you're preparing it for. And this is why SLIM exists: we're trying to create an open-source solution that allows quick implementation of CI principles, along with open-source documentation-as-code, infrastructure-as-code, and testing-as-code principles. So please come and help us make this better. At this point I'm going to turn the floor over for any questions, but I'd like to take this moment to offer special thanks to Landang, who's also from JPL and helped us set up this talk; the SCaLE team here, for providing facilities and allowing us to come and present our ideas; and the SLIM community. I'd also like to give special thanks to a few contributors from JPL: Jordan Paddams helped quite a bit, Sean Kelly from the PDS project was quite instrumental, and Kyle Barner was great in helping us get our SCRUB tooling working. Everybody else who's contributed has been really wonderful, and we're hoping we can get more public participation going forward. It's really an exciting project, so please feel free to come take a look. Questions?
I'm going to hand the microphone over to Rishi. Yeah, thanks, John. Hopefully you all enjoyed that, and we've got a couple of questions. Okay, we'll pass the mic; you won't be heard on the video if you don't have the mic, though. So, I'd like to come back to the question of automatically pushing out some of these standards and documentation pieces, because there's an inherent difference between aligning teams and groups on technical standards, whether that's the data we collect for issue templates, the technologies we're going to use for automated testing, or whether we do automated testing at all, versus behavioral standards captured in things like a code of conduct and a CONTRIBUTING.md. One of the concerns we've had when we push out automated behavior-based standards is teams accepting the pull requests without having the team discussion around the behavioral norms, and then running into situations where they're not handling things according to the code of conduct because they never talked about it, or never really discussed things like contribution warranty for accepting outside contributions. So have you seen any friction around that particular end of automating the push-out of behavioral standards, and how did you approach that at JPL? Next question: I was very intrigued by the metrics you're collecting on these projects, but I was wondering if you have any data showing how code quality improves once these best practices are implemented. Do you have anything like that? Another: first, I want to say excellent presentation. I love that I can go back to my team, or multiple teams, at a Fortune 500 company and say, hey, this is in the open, and what they're doing mirrors a lot of the internal dialogue I see in our DevOps and engineering empowerment teams. So kudos to you all on your public mission in getting up here. Thank you, number one.
One thing that came up, too: I work at a Fortune 500 company, and we use DORA metrics now, after putting a lot of these CI/CD metrics in place. So if you are on GitHub, you might have the ability to easily collect those and say, hey, it might be a really interesting way to measure impact: hey, we didn't have a deployment failure to the Mars rover, or the equivalent. I'd love to see that. But the question I had is this: I thought the automatic code generation with StarCoder and things like that was really interesting. Number one, that's going to open the eyes of some engineers out there who said, hey, I could never automate writing tests; that was my impression when I saw it. What I'm wondering is, I'm sure there are materials out there, but for your code example from NASA JPL, are there any sources out there for replication, or for us to take a look at? That's it. In that vein, KS, first of all, I was curious: you showed Llama and then you also talked about StarCoder. Which LLM are you actually using? And are you doing this closed-loop, or are you reviewing all the unit tests, testing them, and then manually fixing them? Okay. So what LLM are you using? Have you tried Llama 3 with Python? Next: I appreciated the presentation, loved it. I'm knee-deep doing this stuff across multiple different projects right now, so I really appreciate it. My question for you is, where do you envision this project going? You touched on Python and Java; I'm assuming you use those internally a lot. Is this something you want across conceivably all languages, or are you focusing on a subset? Yeah, yeah. Perfect, thank you. Great talk, by the way, I really appreciate it. This is something I've been doing at our company as well, because we have a very decentralized model, with squads off doing their own thing and using their own languages and frameworks.
Not really a question, but just an idea for you guys. At our company we use Projen, which is sort of like, I don't know if you've ever heard of Yeoman, a templating engine. But it also does almost everything that your starter kit does. So I was curious whether you'd be interested in, or have thought about, using tools like that to get teams to adopt it. It is made by some folks at AWS. But yeah. Testing, testing. All right, cool. All right, and it's three minutes past 12:30, so I'm going to get started here. A real quick note: I do have some swag in the back. So before you leave, grab some stickers, grab some of these paper pop-up trains. I have a ton, so feel free to grab a few if you have friends in your life who are fans of transit, or any kids who would love these; they're great. So welcome to my talk, Navigating the Transit Data Landscape. How many of you took the train or bus here today? Nice, excellent, good job, love to see it. And how many of you are already familiar with transit data of any sort? Yeah, oh wow, that's a good chunk. I love to see that too. All right, well, I'm going to start off with some info about myself, who I am, because there's a lot of stuff I like to plug, since I'm involved in a ton of things. First and foremost, I work for LA Metro. I'm the tech lead on the digital experience team, which is part of our customer experience department. My background is as a software engineer. I am a public transit rider, and I have been for my entire career, my entire adult life. I'm an urban explorer, civic tech nerd, multi-disciplinary crafter and musician, and a generally inquisitive person. A few of the things I'm involved in that I want to plug, since I have this platform: Maptime LA, which is a great meetup group and community for mapping enthusiasts.
So we teach open-source mapping tools and contribute to open mapping projects on OpenStreetMap, like OpenSidewalks, which is one we're currently working on right now. So check us out on meetup.com. Data and Donuts, which is a monthly breakfast lecture series where we highlight local government tech and data projects. And then a couple of annual events coming up: the LA Arts Datathon, which brings together the arts and tech communities, and the International Humanitarian Mapathon, where we teach people how to contribute to HOT OSM and humanitarian mapping efforts. And then finally, as of last year, I'm also on the board of directors of MobilityData, and if you don't know what that is, I will get into it later in this presentation. Just hold that in the back of your head; I need to mention it here, but I won't get into it right now. I graduated from Berkeley in Electrical Engineering and Computer Science in 2006. I spent 13 years at the County of Los Angeles doing systems analysis, web development, and database work. And the past four years I've spent at LA Metro, getting neck-deep in transit data and the world of transit data and technology. So this is what I want to share with you all today. What I do at Metro, in summary: advocate for rider information needs inside the agency, improve the way our digital information reaches riders, advance data standards in the transit industry, and pilot more sustainable tech and data practices at the agency. And all of this with varying degrees of success, because it is a bureaucracy; there are a lot of barriers in place. It's an uphill battle, but this is what I do. And a few caveats here, to set the framing right: I am not involved in operations. Those are the people who run our service, who determine what our service is, who make sure all the buses and trains run as on time as possible. I'm not involved with the hardware, or the data infrastructure for that matter.
I'm mostly on the public-facing side of things. Definitely not involved in service or planning, but I do like to ask questions; I like to learn and I like to share, hence why I want to share everything I've learned in the past four years with you here today. So, talking about transit data, what do I mean? How many of you are familiar with what this is? Yeah, okay, you've seen this before. These are our timetables. We print them out on paper and put them online as PDFs. This is what our schedulers and planners work on to let the public know what our service looks like. We also have real-time data: the buses and trains have sensor technology incorporated in them that lets us know where they are. And then a ton of stuff happens, it's a long process, but it ends up in your hands, on your phone, through these apps. You can see our schedule info, and you can see our real-time info: where these buses and trains are in real time (or not in real time), and where they're supposed to be at what time, right? So, how we get from there to there, and what happens in between. And there is so much of it that I worried, putting together this presentation, that it was too much and at the same time too little; it's really hard to encompass all of it. So think of this as dipping your toes in the water. It's a long journey and I'm going to cover a lot. I won't be able to get super in-depth into any of it, but if there's any part you are particularly interested in, want to hear more about, or have questions on, keep that in mind and ask at the end, because there's so much more that I won't even be able to get to in this presentation. So, our first stop: generating the data. And that starts in our operations department, with our service planners and our schedulers.
A lot of factors go into planning and scheduling, and first and foremost, a note that I promised my operations people I would pass along: "operator" is the preferred term for our drivers; the bus drivers and train drivers are called operators. So anywhere you see "operator" in this presentation, that's who I'm referring to. Operator safety and experience is a super high priority for our agency, and it's a big factor in the planning and scheduling we do around our routes. Some of the important things to call out: they want to make sure stops are in safe places and that we minimize the possibility of accidents, and that gets impacted by how long a trip an operator has to drive is, and by their familiarity with the route. If we're constantly switching the route on them, changing it up, that's a recipe for more accidents. And then it's really, really important for them to have access to essential amenities like bathrooms. You're driving a bus all day long and you want to have access to bathrooms, so where are you going to go? That needs to be planned out; it's factored in, and it really dictates a lot of things. Community, of course: feedback from the community, looking at things like changing traffic patterns, commercial and residential developments, how much we expect this particular route to be used, and how it connects to other agencies' existing routes in the region. So it's not even just our own service: we have to reach out to other agencies, figure out where they are planning to change their routes, and then coordinate all of our service so that it makes sense on a regional level. And I know LA Metro is the big player in LA, but we actually have over 50 public transit services operating in LA. Additionally, there are physical limitations, and of course this makes sense, but I didn't even think about it until I started working at Metro: they have to choose routes where the street is actually wide enough.
They have to make turns where the intersection has a big enough angle for the length of our buses, and we have buses of different lengths. So depending on what kind of bus they intend to use on that route, whether it's a 40-footer or a long boy, they're going to have to choose which intersection to make that turn at. Efficiency is of course a big one as well. A lot of planning optimizes for cost-benefit and how we efficiently allocate our resources. Part of that is that people run these buses: the people need to get to where the bus will operate, and how do you get that person to where that bus will start? That's non-revenue dead time, and it's not efficient if our operators have to travel all over the place just to start their shift and run the bus. So there's a lot of optimization of where staff is along these routes as well. And of course we want to run the most service we can for the least cost, so that we can maximize how much service we offer to the public. Another important thing is near-side versus far-side stops. Planning likes to prioritize far-side stops because at that point you've already passed the signal; if you stop before the signal, you might get stuck behind a red light. So that's one of the efforts where they've been trying to move near-side stops to the far side to make the routes more efficient. And then the final topic category here: jurisdiction, another hugely important one. Like I mentioned earlier, bus drivers need somewhere to use the bathroom, and that depends on agreements we have in place. It's the same with where the bus runs, where the stops are, and where the buses lay over, because the operators need breaks as well. We, Metro, do not own any of the streets in LA County, even though we operate in LA County. LA County streets are owned by either the county or the cities in the county, all 88 of them.
And so if we want to have a route run through, put a stop on a street, or have a bus sit and lay over for a while so our operator can take a break, those cities have to agree to it. And that can be a problem, because some cities won't agree, and that heavily impacts what kind of service we can provide, how we plan our service, the length of the routes, and all of that; all of it needs to be negotiated. So there are a lot of barriers before we even get to the data itself. But next step: after all of that is figured out, we have a scheduling system. It's called HASTUS; it's a product by GIRO. A lot of transit agencies use this enterprise-level scheduling system, and it handles a lot of things beyond just scheduling and managing our service. It does the scheduling, of course, but it also manages operator assignments to routes; service cancellations, because things change all the time (people get sick, stuff happens, we have to switch people in and out); cost optimization; operator payroll functions; and also long-term planning for the network as a whole. So all of that gets put into the HASTUS system, and the data from there is exported as GTFS. So who here has heard of GTFS before? All right, sweet, nice, that's a handful, and I think I know who you folks are. And I'm guessing you're the same ones who have worked with GTFS before. All right, so for everyone else, this is brand new, which is great, because I love introducing people to new things, especially GTFS, because data standards are my jam these days. So what is GTFS? It is a data specification: a set of rules defining what a data set should look like, for both the producers and the consumers.
As someone creating the data, I need to know how I should create it: what data types, what the possible values are. And the consumers then know what to expect when they receive the data, so they're building things that consume a standardized data set. But GTFS is more than that; it is specifically a representation of transit service: a transit agency's service in a machine-readable format. So why does it exist? You could probably guess, but without this data standard, every transit agency would publish their data completely differently. You would have chaos; consumers of the data would have to write custom code to account for every possible way any transit agency might publish their service. And as I mentioned earlier, in LA County alone we have over 50 public transit service providers. So let me take a quick detour from talking about Metro and what GTFS is, and get into its origins. It's not super technical, but I think it's really important to understand how this came about, because I see GTFS as integral to the public's ability to use public transit as a service, and by understanding it, we can learn from what happened in its development and maybe replicate it in other ways. So we're going to go back in time to 2005, and some of you might recall what Google Maps looked like back in 2005: very different from what we use today. And it was that year that Google decided to partner with TriMet, which is the transit agency up in Portland. In this partnership they had a goal, which I've quoted here from Bibiana McHugh, one of the TriMet employees who was spearheading the effort there: they wanted to make it just as easy to get transit directions as it is to get driving directions, from anywhere in the world. And this happened at a really opportune moment in terms of the technological barriers that could have prevented it from happening.
TriMet already had their service as data in their scheduling system. It was already in tables; it could already be exported as CSV. So machine-readable data already existed in some format; it didn't need to be created from analog. That alone cleared a huge barrier in the world of government, and back in 2005 at that. In terms of business barriers, TriMet as an agency already supported open source and open data: they were already publishing open data feeds used by outside developers, and their procurement policies already accounted for open source, and had for a decade at that point. If you ever hear me talk about government technology work, procurement is such a huge, huge deal. And then they were able to find a willing partner in private industry with Google Maps. So all of this really fed into their success. And it was a success: by the end of 2005, Google Transit launched. Here's a screenshot I found on the internet. It went live, and they found it was a huge success; hits to the site were really big, even by Google standards. And this was the beginning of GTFS. At that point, GTFS stood for Google Transit Feed Specification, because of Google's involvement. And Google Transit expands in 2007: more cities and more transit agencies are recognizing the benefits of publishing GTFS, because then Google Transit can show their service and people can get directions, and this is a huge need for transit riders. At the same time, I want to call out that 2007 to 2009 is when we see the first iPhone launch, the first Android launch, and 4G starting to spread. One of the key early decisions they made with this early version of GTFS was to keep the CSV format that was published by the scheduling system.
Because they wanted to keep it as simple as possible, to make sure that smaller, less-resourced transit agencies could still publish this data without any proprietary technology or a complicated data standard that would require specialized people to produce. They also chose to keep the data specification open and free, and to make the resulting data free and publicly accessible. All of this is super, super important. So in 2010, a major shift happens: we get rid of the Google and we call it general. It's still GTFS, but now it is the General Transit Feed Specification. And this is because GTFS is growing, it's becoming widely popular, but people are concerned: is this just promoting or benefiting Google products? They want to grow the ecosystem, and they want to recognize that, no, this is being used outside of Google products, and we want the ecosystem to grow outside of Google products. So the name is officially changed to General Transit Feed Specification. Another thing to call out around this time: as it's getting really popular, development starts to splinter off. As different transit agencies have different needs, they're creating their own flavors of this and putting out their own versions of GTFS, and this is starting to become a problem. So keep that in the back of your head as we move forward to 2011, when the GTFS real-time extension is released. Remember, the original GTFS is all based on the scheduling system. That's all just planned service; it's static data, it doesn't change. But we have this huge breakthrough in 2011 where Google and a group of transit agencies partner up to define an extension to GTFS for real-time data, based on real-time GPS location tracking. Then jump forward to 2015, and we start to see some consolidation efforts in the industry around transit data. The Rocky Mountain Institute hosted a two-day workshop on transit data interoperability.
They put out a recommendation to create best practices to grow this space. And so in 2017, we see those best practices published. A working group is convened, and this work continues as they host another two-day workshop to identify more challenges and opportunities to improve transit data. And so we jump forward to 2019; remember I mentioned MobilityData, this is when MobilityData is established. Now, I don't know about you, but the past four years have been a blur for me, and it was kind of a shock when I was doing my work: I knew about GTFS and how integral it is to our service and to how people ride our system, and MobilityData was only established in 2019, the year before the pandemic hit. But this is important, because MobilityData is the nonprofit that was established to oversee the continued growth of GTFS. Before, we had a lot of splintering, different teams and different groups doing their own thing, and they recognized that was a problem in these working groups that they convened. That working group effort led to the creation of this nonprofit, to consolidate it all and really enforce a governance process, to see GTFS grow but make sure it's done in a way that the entire industry can agree with and that will benefit everyone. So fast forward to today: GTFS is used by over 10,000 agencies in over 100 countries. It's an international data standard, and MobilityData's own work has grown as well. Their work now covers not just GTFS, which is for public transit data; it also covers data platforms and tools around GTFS, as well as shared mobility data. I won't really get into this, but GBFS is the General Bikeshare Feed Specification for bike share, and they help oversee that and its growth as well. So, next stop: GTFS Schedule. This is the original version of GTFS that I mentioned, the one that gets exported from scheduling systems. It's that planned service information.
Metro releases bus and rail GTFS as separate datasets. This is for a variety of reasons, but if you want to use our data, just note you're gonna find them in separate datasets. And they get updated at different frequencies as well. Our rail is updated every weekday night in an automated process. Our bus service is updated during what we call shakeups, usually twice a year, in June and December, and then we roll out minor updates on a weekly basis. So, a little bit about the shakeups. This is an operational thing, and a lot of this stuff revolves around how we operate our service. Shakeups are when major service changes happen: major changes to routes and stops. This is also the bus operators' chance to bid on which routes they're gonna drive, often based on their seniority, how long they've been at Metro. They can bid on the choicer routes and leave the ones people don't like for the newer folks. But this is why major changes don't happen outside of the shakeups. So what does GTFS Schedule data look like? Again, because they wanted to keep it simple, it is a zip file, and inside the zip file is a series of text files. The ones on the left are the minimum required ones, but because it's grown and expanded a lot, you can see on the right there are so many more that could be included. Not nearly everyone publishes the full set of data, but there's a lot encompassed within GTFS of how you could represent a transit service. So I'll go over them real quick. Hopefully it's not too small, but I'm not gonna get into the super nitty-gritty details; I'm just giving you a high-level overview. So agency.txt: it tells you what agencies are in this particular GTFS dataset. Oh, one thing to note too: each of these files is CSV, and you can think of them as tables, tables in a relational database.
They have IDs that serve as primary keys, and they have foreign keys, so those will come up; I'll call them out as we go. So you have agency_id in here, and you have routes: a list of all of the routes that are included in this GTFS dataset. The data I'm pulling here is actually from our current Metro bus GTFS. Then stops: a list of stops, pretty simple and straightforward. One thing to note is that stops.txt, at least for us at Metro, doesn't represent all possible stops, just the active ones represented within this GTFS dataset. Since we release it six months at a time, it only represents those six months of service; the next six months, service might change, bus stops might change. calendar.txt tells you the dates that service runs; we separate our service into weekdays, Saturday, and Sunday. And then trips.txt, where it starts to get a little more complicated. A trip is a combination of at least two stops during a specific time period, traveled by a single vehicle. That's a little complicated, but stop_times will hopefully make it clearer. stop_times tells you the times that a vehicle arrives at and departs from each stop on each trip. You see in the second column we've got arrival times, departure times, and then stop IDs; it's not really human-readable there, but those are all the different stops along that route. You also see stop_sequence, which tells you the order the bus travels in. Then we've got stop_headsign, which is supposed to represent what the bus shows on its head sign, that display at the front of the bus. And this file is over 200 megabytes unzipped for our bus GTFS, because we have a lot of bus service. And then just a quick little diagram showing you the relationship between these files and the various IDs. And now I'll get into, oh, sorry, there you go.
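Since these files behave like relational tables, a tiny sketch can make the keys concrete. This is a hypothetical, trimmed-down feed (made-up IDs, only a few of the real GTFS columns), using just Python's standard library to join stop_times rows to a trip the way a consumer app might:

```python
import csv
import io

# Hypothetical miniature GTFS tables. Real feeds ship these as text
# files inside a zip; the column names below come from the GTFS spec,
# the values are invented for illustration.
TRIPS_TXT = """trip_id,route_id,service_id
T1,720,WK
T2,720,WK
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S10,1
T1,08:05:00,08:05:30,S11,2
T1,08:12:00,08:12:00,S12,3
"""

def load_table(text):
    """Parse one GTFS text file (CSV with a header row) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def stops_for_trip(trip_id, stop_times):
    """Join on the trip_id foreign key and order by stop_sequence."""
    rows = [r for r in stop_times if r["trip_id"] == trip_id]
    rows.sort(key=lambda r: int(r["stop_sequence"]))
    return [r["stop_id"] for r in rows]

trips = load_table(TRIPS_TXT)
stop_times = load_table(STOP_TIMES_TXT)
print(stops_for_trip("T1", stop_times))  # ['S10', 'S11', 'S12']
```

The same join pattern extends to routes.txt via route_id and to calendar.txt via service_id.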
A few things to be aware of, because data standards are great and I love them and I plug them, but they don't always account for operational reality, and you may not realize how or why. So, some gotchas to be aware of if you're planning to use this data. Within stop_times.txt, there are two fields called pickup_type and drop_off_type, and they indicate whether or not a passenger can be picked up or dropped off at the stop. So you might look at stop_times, see this list of stops, and think, okay, the bus serves all of these stops. Not completely true: you've got to check these fields and make sure they are both zero to be able to say, yes, as a rider, I can get on or off a bus here. Because the data in here reflects our scheduled service, we may include stops that are layovers or terminals where passengers are not allowed to get on or off a bus. That's why it happens. Another thing is in stops.txt. Easy enough, it's a list of stops; well, if we look at our rail, our rail stops are actually stations, right? And so there's this whole concept of parent-child stops, because there's a lot of information to represent at a station in terms of it being a stop. In location_type, you have stop or platform, you have station, and you also have entrances and exits. This is important especially when we have really big stations, where you may get in on one street and come out maybe two or three blocks further away; some of our stations can be pretty big. So the data actually represents the different entrances, and you can see that in the stop names here for 7th Street/Metro Center: the names include the different entrances as well as the elevators.
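As a sketch of that first gotcha, here's how a consumer might filter a stop_times table (hypothetical rows, invented stop IDs) down to stops where a rider can actually board and alight, following the spec's convention that 0 or empty means regular service and 1 means no pickup or drop-off:

```python
import csv
import io

# Hypothetical stop_times rows: S90 is a layover/terminal stop where
# riders can neither board (pickup_type=1) nor alight (drop_off_type=1).
STOP_TIMES_TXT = """trip_id,stop_id,stop_sequence,pickup_type,drop_off_type
T1,S10,1,0,0
T1,S90,2,1,1
T1,S11,3,0,0
"""

def boardable_stops(rows):
    """Keep only stops where a rider can both get on and get off.
    Per the GTFS spec, empty and '0' both mean regular service."""
    return [
        r["stop_id"]
        for r in rows
        if r.get("pickup_type", "0") in ("", "0")
        and r.get("drop_off_type", "0") in ("", "0")
    ]

rows = list(csv.DictReader(io.StringIO(STOP_TIMES_TXT)))
print(boardable_stops(rows))  # ['S10', 'S11'] -- S90 is filtered out
```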
They are related to the parent station's stop_id via that parent_station column on the far right. One thing to note for Metro's data, which is a problem I am trying to bring up with our operations people, is that we currently do not have separate data points for each side of a platform. What we have is a single point for each platform. So for 7th Street/Metro Center, you can see the very first row and the very last row: we have the platforms for the A and E lines as well as the B and D lines, but you can't distinguish between the two sides of the platform. And that's a problem when you want to use GTFS to navigate. Another thing to note about Metro's bus GTFS is that our IDs change. They're not persistent between shakeups, for route_id and trip_id. That can be a problem if you're expecting, well, route_id is the ID that represents that route, it shouldn't change. Well, no, not quite. You'll see it's actually split into two parts. The first part is a number; that's basically the route number. The second part, after the dash, is the HASTUS version number, HASTUS being our scheduling system. Each time they make changes to service, the HASTUS system generates a new version number for tracking, and that gets exported with our route_id. It is annoying, but it can be useful to see, oh, is this actually a newer dataset than the one I was using? Similarly, within trips.txt, trip_id has a first part, which I'll get into in a bit, and a second part that reflects the shakeup date. So December 2023 is when this current version of our GTFS was released, and that part will change with each shakeup. As for the first part of the trip_id: it was not persistent before, but within the past few years, because so many of our consumers identified it as a problem, our operations felt compelled to create a format for a permanent trip ID.
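A minimal sketch of pulling those two ID schemes apart. The ID strings below are invented for illustration; real HASTUS version numbers and shakeup codes will differ:

```python
def split_metro_route_id(route_id):
    """Metro route_id = '<route number>-<HASTUS schedule version>'.
    The version part changes whenever the schedule is re-exported."""
    number, _, version = route_id.partition("-")
    return number, version

def split_metro_trip_id(trip_id):
    """Metro trip_id = '<permanent trip code>-<shakeup code>'.
    Only the trailing shakeup code should change between shakeups."""
    code, _, shakeup = trip_id.rpartition("-")
    return code, shakeup

# Hypothetical IDs, just to show the shape.
print(split_metro_route_id("720-13168"))      # ('720', '13168')
print(split_metro_trip_id("10051234-DEC23"))  # ('10051234', 'DEC23')
```

Comparing only the first component across shakeups is how you'd track "the same route" or "the same trip" over time.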
So that first portion should actually be persistent between shakeups. I haven't tested this out, so I can't confirm for you what percentage of the time it actually persists, but this is how it's broken down. I won't get into it, but they have a whole system for defining what that code looks like. And then we have parent-child routes. Some routes that you may take and be familiar with may not have their own dedicated route_id. This is one of the most confusing things I had to wrap my brain around at Metro, and one of the most confusing things our data consumers have complained to us about, but there are many reasons why it is the case. So just to give an explanation of what this is, let me see, do I have it in here? Ah, okay, I don't, so let me back up a bit. One thing that indicates a parent-child route is that in the stop_times file, that stop_headsign I mentioned, what shows up at the front of the bus, changes in the middle of a trip. It shows the 48's head sign, and then bam, suddenly it shows the 10's, and now instead of being on the 48 bus, you're actually on the 10 bus, surprise. And in the data you'll see that reflected: a single trip_id is used for these parent-child routes, and it may not always be the same route number throughout that entire trip. We also have non-standard route_code and destination_code fields. These are not part of the GTFS standard, but, okay, so you've got trip_id there, and you can see it's the same for all of these rows. You've got stop_sequence, so these are all sequential stops, one stop after the other. Then we've got stop_headsign, which, like I said, changes between stop sequence 43 and 44. And then we throw in these additional fields, route_code and destination_code, which are just stop_headsign broken up.
So you can actually reference route_code to get at what the actual number of the bus is. So why do we do this? Operationally, it goes back to some of those service planning things I mentioned at the very beginning. Some of our trips change route number mid-trip to avoid ending the trip where the route changes. If we set it up where it's the 48 and then, bam, suddenly it's a new trip where the 10 starts, we are obligated to have a layover there and provide access to a bathroom. This is a technical-definition thing about what a single trip is. So we stretch the definition of a trip. The trip is not necessarily tied to the number itself; the trip is just a sequence of stops, and we can change the route number in between. But if we ended that trip right there, we would have to have a layover there and we would need access to a bathroom there. And again, like I mentioned before, some cities won't agree to let us lay over on their street, so we cannot end a trip there. We try to negotiate contractual agreements with local businesses so that our operators can use their bathrooms, but sometimes we can't find one that will allow it, so we can't end a trip there. Sometimes that means we have to build our own bathrooms. This impacts how we design our service, that shows up in the data, and it's something you have to account for when you're looking at this data. All right, next stop: real time. This is fun stuff. How many of you use Google Maps or Transit App to plan transit trips? Yeah, yeah, awesome. So you might be familiar with a screen like this. This is Transit App. On the left side here I've got the three categories of real-time data that make up GTFS Realtime. That is vehicle positions, which is just live vehicle locations, like it says, literally GPS coordinates. You've got trip updates.
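To make the parent-child pattern concrete, here's a sketch that scans a made-up trip for the point where the non-standard route_code field flips, i.e. where riders suddenly find themselves on a different route number mid-trip (the head signs and codes below are invented):

```python
import csv
import io

# Hypothetical trip whose head sign (and non-standard route_code)
# switches partway through -- the parent-child route pattern.
STOP_TIMES_TXT = """trip_id,stop_sequence,stop_headsign,route_code
T1,1,48 - Downtown,48
T1,2,48 - Downtown,48
T1,3,10 - Avalon,10
T1,4,10 - Avalon,10
"""

def route_changes(rows):
    """Return the stop_sequence values where route_code changes,
    i.e. where the bus 'becomes' a different route mid-trip."""
    rows = sorted(rows, key=lambda r: int(r["stop_sequence"]))
    changes = []
    for prev, cur in zip(rows, rows[1:]):
        if prev["route_code"] != cur["route_code"]:
            changes.append(int(cur["stop_sequence"]))
    return changes

rows = list(csv.DictReader(io.StringIO(STOP_TIMES_TXT)))
print(route_changes(rows))  # [3] -- the 48 becomes the 10 at stop 3
```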
These are predictions: how far away is that vehicle from this stop? Are there delays or cancellations? And then finally we have alerts, which is a feed that provides a description of a service disruption. I've pointed out on the screenshots here what these categories of real-time data translate to. Vehicle positions: when you see a little icon of a bus or train moving on a map, that's pulling in vehicle location data. When you see prediction times, like, oh, at this Western/Century stop the bus is zero minutes away and the next one is 24 minutes away, those are algorithmically created arrival predictions from trip updates. One caveat: some apps show you a little Wi-Fi symbol as a visual indicator that the prediction is based on real-time data. If they don't get that real-time data, because there are technological issues, maybe the feed is down or they're not seeing it for this bus, they may show you scheduled data instead, and some apps will indicate to you that the prediction is just based on the schedule, not on the real-time situation. So that's something to be aware of when you're using these apps: it's not always based on real-time data. And then finally, service alerts. Usually that's a paragraph, a description of what's disrupting this route. All right, time for another detour: real-time tracking for bus. This is some background on the technology we actually use on our buses and trains. On our buses, we actually have two systems that give us location. ATMS, which stands for Advanced Transportation Management System, is a whole suite of smart technologies that includes AVL, a standard transit industry term for automatic vehicle location. Our ATMS system transmits this location data to us via radio, so the updates come every two to three minutes, and it was fully deployed back in 2006.
So we're talking almost 20 years old at this point. Thankfully we have a much newer system that was installed: our Wi-Fi routers. Maybe you've connected to the Wi-Fi on the buses before. They transmit data to us via cell towers, those updates come every two to five seconds, and they were only fully deployed back in 2021. That's when we really started to ramp up real-time vehicle location possibilities with our bus system. Now for rail, we've got something completely different: track circuits. How many of you are familiar with track circuits? Nice, okay, more than I expected, which is cool. A track circuit is literally an electrical circuit in the tracks, and if the electricity is running through, that tells us: no train, we're good. Once a train enters that section of track, because its wheels and axles connect the two rails, bam, that circuit is shorted, and now we know: yes, there's a train in that segment. So what we know is at the level of that segment of track: depending on how big the track circuit is, we know the train is somewhere in there, we don't know where; we just know at what moment it entered that segment, and once we detect it in the next segment, we know it's left this one and is in that one. So it is real time, but it's not quite GPS location, and it can be sketchy. We're working on more precise location methods, but it's a pilot project. What I've heard from asking operations about this is that all of the equipment that gets installed on these trains has to be federally approved and has to pass standards; they don't want non-standard equipment getting in there and becoming a security issue or a risk, and that includes GPS location tracking. So it's been a problem, because the standards have not moved as quickly as technology has in other parts of our lives. There is going to be a pilot project, but it's still going to be a few years.
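As a toy sketch of that block-level logic (block names invented; real signaling systems are far more involved), you can only ever infer which block a train occupies, and you detect movement the moment one circuit shorts while another clears:

```python
# Toy model of track-circuit occupancy. Each block's circuit reads
# shorted (True) when a train's wheels and axles bridge the two rails.
def occupied_blocks(readings):
    """We only learn which block a train is in, never where inside it."""
    return sorted(block for block, shorted in readings.items() if shorted)

def train_movements(prev, cur):
    """Compare two consecutive scans: a train 'enters' a block the
    moment its circuit first shorts, and 'leaves' when it clears."""
    entered = sorted(b for b in cur if cur[b] and not prev.get(b, False))
    left = sorted(b for b in prev if prev[b] and not cur.get(b, False))
    return entered, left

prev_scan = {"B1": False, "B2": True, "B3": False}   # train in block B2
cur_scan = {"B1": False, "B2": False, "B3": True}    # now in block B3
print(occupied_blocks(cur_scan))             # ['B3']
print(train_movements(prev_scan, cur_scan))  # (['B3'], ['B2'])
```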
Oh, and then here, just a little photo for you of the SCADA system, which is what the whole rail control monitoring system is called. You can see here, I don't know if you can see from where you are, this representation of our tracks; it shows these blocks, and that's because of the track circuits. It just tells you which block on the track the train is in. All right, a little technical detail on GTFS Realtime. The format it uses is protobuf, which stands for Protocol Buffers. It's a binary format created by Google to be more efficient when transferring lots of data, and we're dealing with a ton of data when we're talking about real time. So it's not as user-friendly as CSV, but a lot of the time our real-time providers will provide the data as JSON as well, so that's nice. Here's an example of that real-time data as JSON. One thing you'll see: Wi-Fi routers will send a location as long as the bus is powered. So sometimes you'll see, like that top segment, a location without a trip tied to it, and that's a problem, because you don't know what route that bus is running, if it's running one at all, and you don't know where it's supposed to be going. Sometimes that's because it's traveling between trips; it's not actually serving bus riders or traveling to a stop, it could just be waiting between trips. The vehicle could be replacing another vehicle and the bus operations control center maybe didn't reassign it properly, or the ATMS system might be down, and the ATMS system is what tells us what route and trip this bus is supposed to be running. Let's see, all right: end of the line. We're finally nearing the end, sorry, I know this has been a lot. Where do we go from here? More GTFS. More, more, more GTFS. It is constantly growing. There is an amendment process for the specification; you can check it out on the website, it's really well documented.
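Here's a sketch of handling that gotcha on the consumer side. The feed below is a hypothetical, trimmed JSON rendering of a GTFS Realtime VehiclePositions message (the field names follow the spec; the IDs and coordinates are invented), and we flag any vehicle that reports a position but no trip assignment:

```python
import json

# Trimmed, hypothetical JSON rendering of a GTFS Realtime
# VehiclePositions feed. Entity "2" has a position but no trip --
# the "router sends location whenever the bus is powered" case.
FEED = json.loads("""
{
  "entity": [
    {"id": "1", "vehicle": {
        "trip": {"trip_id": "T1", "route_id": "720"},
        "position": {"latitude": 34.05, "longitude": -118.25}}},
    {"id": "2", "vehicle": {
        "position": {"latitude": 34.06, "longitude": -118.30}}}
  ]
}
""")

def untripped_vehicles(feed):
    """Return IDs of vehicles reporting a position with no trip,
    which a consumer probably shouldn't show as in-service."""
    return [
        e["id"]
        for e in feed.get("entity", [])
        if "trip" not in e.get("vehicle", {})
    ]

print(untripped_vehicles(FEED))  # ['2']
```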
There is a whole governance process around it. Broadly speaking, if you want to put out a new extension to GTFS, it requires one data producer, one data consumer, a production implementation, and then a series of public votes on GitHub. So it's all done in the open, which is great. And here are a few examples of recent and ongoing extensions. Trip modifications: what this is, is detour shapes in GTFS Realtime. Because Schedule has been out for so long, planned service is pretty well covered; a lot of what we're dealing with now in terms of data representation is when we deviate from planned service, the disruptions that happen, a bus has to take a detour, how do we communicate that? Trip modifications is one of the first steps toward that; voting closed just a few days ago on the 7th, so this is happening in real time, guys. And you can check out the pull request; you can see all the players involved. This is a partnership between Transit App and Swiftly, and the three screenshots you see on the side are the production implementations with the transit agencies they partnered with. Locally, Torrance Transit was actually one of the pilot projects, so if you ride Torrance Transit and you use Transit App, you will theoretically see their detours in the app. Check that out. We also have Flex, which is an extension for non-fixed-route, on-demand, or flexible transit. This is like rideshare, where you schedule a vehicle and it comes and picks you up. Yes, like Metro Micro; it is most often used in rural areas. Although for Metro Micro we don't actually use Flex; we use a different standard called GOFS-Lite, which I won't get into, but that's the one Transit App takes in. And we have Fares v2, which is how to represent fares. There was a really basic implementation before, in 2022, and version two, which advances that implementation, got passed, which is great.
But it still doesn't have the capability of representing fare capping, which is what Metro has moved to, so we're still waiting on that. MobilityData themselves, they're hiring, check them out. They are also home to more data interoperability specifications, and there's a big push there for transit data interoperability in general, beyond just GTFS. They are hosting an international summit later this year, and you can follow along with what they're doing in their newsletter. You can get involved on GitHub, and they have a Slack where you can ask questions as well if you need help with how to use some of the specification. I do want to call out Cal-ITP, a project out of Caltrans. They've been a huge partner for us, and they are doing a lot of work to make it easier to take public transit across the entire state, including around contactless payments, benefits, and making sure all the transit agencies are outputting GTFS. And then we've got the Mobility Data Interoperability Principles, which I won't get into too much, but it's an effort that was started by Cal-ITP, basically a declaration that our industry needs interoperable data to advance. We need transit agencies and we need vendors to buy into all of this, and MobilityData has become its new home and is going to continue the work. What is Metro doing? Metro is a partner with Cal-ITP, so we've been supporting their efforts in the LA region. We're also a co-author on these interoperability principles. We are actively looking for partnership opportunities on various extensions, and we try to participate actively in the voting for new extensions as well. We also have an internal mobile app consolidation working group, and through that effort we are pushing for more GTFS Schedule and real-time data to exist at a regional level, so it's not just Metro, Big Blue Bus, or Long Beach Transit, the big players, providing real-time data.
We can get a lot of the smaller transit services in smaller cities to show up with real-time data as well, and we are actively advocating for and expanding data standards in the industry. A bunch more resources here: I'll post the slides online and you can get the links there, so hopefully you don't have to worry too much about catching all of these. There's a bunch of tools, a whole ecosystem out there, a lot of projects and apps that use the data. Shout out to Catenary Maps over here, and Metro's data as well; we have sites where it's available. And finally, thank you to some of my colleagues in operations, as well as at MobilityData, for a lot of the information I was able to share with you. This is my cat, Muffin. This is my info and my LinkedIn; feel free to reach out to me. I wish I had more time for questions, but yeah, that's it. Okay, sweet. Do we have time for questions? Yeah. Hi, really cool talk. Quick question: if I wanted to grab all of the data for LA Metro and all 50 partners in one place, does that exist, or do I have to go scrape 50 feeds to do it? Oh, so I don't think you can get it as a single download. No, just a single source. Just a single source? Like a single place. Yes, so that was part of the stuff I rushed through, so I will go back. Okay, so we have GTFS catalogs. If you're focused on the LA area, well, let's see; actually I don't even know off the top of my head which would be the best source, because they kind of aggregate at a much higher level, so maybe we could create a source that's just LA County. But Cal-ITP actually has these monthly quality reports, and I'll open this up so that you can take a look at it.
What's really cool about their site and what they're doing is that they are taking in transit agencies' GTFS and indicating whether each agency has a real-time feed. They're giving you high-level information about the service provided by the agency, and information about technology vendors if they have it; they're monitoring it, and then down here they are also checking to see if it follows the transit data guidelines they've produced, to make sure these GTFS data feeds follow best practices. So you can see a whole compliance checklist at the bottom. These are very high-level things at this point, but they do have this available, and there are other databases as well: MobilityData hosts one that they're building out, and there's a company called Interline that runs Transitland, which is also an international database of GTFS. Thank you. Is there a push to organize GTFS for algorithms? Because the way it's implemented requires you to go through every single element. Do you know of any movements for that? I don't know off the top of my head, except that some of these projects probably have some sort of implementation to address that. OneBusAway, OpenTripPlanner, they all take in GTFS and are ways to provide that information back to riders in a user-friendly way. I haven't been involved at that technical a level, so I don't really know for sure. Thank you. Great talk, Nina. Love the connection, bringing open source to an open source conference, a lot of the work you're doing in government, and clearly sharing that out. So, a question as a frequent rider, often in LA County: sometimes I notice buses come late and things like that, and what I thought was really interesting here is the real-time data that's been developed alongside the schedule. So how does that play out with real time in the city? Like, hey, this bus is late, or hey, we have this happening so we're going to send more buses or trains there.
How is this used in real time, instead of after the fact or in end-of-day performance reports, to ensure better quality of riding? Absolutely, so that gets into some stuff that I didn't even get a chance to talk about, but I get to show you guys, because lucky for you I have access. Our real-time predictions vendor right now is a company called Swiftly, and they provide us with this dashboard that we can use to see what's going on in real time. And not only that, our, oops, sorry, bus operator, I'm sorry, bus operations control center, our BOC. We call them the BOC and the ROC, bus operations and rail operations. They're able to input service disruptions into the system, and that gets incorporated into the data going out, because Swiftly is the one who publishes our real-time data. So you can see. Is that manual or automated? So some of it is manual, some of it is automated. We do have cancellations that get put into the scheduling system, which gets exported as a data feed that Swiftly takes in. They'll automatically pull those in as adjustments, which you can kind of see here: items that say they were created by ActiveBatch. But some of it is manually created, probably like this closed stop, which is, yeah, construction. There are start and end times, and the moment it's in here, Swiftly's algorithms will know to account for it and make adjustments to the prediction times that go out to the public and to all of the apps that take in this data. And actually, that's another great thing to call out too. You might notice differences in the way that Google Maps or Transit App or whoever shows you prediction times, like how many minutes away this bus is from the stop. On a basic level, they are all receiving the same data; it's all data that's coming from us through Swiftly. They may do stuff to it.
Like, I don't know, Google Maps may add in traffic data in their algorithm to modify it, or they may make their own adjustments based on historical patterns they've seen, but on a base level, all that location data comes from us. There is no separate, magical other source where they're getting where the buses and trains are. All right, thank you. Oh yeah. That's a great question. I don't know that there is one, and I'm trying to think. I think the main problem would be who would own that, I suppose. But there are a lot of issues around that. Yeah, yeah, absolutely. And I know I've talked to other people who are interested in that because of the real need. If anybody would be aware, you might be aware. I think I've actually heard of efforts to create that as a dataset, through my work with Maptime and OpenStreetMap mapping. I'll just leave you there. All right, thank you. Hi. Yes, that's it. Welcome everyone, and thanks for joining. My name is Lodak Lauf. I will be talking about voting and participatory budgeting; I'll explain what all that means in a bit. But first of all, who here is familiar with the concept of participatory budgeting? A few people. Anyone, participatory democracy? Anyone involved with cities? I'm just trying to gauge a little what the room is like. Yeah, so I work at the Stanford Deliberative Democracy Lab. We work on different versions of participatory democracy and engaging people in the democratic process through less conventional means. I did my PhD with, and still do joint work with, Ashish Goel's lab, the Crowdsourced Democracy Team, on the other side of campus in engineering. Today I will be talking about participatory budgeting, but we also have a number of other projects, and I will be happy to talk about those afterwards. Let's see how much time is left over.
Just as a little bit of background so that you understand where I'm coming from, there are a few different topics that I've been working on in the past years. One of them is structured conversations for an open-ended consultation phase: if you're a city manager, if you're a government, and you want to engage people in open-ended questions, how would you do that? So we built a tool where you can use video chat and do that at scale: you can bring people together, put them in small video chat rooms, and they can go through a structured agenda and arrive at some kind of conclusion. A second tool that has been built in our group over the past years is for participatory budgeting, which is really focused on the voting phase, and which cities can use to engage their residents in a budgeting process. And third, we're still working on different tools to get feedback from residents on city budgeting processes in general. So if the city wants to reevaluate how they're spending their budget across different departments, how would you do that? Today I'll be focusing on participatory budgeting, but as I mentioned, I'm happy to talk a little bit more about the others as well. Later I'll quickly touch on some learnings from the city budgeting feedback processes. So first I will talk a little bit about what participatory budgeting is, what you can do with it, and what to expect. I will explain what the standard participatory budgeting platform is. Then I will go into the different voting methods that you can use in this kind of setup and how they compare with each other, including some of the findings we have come to in our research group. And I will talk about the different budget feedback processes that we have run over the years. And maybe we can talk a little bit about what else is possible.
I think it's safe to say that trust in politics and politicians is not at its highest point right now, whether in the United States or globally; I think that is an understatement. And while people may have doubts about elections, they still seem willing to be engaged in the decision process. They're definitely interested in affecting the outcome, in giving their input. Even when people complain, they do care. And in a way, interest in local politics, for example, has never been so high. We see that a lot in school boards, for example, but often that is on a very specific topic, and people usually have a very specific reason to come there, and not always with a very constructive approach. And this involvement does not have to be limited, as we will also see later, to electing your favorite candidates. When a community gets involved in decisions, that gives an opportunity to increase transparency and accountability in government. So hopefully it can also help improve the quality of the decisions in the process. I'm currently a postdoctoral researcher at the Deliberative Democracy Lab, and I'm also part of the Crowdsourced Democracy Team, where I did my PhD. This team approaches the problem in a multidisciplinary fashion: they create theoretical models based on applied mathematics, then try to implement those as more applied algorithms, then try to understand how to implement that in software, and how to actually deploy that software with partners in cities to see how actual people behave. Because a lot of this applied mathematics, as some of you may know, usually makes strong assumptions about the people participating: for example, that you are all-knowing, fully informed, totally rational, which sounds great and is really necessary if you want to do any mathematics on it, but it doesn't necessarily hold.
So you really need to do some research to make sure that whatever comes out of such a pipeline actually makes sense. That is where I was involved. We focus our efforts especially on the more complex questions: things that go beyond yes or no, or A or B, which is basically what we've been doing for a few thousand years now. And this is where I think technology really has an opportunity to make a difference and engage people in the process better. We're aiming at solutions that fairly represent the stakeholders, that are efficient to use, and, most importantly, that can actually be used by regular people. It turns out that this is often the case for local questions. It's really hard for regular people to decide between, say, should we buy a submarine or an aircraft carrier; defense decisions are really hard to make. But if you have to decide whether you want to improve the playground or invest more money in the library system, that is something people can actually relate to, whether you want to fill potholes or put decoration on the walls. Those are decisions that people can actually have an opinion on and be constructive about. Therefore, we collaborate especially with local governments, and sometimes with NGOs, by providing them with a platform they can use to actually engage these people. Our participatory budgeting platform is one of these, and it's currently deployed in many cities. So, we value democratic participation in our democracies highly, yet we usually have to limit our questions to ones that yield really easy answers: choices, yes or no, Republican or Democrat. But even if you consider third-party candidates, the complexity really pales compared to open-ended questions, to budgeting questions, to the questions that a decision maker is actually faced with when trying to engage their stakeholders.
It's much harder to ask complex policy questions, and the problem is clear: the number of possible answers to a single question of how to spend a certain amount of budget is so much larger that it's impossible to even write out all the different answers you could give. So we will need to structure these problems a little more, in a way that people can actually answer them constructively and that somehow becomes tractable. A natural category of problems that has structure we can use is participatory budgeting, or budgeting in general. In budgeting, money really helps us structure the problem: we have a single dollar amount which has a clearly defined value. And within the category of budgeting there's participatory budgeting, which has many different definitions, but in general we have a fixed pot of money and a group of people that tries to come up with a preferred set of ideas, of proposals, of projects that they want to spend the money on. This call for engagement is not really new, right? Getting people engaged in government is something people have been calling for for many decades, if not much longer. It is often even mandated to some extent in cities by federal law, but it can be as limited as, for example, a public hearing on Thursday evening at city hall where nobody really shows up, or where everybody shows up and gets to speak for one minute. It can range all the way from information providing, where people may be able to speak but not really influence the process, to actually inclusive participation in, for example, some smaller communities. But it all starts from the principle that people on the street have some unique insights, some unique information to contribute to the process that could really help the decision makers make better decisions. They like to be engaged, so why not ask them directly in some way?
How can we optimally tap into their understanding of their own neighborhood? Participatory budgeting is probably on the more empowering side of that spectrum that runs from information providing to actually engaging people in the actual question. The definitions of participatory budgeting range all the way from informal surveys, where we're just asking people their opinion, to processes where residents actually get to propose spending items and decide how the budget is spent. The term was first introduced in South America in the 1980s; Porto Alegre is often mentioned as the place where it started. Since then it has spread all over the world, with all these different definitions and versions of it. They usually have some budget to allocate, some pot of money to divide. It is essential that the stakeholders are participating in one way or another, otherwise you should just call it budgeting. And they are usually involved to some extent not just in developing what the projects should be, but also in how the money should actually be divided between those projects. So in the United States, a ward, a city, or a district usually makes a budget available, for example a few hundred thousand to a million dollars; that's the range it often operates in. They make the budget available and invite people to come up with proposals. They do that in some kind of committee hearings or in town hall meetings. Then they go through a public process where people are encouraged to develop those proposals, rigorously go through them, and improve their quality. And then the proposals are often vetted by experts so that they're up to the standards of the city's own budgeting, to make sure the costs are realistic, that they're workable, that they're executable.
And then finally, when there is a ballot with viable projects that people can actually decide between, it's put to a vote. Often there's a rubber-stamping step after that, necessary because not all states allow you to make these decisions formally through a vote like this. But the idea, of course, is that the chosen projects actually get implemented. Cities have been organizing this process on paper and in person for quite a while now. I think the city of Chicago was one of the first to do this in their wards. Proposals are often collected at town hall meetings and improved in subcommittees, but usually the voting happened on paper. As you can imagine, nowadays this is also being taken digital, and especially in the last few years this really took off. So our platform, for example, which allows you to do this digitally, has been used by Chicago for quite a few years now. Initially that started as digital voting booths at polling stations, where you basically put a laptop out and people can vote that way. Recently it has become more and more commonplace. Cities can, for example, choose a digital platform if they want to reach more young voters, make it easier for people to participate, reach different demographics, or engage them in a more complex, more creative way through the voting process. Many cities end up combining paper and digital voting in one way or another. Every step that removes a barrier to voting helps, because turnout in these elections is often very low. So it's really about the engagement. But I think if you have a process where you engage five to ten percent of the population, that's actually quite a high turnout in many of these cities, because it's not happening during regular elections; it's happening as an off-cycle process, and getting that many people to show up for something takes quite a lot of effort.
Some interesting tidbits I found in the academic literature about this process: for example, analysis from New York suggests that if you organize this process and engage traditionally underrepresented minorities, or underrepresented groups in general, it increases the likelihood that they also participate in regular elections. We see that if people engage in a budget exercise like this, it helps improve their understanding of budgetary issues in general. And turnout is often low; like I mentioned, a five to ten percent turnout is actually quite a nice result. So if you're interested in organizing PB, there's a lot of information online. For example, the Participatory Budgeting Project (PBP) has a lot of information, and People Powered has a lot of information. You can always shoot me an email and I'd be happy to connect you. But today we will mostly be talking about the voting process and all the voting methods that are out there. So as I mentioned, you can use online technology for this. But for that, it's important to quickly recognize that there are many phases in this process, and you can of course use this technology in different ways. You could, for example, decide to do the idea collection online. Or you could have your meetings online, over Zoom or Google Hangouts or BigBlueButton or whatever your city likes to use. You could use some kind of asynchronous solution where you let people improve on the concepts, on the proposals. But here we'll be talking mostly about voting, which is an interesting process. Voting is actually split into a few different phases; if you really dig deep into the definition, we're talking about a few different things. First there's elicitation, which is the question that we ask people, the ballot that they fill out.
Secondly, there's consideration, which is the information people have in front of them: what is on the ballot, what descriptions you write out, what extra information you give, what the landing pages look like, et cetera. And then there's aggregation: given that you ask people a certain question, there are sometimes many ways to combine that information to arrive at an outcome. For example, you might think that if you just ask people to check their favorite project, there's only one way to arrive at a solution, but there are actually multiple algorithms that could determine which projects are then actually preferred. So we'll split those out a little bit. I'm going to ignore consideration for now, because that probably speaks for itself: what information you put on the ballot of course affects how people judge projects, like whether you put images there or not, for example. Our platform has been in use since 2012 and allows you to organize the voting part of these processes digitally, but also allows you to enter paper ballots into the system. It's currently used in a number of different cities, for example Chicago, Cambridge, and Vallejo. Over the years we've partnered with about 50 to 70 different districts, wards, and cities, depending on how exactly you define them, and helped these local governments distribute about $100 million across these different places. Our goal really is to make it easier for these cities to organize it. The software we made is open source and available on GitHub. They can install it themselves and run it on their own server. But we also made a version available on our own server, so that they just send me an email, I spin up a ballot for them, and they can run with it.
We're trying to make it as easy as possible, so that cities at least don't have the excuse that the voting is too complicated if they want to run a process like this. If you want to play around with it, feel free to do so: pbstanford.org has a number of live sample ballots where you can just play with it and see what the different voting methods and interfaces look like, at least on the user-facing side. I will also take a few minutes to show you a few of the different voting methods and what their screenshots look like. On the platform, the city basically gets to determine what goes on the ballot and what the ballot actually looks like. So the city, for example, gets to determine what the voting method is and what languages they want the ballot to be available in. They can add all the translations, and they can decide how they want to authenticate voters and make sure they are actually eligible to vote. They can register voters to the level of detail that they want. Some cities prefer a very thorough way to authenticate after the fact: we only want people who live on certain streets or at certain addresses, and we want to be able to check that. Other cities put more value on making it really easy for people to participate. Sometimes cities don't actually know who lives in their city, so then the only way to do it is some very basic authentication, or they just use text messages to ensure at least one phone, one vote, which limits abuse to some extent. And then they can set different constraints for the ballot, and they can determine, of course, what the project text is, et cetera. From the perspective of the voter, the process is fairly straightforward. They arrive at the landing page, they authenticate in whatever way the city determines, and they then have a primary ballot that they fill out.
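The configuration choices described here (voting method, languages, authentication, budget constraints) can be sketched roughly as a small data model. All field names and values below are hypothetical illustrations for this talk's description, not the platform's actual schema.

```python
# A rough sketch of the per-ballot configuration a city makes, as described
# above. Every field name and value here is hypothetical, not the platform's
# real schema.
from dataclasses import dataclass
from typing import List

@dataclass
class BallotConfig:
    voting_method: str    # e.g. "k_approval", "k_ranking", "knapsack", "token"
    languages: List[str]  # translations the city provides
    auth_mode: str        # e.g. "address_check", "one_phone_one_vote", "open"
    budget: int           # total pot of money to allocate, in dollars
    k: int = 5            # how many projects a voter may pick, where relevant

config = BallotConfig(
    voting_method="knapsack",
    languages=["en", "es"],
    auth_mode="one_phone_one_vote",
    budget=1_000_000,
)
print(config.voting_method, config.budget)
```

The point of the sketch is only that these choices are independent knobs: the same project list can be served under different methods, languages, and authentication levels.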
Sometimes they have a secondary ballot, which is for research purposes. If the city allows us to do so, we can ask the same question, with the same projects, the same everything, but with a different voting method, which helps us better understand how these voting methods compare and how people fill out the same ballot given a different question. I'll show you later how we have used that information over the years. And then finally there's a demographic survey, whose answers are kept totally separate from the actual ballot so that the city cannot figure out who voted for what. So the city, when they want to set up a ballot, has to make a few of these decisions in advance: what languages to provide, how to authenticate, et cetera; we briefly discussed that before. The platform is really configurable in many different ways, and that allows the city to run it in the way they see fit for their specific community. I'll now go into the voting methods in a little more detail. Participatory budgeting generalizes to a category of problems with a natural structure that we can use. We can, for example, define a problem where we have a fixed overall budget that we can spend on a set of predefined proposals, where every proposal has a certain cost. And every resident will have a certain utility, a certain value, for every project, right? Some people will appreciate the fact that a playground is being improved, and some people will say, well, my children are out of the house, I don't really have any value for that. So people will have different values for different projects. The question is now: given all these projects, which of them would you prefer to be selected? And how do we then aggregate those votes together? You could, of course, elicit this in different ways. You could ask people, what are your favorite projects?
You could ask people to rank the projects. You could ask people to do something else. There are a few basic methods that are usually used, and these are at least the methods supported on our platform. The most straightforward one is approval voting, where we ask people to choose their three or five favorite projects. We don't ask you to distinguish between them; just choose your favorite projects. This is called K-approval. Then there is ranking. We could, of course, ask people to rank all the projects. We learned very quickly that that is a lot of work and a lot of mental load for people, and people are actually quite annoyed if they have to rank projects they really don't care about. So what we actually did is simplify it a little: we first ask them to choose their top K, for example their top five projects, just the five best. And then we ask them to rank those, so that they can distinguish between the projects they actually care about but don't have to worry about all the down-ballot projects. Then there is knapsack voting; I'll explain shortly what that means, but it basically takes into consideration the cost of every project. And finally, there's K-token voting, where you distribute tokens over different projects. In K-approval voting, where we just ask people to choose K projects, it's really hard to consider trade-offs between projects. You can have one project that costs half a million and another that costs 10,000, and how do you choose between those? Intuitively, you might think, like me, that people would naturally choose the more expensive projects, because they feel their influence is bigger if they choose those. It gives you some odd incentives, and it's not really strategy-proof.
And aggregation, even though it may seem straightforward, is not always straightforward. How do you treat the fact that, for example, 100 people prefer one project of half a million while another 100 people prefer two projects of 250,000 each: which of them would you choose? But the advantage is that it's really easy to implement on paper. So on our platform, this is how we implemented it: people just keep choosing projects until they run out of votes. For K-ranking voting, we basically have this extra step: we now want people to also rank their favorite projects. And aggregation is even less straightforward there, because how do you weigh different rankings against each other? But it can serve as a paper alternative for knapsack voting. So in rank voting, people first see the same thing as in K-approval, and then in the next step they are asked to rank the projects against each other. Knapsack voting is where people are asked to solve what is basically a shopping-cart problem. If your budget is $1 million, for example, they can choose as many projects as they want until they run out of budget. That way, there's an automatic consideration of the fact that some projects are more expensive than others. In the literature this is called the knapsack problem, hence the name. And it allows you to simply combine the votes, just add up the projects, and that should take care of the cost trade-off altogether. This is actually hard to do on paper, because you'd need to give people calculators and they'd need to add up all these amounts. But you can imagine that on a computer this becomes quite trivial.
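The "quite trivial on a computer" bookkeeping for a knapsack ballot can be sketched in a few lines: track what the voter has selected so far and report which remaining projects still fit within the budget. The project names and costs below are made up for illustration; this is not the platform's actual code.

```python
# A sketch of knapsack-ballot bookkeeping: given the projects already in the
# voter's "shopping cart," return the projects they can still add without
# exceeding the total budget. Names and costs are illustrative only.

def affordable(projects, selected, budget):
    """Return the projects a voter can still add within the remaining budget."""
    spent = sum(projects[p] for p in selected)
    return [p for p, cost in projects.items()
            if p not in selected and spent + cost <= budget]

projects = {"playground": 500_000, "library": 300_000,
            "potholes": 150_000, "murals": 10_000}

# With a $1M budget and the playground already chosen, everything else fits.
print(affordable(projects, ["playground"], 1_000_000))
```

The same check, run after every click, is what lets the interface gray out projects the voter can no longer afford.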
You can just keep track of which projects people have chosen and what the budget is so far, and tell them which projects they can still select and which they can't. In K-token voting, which is still quite experimental on our platform, you basically do what you see in a lot of strategy retreats, where people get little dots they can put on a whiteboard or on sticky notes. You can distribute those tokens over different projects, and that way you can give different weights to different projects. So you can say, I give three votes to this project, two votes to that project, and five votes to that one. In people's minds that is a lot fairer, because they can really express this weight, but it's not clear how that interacts strategically with more and less expensive projects. The nice thing about it is that it's a gateway to even more complex methods such as quadratic voting, where your first vote for a project is much cheaper than your second vote, which is really popular in some circles, but that is something for down the line. For aggregation, our platform currently uses greedy, which is basically where you just add up all the votes and tally them. That's the simplest approach, because it is also really the easiest to explain to people; it's what people expect. You just add the votes up, rank the projects in order of the number of votes, and choose the highest-ranked projects. In theory, there are also other aggregation methods that are nowadays becoming more and more popular, at least in the CS literature. For example, the method of equal shares, which allows you to be more equitable in your aggregation. That way you can account for the fact that there might be a minority in your community with a strong preference for one specific project that is different from everyone else's.
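The greedy aggregation rule just described (tally the votes, rank projects by vote count, fund them in that order while the budget lasts) can be sketched as follows. This variant skips a project that no longer fits and keeps going down the list, which is one common implementation choice; tie-breaking and skip behavior vary in practice, and the vote counts and costs here are invented for illustration.

```python
# A minimal sketch of greedy aggregation: sort projects by vote count and
# fund each one that still fits in the remaining budget. Illustrative data,
# not real election results.

def greedy_allocate(votes, costs, budget):
    winners, remaining = [], budget
    for project in sorted(votes, key=votes.get, reverse=True):
        if costs[project] <= remaining:
            winners.append(project)
            remaining -= costs[project]
    return winners

votes = {"playground": 120, "library": 95, "potholes": 90, "murals": 40}
costs = {"playground": 500_000, "library": 300_000,
         "potholes": 150_000, "murals": 10_000}

print(greedy_allocate(votes, costs, 900_000))
# → ['playground', 'library', 'murals']  (potholes no longer fits after library)
```

The method of equal shares differs precisely here: instead of a single budget-wide tally, each voter notionally controls an equal slice of the budget, so a cohesive minority can still fund the project its members jointly "pay for."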
You can imagine, if 51% of the people agree on a specific budget and 30% really want this other project, why shouldn't they be able to spend some of the budget on that project if it makes them so much happier? I will now transition to discussing some of the research we have done over the past years based on deployments of the PB platform. First I will discuss how we used anonymized voting data from our platform to better understand how voting methods compare in a practical setting. Then I will discuss the budget feedback exercise in Austin, if we have some time for that, to better understand what insights we can get from aggregating this data through clustering. So as you may recall, this was the process from the point of view of a voter: people land on the landing page, they authenticate, then they have the primary ballot, and sometimes they have the secondary ballot, which is now really important for the next part. So they see two different ballots for the same projects, the same election. And we have the three voting methods that are most popular on our platform: K-approval, K-ranking, and knapsack. So we put together a data set of anonymized voting data with 124 elections, about 125,000 votes, and about 1,500 projects, and for each vote we have the voter, the project, and the allocated budget. We have mostly approval ballots here, and also a number of knapsack and ranking ballots. If you add those up, you'll see there are more than 124, because for 38 of these elections we also have a secondary ballot available. That means the same people voted on two different ballots if they wanted to. And why is this interesting?
Because this unique data set really allows us to get a good understanding of the preference distribution per election, but it also helps us understand what the completion time looks like for different voting methods, how much effort people have to spend to go through them. It also gives us vote pairs where people expressed the same opinion through different elicitation methods. And this allows us to correlate completion time and abandonment rate with ballot design: if you make a ballot more complex, do people actually spend more time? And we can see what effect explicit and implicit constraints on your ballot have on how people actually behave. The building blocks for this are: for each election we know the median time that people spend on their ballot, and we know the abandonment rate, how many people started the ballot but never completed it, which allows us to look at the correlation between ballot design and how much time people spend on the ballot. Second, we have these ballot pairs from the same voter in the same election, which allows us to compare their primary and secondary voting methods. We also have a lot of long K-ranking votes, which allows us to deduce, if they had voted with approval voting or with knapsack voting, what their ballot would have looked like. You can imagine: if we have a long ranking, I prefer this project, then this one, then this one, you can basically go down the line and say, well, if you were only allowed to choose two projects, which two would that be? That would be the first two projects. And if you were able to spend half a million dollars, which projects would you have chosen? You just go down the list and choose projects until you run out of budget. It's not entirely foolproof, because people may have more complex considerations, but if you assume independence, that should work out.
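The go-down-the-list inference described above can be sketched directly: from one long ranked ballot, derive what the voter's K-approval ballot (their top K) and knapsack ballot (walk down the ranking, keeping each project that still fits the budget) would have looked like. Whether an unaffordable project is skipped so the walk continues, or the walk stops there, is a modeling choice; this sketch skips and continues. The ranking and costs are made up for illustration.

```python
# Inferring hypothetical K-approval and knapsack ballots from a single
# ranked ballot, as described above. Illustrative data only.

def infer_k_approval(ranking, k):
    """The K-approval ballot implied by a ranking is just its top K entries."""
    return ranking[:k]

def infer_knapsack(ranking, costs, budget):
    """Walk down the ranking, keeping each project that still fits the budget."""
    chosen, remaining = [], budget
    for project in ranking:
        if costs[project] <= remaining:
            chosen.append(project)
            remaining -= costs[project]
    return chosen

ranking = ["playground", "library", "potholes", "murals"]
costs = {"playground": 500_000, "library": 300_000,
         "potholes": 150_000, "murals": 10_000}

print(infer_k_approval(ranking, 2))              # → ['playground', 'library']
print(infer_knapsack(ranking, costs, 700_000))   # → ['playground', 'potholes', 'murals']
```

The independence assumption the speaker mentions is what justifies this: it presumes a voter's preference for each project does not depend on which other projects end up in the inferred bundle.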
And finally, there's a randomized controlled trial between different voting methods when two different voting methods are assigned for the secondary ballot. I won't bore you with all the analyses and all the data, but based on this information there are a few conclusions we think we can draw. First of all, inferring knapsack votes from long K-ranking ballots, where people are asked to rank a lot of different projects, is a valid approximation. Now, I can see your faces glaze over a little bit. This is actually helpful, because knapsack, as you may remember, is really hard to do on paper. But ranking is something we do know how to do on paper. So if we want to do knapsack and we also want to offer a paper ballot, we can ask people to rank their top five, or top seven, or top eight projects, and then we can do the math for them; we have shown you can use that as a valid approximation. This is really helpful if we want to offer a paper ballot for knapsack voting. It was already claimed previously, but now we have shown that it actually works. The second conclusion we were able to draw is that voters tend to select more expensive projects if they are asked to just choose three projects than if we give them a knapsack constraint, that is, if we tell them to choose as many projects as they want as long as they stay under a certain limit. So the average cost of chosen projects goes down under knapsack. Interestingly, this is not across the board: it is mostly true for the tail of their selections, not for the most expensive project that they choose. They still choose that most expensive half-million-dollar project, but further down they may prefer a $100,000 project over a $200,000 project.
What we also saw, which really surprised us and will need more research to properly validate, is that the data indicates a positive correlation between the median time spent by voters and ballot complexity, which is what we would expect, right? If we make a more complex ballot, people spend more time. What we don't see is the abandonment rate going up when we make the ballots a little more complex. So people don't actually say, you know what, this is too complicated for me, never mind. We see that once people commit to participating in a process like this, they are actually really interested in keeping going. Of course, this comes with lots of ifs and buts: we only made it a little bit more complicated, and turnout is very small, so maybe these are the most motivated people in the first place, and if you did this for a full population it might not work the same way. But this was a really interesting finding. And finally, what we do see is that knapsack voting, for example, correlates with a higher median time and a higher abandonment rate, but the causality is really not conclusive. It might be that cities willing to use knapsack are also more inclusive in their recruitment methods; something like that might be playing out under the hood. So it's really unclear what is causing this. For next steps, there's a whole lot to do, but one of the things really happening in the literature is greedy aggregation versus the method of equal shares: how can we make a process like this more inclusive in its outcomes, how can we make it more equitable? And of course we want to collect more data over time so that we can test all these other hypotheses. Let me see: I can talk a little more about the budgeting process we did in Austin, or I can take some questions. What would you prefer? I can spend as much or as little time as we want.
You want some questions? Oh yeah, I will, okay. I'll just try to do it briefly then, yeah? How do you propose we approach our local city government and convince them to try this out as an experiment, or to support this? What should we do, basically? Yeah, that is a question that is a little beyond the scope of my work, because I usually only get involved when they're already convinced, which is really the best position to be in. But my suggestion, I mean, my opinion is as good as anyone else's in this room on that matter, is what I'm trying to say. Talk to me about it afterwards. But what I would suggest is: ask that question before they get elected. That is probably a good start, but also get a lot of people to ask the same question, and try to show them some examples of what works. Yeah, great presentation. So I'm curious, a lot of this, especially the voting mechanisms that you mentioned, has to do with ranking things in relation to money, which is valid and makes sense from a budget standpoint, which is the topic of this conversation. I'm curious, though: are you all looking at, or do you have thoughts on, whether this sort of platform can apply to even more expansive thinking? Let me give an example. I'm thinking about how, for example with COVID, right? In the response to COVID, the money kind of appeared very quickly, or if you follow the federal government, money seems to appear for certain key objectives. And I guess I'm just wondering whether this approach can be applied earlier, to say: what are the important considerations that we should think about before thinking about money? Is that even valid to think about, and how would we engage citizens in that process? Does that question make sense? Yeah, it makes a lot of sense. I mean, it really depends, which is kind of a cop-out, kind of a fake answer, but it really does depend on the question.
It really depends on what you're trying to figure out and how complicated it is, for example. So sometimes you can just do a survey, right? If everybody knows what you're talking about, if everybody understands the problem to the root, then you can just put a survey out and let people respond, and if you can get a representative sample, that's a good way to get people's opinion. But there are a lot of topics where that's not obvious. And at least the budget component gives people a common reference framework. I mean, it gives people a common understanding of what "a lot" means. We're talking about dollars here, and that's something people can put some value on. Of course, a dollar is different for one person than for another, but in the end we're talking about our community's dollars. That common framework is much harder to find for these qualitative questions, so they're a little bit less straightforward and definitely a little bit harder. But one of the ways that our group is working on that is through deliberative polling, which is where you basically try to figure out what people would say about a topic if they were actually informed about it. You do a survey before the deliberation, you let them deliberate about the topic in a structured way, which has a lot of buts and ifs and a lot of constraints, and you poll them again afterwards. And you look at the shifts between these polls. That's simplified, of course, but that is basically what you're trying to do. And that allows you to get a little bit more insight into all those complications. Just gonna ask, I was also curious along that same line of thinking: has a platform or an approach like this been applied to software, like open source projects? Because the currency is different.
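The pre/post measurement behind deliberative polling can be illustrated with a trivial calculation; the panel answers below are invented, and this is only the "look at the shifts" step, not the deliberation design itself.

```python
# The same panel answers a 1-5 agreement question before and after the
# structured deliberation; the analysis looks at how opinions moved.
pre  = [2, 3, 2, 4, 1, 3, 2]   # invented pre-deliberation answers
post = [3, 4, 3, 4, 2, 4, 3]   # invented post-deliberation answers

def mean(xs):
    return sum(xs) / len(xs)

shift = mean(post) - mean(pre)                        # net opinion shift
movers = sum(1 for a, b in zip(pre, post) if a != b)  # how many changed
print(f"mean shift: {shift:+.2f}, changed answers: {movers}/{len(pre)}")
```

Real analyses of course weight for representativeness and look at per-question distributions, but the core quantity is this before/after movement.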
It might not be money, or maybe it is money, but not only money; it's effort, time required, importance, and things like that. Which, again, polling is one way to do it. I'm curious, though: how could this be applied to how open source projects manage or decide what to do next? That is a great question. I'm trying to think through this, because if you want to get an equivalent for dollars in an open source project, I think that is where the tricky part comes in, if you want to use these approaches from participatory budgeting one-to-one. Even though you could say time has a similar scarcity, it's less well-defined. You can measure it in hours, but at the same time, in an open source project, theoretically the amount of developer time available is unlimited. Theoretically; in practice it doesn't work that way. And I think that makes it so much more complicated. So prioritization is probably something, but then prioritizing one direction over another is really hard, because it also requires you to make an estimate of how much time something will take. But to some extent you could use similar approaches to knapsack voting, for example, in time management, though it would require you to make a lot of assumptions. There are many projects for voting in open source software scheduling; Josh Berkus has written one, and you can find him at KubeCon, I think. I became interested in this after reading James Fishkin's book, Democracy When the People Are Thinking. And in that book, there's a tremendous amount of emphasis on how the deliberation happens before the voting happens. How does the mechanism you're advocating here support the actual deliberation process? It does not. That's the short answer. Yeah, so the platform that we have, PB Stanford, is only for the voting phase. We're not claiming to do anything else.
In participatory budgeting in general, the whole process from start to finish has a lot of deliberation in it. That is where you bring people together in town halls where they come up with proposals. They have to work together for that, and they work together to improve the project quality. But that all happens before they get to the ballot. So in participatory budgeting in most cities, again, there are a lot of different definitions out there, so there might be cities that don't do this, but in general participatory budgeting will often have deliberative phases built into it, because they really find it important that people connect with each other, build coalitions, and try to figure out how to engage more people. And I think that is also where a lot of the benefit comes from, a lot of the trust-building. It's not necessarily from the voting itself. But then after that, it's also really important that it gets executed, which is also not always that easy, and that it gets evaluated and reported back. So it's really a big pipeline that needs to be executed from start to finish, and we only provide a very small part of it, but typically a part that cities find challenging to do themselves. It's maybe an out-there question, and somewhat related to this one, but is any of this effort toward producing these different voting methodologies also being measured simultaneously in terms of the beneficiary spread of success in an outcome? So you mean who benefits from the outcome? Exactly. I guess I just put that in there, and I'm sure it's already on your mind, and that is: it depends on where, but generally it's always those with more leisure time, those with more money, those with other features that motivate them more or enable them more. Yeah, so that really depends on how the city organizes it.
So usually I'm not involved in the recruitment of the participants. But I've seen this from the sidelines a little bit, and I do see that in some cities a lot of effort has been made to recruit people that are not your traditional voters in elections. You can see that in language statistics, for example, and sometimes in what kinds of devices people are using. But especially with the languages, you can see that some cities put a lot of effort into making ballots available in many different languages, and in some cities you see that those languages actually get used. That is when you know these cities really make an effort to go out there. I've heard stories from some cities that said: the only way we can actually reach those people that typically don't have the time to come to these town hall meetings is to go to the bus stops and catch them while they're waiting for the bus anyway. Hey, do you have five minutes to fill out this ballot on an iPad or something? It's not perfect, but at least you get them engaged in one part of the process. Getting them engaged in something that would take them several hours is much harder. Yeah, sorry. Since there's money on the line, and in every situation I've ever worked in where money is on the line there are always people trying to game the system to get more money for themselves, what type of security do you use to deal with adversarial attacks, or somebody trying to shift the budget in their own favor? It is surprising, but I have not seen a lot of that adversarial behavior in this process. Part of that is probably because it's relatively small amounts of money, and the projects on the ballot are predefined. So at least when it gets to the voting stage, all the projects are at least beneficial to the community.
They're not very personal, right? It's not like I get $100,000; the $100,000 goes to improvement of the playground. So that is my personal guess, but this is going a little bit in the speculation direction. What we do see is that cities always have to make a trade-off between security, like how do you keep people from voting multiple times, and on the other end, and this is where your question comes in, how do we make sure that people who are not your typical voters actually get to participate? That they don't give up before they even see the ballot because they have to give out too much personal information and get scared that, like, ICE might be on the doorstep next week because they gave all this information to the government. Or, how do you get teenagers involved? There are a lot of cities that lower the minimum age to 12 or 13 years old. So there's always this tricky balance to strike, and a lot of cities end up with basically one-phone-one-vote kinds of approaches, where you use text messaging. You enter your phone number, you get a text message with a code, and you can log into the system with that code. It's fairly straightforward. It's probably not entirely foolproof, but it is at least something. Interestingly, there's one case, and this is the part I skipped over, in Austin. We did a budget feedback survey in 2020, and 2020 was also the year that George Floyd was murdered, and it happened in the middle of that budget feedback process. If we look at the number of votes that happened there, and just as a warning, this is a logarithmic scale, you see that there's a huge bump in the number of participants. Initially our first thought was, well, there must be something malicious happening here, and we dove deep into the data to try to figure out what was going on.
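The text-a-code flow described above might look something like this minimal sketch. The class name and the SMS stub are hypothetical (a real deployment would call an SMS provider and add rate limiting, code hashing, and retry limits); this only shows the one-phone-one-vote bookkeeping.

```python
import secrets
import time

class PhoneGate:
    """One-phone-one-vote gate: text a one-time code, allow one ballot per number."""

    def __init__(self, send_sms, ttl=300):
        self.send_sms = send_sms    # callable(phone, message); stubbed below
        self.ttl = ttl              # seconds a code stays valid
        self.pending = {}           # phone -> (code, expiry timestamp)
        self.voted = set()          # phones that have already cast a ballot

    def request_code(self, phone):
        if phone in self.voted:
            return False            # this number already voted
        code = f"{secrets.randbelow(10**6):06d}"   # 6-digit one-time code
        self.pending[phone] = (code, time.time() + self.ttl)
        self.send_sms(phone, f"Your voting code is {code}")
        return True

    def verify(self, phone, code):
        entry = self.pending.pop(phone, None)
        if entry is None or time.time() > entry[1] or entry[0] != code:
            return False
        self.voted.add(phone)       # ballot may now be accepted, exactly once
        return True


# Demo with a fake SMS gateway that just collects outgoing messages.
outbox = []
gate = PhoneGate(lambda phone, msg: outbox.append((phone, msg)))
gate.request_code("+15550100")
sent_code = outbox[-1][1].split()[-1]
print(gate.verify("+15550100", sent_code))    # True: first ballot accepted
print(gate.request_code("+15550100"))         # False: same phone can't vote again
```

This captures the trade-off in the talk: the only personal information collected is a phone number, which keeps the barrier low while still making duplicate voting inconvenient rather than impossible.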
In the end it turned out, well, it's because George Floyd was murdered. Part of the questions on the survey was how much money should go to the police department, and people really cared about that at that point. So we saw a large shift in turnout, but we also saw a large shift in opinions. And that is, I think, one of the lessons: even if it looks suspicious, if you look at who voted, we saw that at least 80 to 90% was directly traceable to the region of Austin. So these were people who were at least somewhat local. It's not like they recruited people from all over the country. And this was with huge peaks; I've never seen anything even close to this, and I've seen a few elections, at least on PB Stanford. So usually, I mean, yeah, I'd knock on wood, but I don't have signs that malicious behavior plays a large role. I think we have time for one more. I'm sorry, I had to step out for a minute, but you guys work specifically with larger budgets, city-size budgets, right? Have you worked with any organizations to encourage either the use of your design or similar designs at maybe a smaller scale, like tenant unions or mutual aid organizations, to encourage smaller, grassroots-level participation in this type of budget voting? So while the flashy examples might be big cities, that's not where the bulk of our ballots actually are. A lot of the ballots are either very small cities or small communities, sometimes a few hundred people. Sometimes they're NGOs that really care about a specific community of people; sometimes they're NGOs that are more like an association. So you can definitely do the same style of thing with participatory budgeting. It's just that most of the cost is not in providing the budget to be spent; that's only part of it.
But a lot of the budget, a lot of the effort, also goes into recruiting people to actually participate, and that is usually the hardest part to make happen. All right, other questions? Looks like everybody wanted to ask questions before we got to your Austin example. Well, thank you very much, it was very informative. Thank you. And there's more information on the slide, so feel free to shoot me an email or catch me afterwards if you're interested in more. And do I just, like, put this thing on my head? No, on your ear. Oh, on my ear. Loop it over your ear. Like that? Okay. And you're gonna pull it away and bend it a little. No, not twist, bend. Like that? No, no. Give it a little curve, so it's pointing towards you, there. All right, let's try the ear, there. And now, kinda, that should be good. Yeah, okay, awesome. Hi, hello. Okay, I think that was a good warm-up. Did some, like, comedy, a little sketch. And all of that was planned, I hope you guys know that. All right, well, let's get into it. So I spoke at SCALE last year. Is there anyone who went to my talk last year about developers getting involved in public policy? Okay, awesome. Because I reused some of the content from that one, so I was hoping I wouldn't be found out. So yeah, I will pull this up on the slide. All right, hi. So I'm Margaret Tucker. I am a member of GitHub's policy team. And you might be asking, why does GitHub have a public policy team? One of the reasons we have a public policy team is because developers are really important policy stakeholders, and their interests aren't typically represented. Oftentimes open source is misunderstood, scapegoated, or laws are just written in ways that can break open source.
And obviously, there are many downstream dependencies on open source collaboration, so making sure that people are able to collaborate and connect across different borders and different countries is very important. So with this talk today, I'm gonna talk a little bit about how we do policy advocacy. But the main thing I'll be doing is demoing our Innovation Graph, which is this website. It's really cool, and we'll get into it; that'll be most of the talk. We made the Innovation Graph as a tool to demonstrate the value of developers and code collaboration to policymakers and journalists, and you guys can use it too if you want. And then I'll get into some other options for how developers can get involved in policy. So, I would say that this year, 2024, is a really big year when it comes to tech, because there are quite a few global elections. There's a lot of fervor around AI, a lot of fervor around cybersecurity, and also online safety. And open source intersects all of that. So we've seen a couple of different interesting developments. The EU AI Act was recently passed; that was the first major AI legislation that we've seen, and it does relate to open source in some ways. We did a lot of work to try to keep open source out of it and keep the focus on large models. But that is a big one. The US Executive Order on AI came out last year; it also called for some interesting things, like data resources for openly available model weights. On the cybersecurity end, the EU Cyber Resilience Act was a big one, and we did a lot of advocacy to try to keep open source more out of it. Are any of you familiar with the CRA? Okay, all right. Well, I'll just give a brief on this one, because it's gotten a lot of flak. So the EU Cyber Resilience Act was essentially designed as a consumer protection bill for products on the digital market.
It basically says that if you're selling software, you should have certain cybersecurity practices in place. But there was some concern, and I would just say that it was challenging to get the wording right, to keep open source out of that. Obviously, open source can be included in products that are sold, but if you're giving your software away for free, then you're not necessarily a vendor. So that was another one where we were explaining why open source should be kept out of these things, or should be considered in a way that doesn't break it. Oh, and then the US National Cybersecurity Strategy is another one. With the cybersecurity strategy, they did recognize the importance of getting the balance right on who has the responsibility when it comes to securing the software supply chain. So that was pretty cool. And then on the online safety side, as I said, with this election, and a lot of people are saying this, we'll see how much it turns out to be true, but I think the specter of the 2016 election and how Facebook played a role in that has put a lot of focus on deepfakes and also non-consensual intimate imagery. There have been some really high-profile cases; you might have heard there's a Taylor Swift one. There was a massive New Hampshire robocall scam using voice cloning. So this not knowing what's real has come up a lot. And with that, there is more scrutiny on open source AI, and that does come up at GitHub, because we are the largest code repository and some of that includes AI models. And then there are also quite a few different online safety acts. So anyway, a lot of policymakers are interested in tech. But as you saw with the challenges of this room being the government room and not necessarily having the best tech, a lot of policymakers don't understand open source. They also don't understand the value of open source.
And so when explaining, hey, don't break this, this is important, this software is used in so many important things, you use it yourself as the government, it's hard to convey the global scale of open source. And that's kind of where the Innovation Graph comes in, because we're GitHub, and so we have a lot of data. Obviously the dependency graph and the Octoverse were the start of this. But one of the things we've been moving toward as a policy team is supporting more research and generating more data to show what's happening in open source collaboration on GitHub. We started off with providing bespoke data to select researchers who were interested in seeing top languages, collaboration across economies, things like that. And as we were doing that more and more, we built out this process and built out the data so we could share it publicly, with privacy concerns and all of those things in check. So yeah, I will get into the Innovation Graph, and we're just gonna go on the website, because the screen sharing worked out. If you wanna go on it yourself, you're welcome to; the link is there. Basically, we had a lot of information about how people are interacting and working publicly on GitHub. This is all public activity, not enterprise. Things like Git pushes, repositories, which economies are collaborating with each other, licenses (that one's a really cool one), different topics used by developers. And so we built out this Innovation Graph website, and I already have it pulled up. So yeah, okay, this is the GitHub Innovation Graph. We launched this, I think, in September of 2023, so it's pretty new. I am particularly fond of this little graphic here, because you can also play with it, which is fun. And so we can dive into it. As I showed you, these are the different metrics that you can use, and within those metrics you can go into the data.
If any of these sound interesting, call one out. What do you wanna see? Pushes, repositories, developers, economies? Okay. So, oh, the economy one is cool. Actually, maybe we'll do, yeah, economies. We call them economies because there are some different names for things, and we use standard country codes. But basically every country where people are using GitHub, including Antarctica, is right here. We can see developer pushes in Antarctica. How many we got? Oh yeah, not bad, 146 repositories. Let's see, what else do we have? We don't have too much data on Antarctica; it's probably not a good example. You know what? I'll just get into one of the more interesting examples we have. This is one of my favorites, and we actually posted a blog about this, so if you wanna read about it in more detail, check that out. One thing that we've been curious about, now that we have all this data out and people can download it (it's in a repository, all the files are there), is how global events have impacted collaboration on GitHub. So obviously things like war, internet blackouts, major global events: how does that impact collaboration? So this one's an interesting one. We're on Ukraine. Git pushes, repositories, trying to go down to the inbound collaboration metric. Let me see. Ah, yes, okay. So this is inbound collaborators over time. This is people who are from another country, and obviously VPNs make this tricky, but we have a method that hopefully keeps these accurate. There's a pretty sharp change right here, in Q1 of 2022. What do you think that is? Anyone know what happened then? Oh yeah, there was a war. The war started. And so this is actually pretty cool, because it shows there was a massive influx of collaboration coming into Ukraine.
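Since the quarterly files are downloadable, spotting a jump like the Ukraine Q1 2022 one is a simple exercise. The rows below are invented numbers shaped loosely like that kind of per-economy quarterly export, not real Innovation Graph data, and the column layout is an assumption.

```python
# Rows shaped loosely like a quarterly per-economy export
# (economy, quarter, inbound collaborators); the numbers are invented.
rows = [
    ("UA", "2021-Q3", 1100),
    ("UA", "2021-Q4", 1150),
    ("UA", "2022-Q1", 2400),   # the kind of jump discussed in the talk
    ("UA", "2022-Q2", 2300),
]

def flag_jumps(rows, threshold=0.5):
    """Flag quarters that grew more than `threshold` over the previous one."""
    flagged = []
    for (e1, _, v1), (e2, q2, v2) in zip(rows, rows[1:]):
        if e1 == e2 and v1 and (v2 - v1) / v1 > threshold:
            flagged.append((e2, q2))
    return flagged

print(flag_jumps(rows))  # → [('UA', '2022-Q1')]
```

The real data files live in GitHub's public Innovation Graph repository, so the same quarter-over-quarter comparison can be run against the actual exports.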
And I think a lot of that, you know, it did spike and it did cool down a bit, but it really stayed consistently high. So yeah, that was pretty cool. Also a huge shout-out to Poland, because they've done a big chunk of that. And I think we're gonna try to continue doing these sorts of insights into how things are happening on GitHub, but this one was really cool. One other thing that I will show you guys is the languages side. I will say the site is a bit slow. Yeah, all of this is public, and all the data is available as structured data files for download. So you can go, here, our repository can show you guys. This is our repo for it. We have an engineer on the policy team, and he's really cool; this is kind of his baby. And I worked a lot on the website side of it. So all the data's here. Do whatever you want with it. We also encourage people who use the data for research to let us know, send us whatever you publish, because we also have kind of a hall of fame sharing out what people have found. So this one, it's a little bit hard to see up here, but this is the top languages globally. And some interesting things happened within the past quarter. One of them is, where is this? I'm looking for MIT-0. The Rust story is crazy. That's pretty sick, right? Also you can see TypeScript overtaking; TypeScript moved up to number four on GitHub. I'm looking for Muwan, though, because that one just appeared within the past quarter. I'm gonna make sure I'm talking about the right thing. Q3 2023. I hope I'm not promising something that's not correct. Huh. Wait, oh, I might be on Ukraine totals. I'll just go in there. Let's go back to languages, economies. If this doesn't work, then I'll just move on. But I thought this was correct. Huh.
Well, we did find this, and I don't know if it's on a different part of the site, but essentially Muwan premiered in the top languages globally in the past quarter, which is pretty cool. And also the MIT-0 license appeared in the top licenses. I'm not seeing that here, but you gotta believe it is here; I think I just might have the wrong parameters set up on this one. But anyway, there are some really interesting things. I think the languages one is pretty cool. We played a lot with different visualizations of the data, but the thing we're really trying to push here is making it as self-service as possible, so journalists, or maybe a staffer for a congressional team, anything like that, can just go and find out what they wanna find out. You can also cross-compare, and that's pretty cool, to see how countries have moved through time. The topics one is pretty cool as well; I'll show you that one. Let's see. It's the topics that are used on GitHub; there's a topics selection on GitHub. And this is a thing too: we're trying to find out more of what people are doing on the platform, but it's kind of hard, because labeling is used inconsistently, so finding out what people are doing related to certain topics or certain fields is not as intuitive as we would like. And I really do love these; they're very interesting. It's like, oh yeah, Rust, Next.js. That was a pretty meteoric rise. AWS, yeah. So anyway, it's pretty cool. And I'll point you to the Insight reports as well. This is where we share different research, so it includes both stuff on GitHub itself, like the Octoverse, but also different AI indexes and different reports on the benefits of open source. I think the thing we're trying to get at with that is moving toward an understanding and an assumption that open source is valuable, and then actually having the numbers to back it up.
We're hoping to get more down to the state and maybe even local level, so that's something we're pursuing. And we do have a repository where you can open issues; people have opened issues to tell us about problems with the website itself and also to share ideas and share their own research. So please check it out. We're pretty passionate about it. And yeah, it's obviously public activity on GitHub, but it also speaks to how collaboration is working broadly. All right, so now that we've seen our lovely Innovation Graph, here's a question for you: how can developers get involved in public policy? Well, you can go to the GitHub Innovation Graph and find a bunch of information and share that with policymakers, or write an article, but you also can work with the public sector. There are a lot of opportunities to directly work with the public sector, and I would say that an understanding of technology, and especially open source, is in hot demand. For example, Audrey Tang is an open source developer and Taiwan's minister of digital affairs. There are quite a few opportunities, such as TechCongress and the US Digital Service, where they specifically look for people with that kind of experience. It's something that we really encourage. And ultimately, in Europe they fully have a Pirate Party; there are developers who are in parliament, and we wanna see the same thing in the US. We wanna see developers not just having their voices heard, but being in positions of power. And honestly, this is only gonna become more significant. So never think that you're not a stakeholder in public policy; you definitely are. And with so many things being related to tech, it's really important to make sure that policies don't break open source and are supportive of development. And honestly, I will say this: we're definitely in a time of increasing nationalism.
So being able to speak to the value of international code collaboration is really important as well. Also, one thing that we're really passionate about at GitHub is directly explaining how policy developments would impact you. So there are a couple of opportunities that are active right now, if you're interested and wanna look into them. Currently the DMCA Triennial Review is allowing replies. There were some really interesting submissions here. There was one sponsored by the Authors Alliance and the Library Copyright Alliance on AI and TDM (text and data mining) for literary works and motion pictures, and they're looking to expand that exemption. The security research exemption was just renewed; we've done a lot of activism on that in the past, but that one was basically renewed with no opposition, which is great. So definitely check out the DMCA. That's a big thing at GitHub. Obviously we really emphasize the importance of dual use on our platform, and understanding that a project that may be used for security research has value and should be on the platform even if it can also be used for malware; we try to distinguish between those things. And then there's another one. I think the request for input ends on the 27th of March, but the NTIA is seeking comment on dual-use foundation artificial intelligence models with widely available model weights. There's a lot of discussion about openly available model weights and training data for AI, and it's a really interesting topic of conversation. I think that open source AI is kind of a source of fear for policymakers, so it's really important to articulate the importance of it, and also why having open source versions of these massive models, or approaching things in different ways, would be beneficial to society as we navigate this brave new world.
All right, and then also, and I think you're all doing this already by being at a SCALE conference, we really encourage people to participate in ecosystem stewardship. If you have opinions about what open source policy should be and what we should be advocating for, there are groups already in existence: the Linux Foundation, and within that the OpenSSF, which is a big part of our advocacy on the security side, Creative Commons, Wikimedia, Open Forum Europe, and the Open Source Initiative (OSI). So there are a lot of ways to get involved directly where coalitions already exist. If you're like, oh, how do I get involved? There are ways, if you have opinions about policy, if you want to contribute. And depending on what organization you're working at or what projects you're working on, that might already be happening at your company or your school. With that, we also encourage people to be policy champions at their own organizations. I don't know where you all work, but you can share policy priorities and feedback even if you're not in legal or in policy; everyone's a stakeholder. And I'll just say, I've done quite a few submissions to requests for comment on really complex security policy topics, and I'm really sitting down with a lot of engineers, hearing what they're saying, and then translating that into policy speak to advocate for what we want. Also, champion open source at your organization, and offer your skills for public good. There are a lot of opportunities to do that; we have a social impact website, and there are some things specifically for skills-based volunteering for developers. So yeah, I guess this has been kind of a speedy presentation. There are some options for how to get in touch with us. Obviously, if you want to comment on the Innovation Graph, you can comment on the repo there. We also have a developer policy repo that's more general. We have a Twitter, and we have, ooh, we have a blog.
And I will say, I run the Twitter account, so if you want to at us, we read everything, for better or worse. Oh, I guess it's not Twitter anymore, it's X. Okay, moving on. Also: share how policy impacts you, advance causes that matter to you, volunteer, and encourage others to join you. I think open source is a really cool thing, it's definitely a value to any organization, and there are so many different ways to contribute. So I'll open this up to Q&A; I did kind of speed through that. If you have questions about other policy developments, the Innovation Graph, or anything about how we do policy advocacy at GitHub, I'm all yours. And thank you all for sitting down for this.

Thanks for your talk. I work in climate change, so policy is also very important for us, but there's very often a misalignment in incentives: as scientists we say we should be doing this, and policymakers say this costs money, we don't want to do it. Is there something we can learn from you to be more effective in advocating for what we should be doing, and maybe try to align incentives between what scientists say and what policymakers want to accomplish?

That's a great question. Well, I'll just say, if you're working in climate change, share your data on GitHub. But on the policy advocacy side, there are a couple of approaches we've seen work, and one of them, obviously, is making the constituent case. That's what we're trying to get at with the Innovation Graph: being able to show what's happening within your own country, and hopefully we can get down to counties and states and other geographies.
But in making that case on the open source side, the economic argument is a very strong one, so explain the value to the economy. And frankly, within the US especially, making a national security argument is another very effective approach. There's a lot of concern about open source, but there's also a lot of benefit to global collaboration that is actually counter-authoritarian, so it's something that can be supported. I didn't include this in the presentation because it happened last year, but I don't know if any of you are familiar with OFAC licenses. OFAC is the Office of Foreign Assets Control; they're the ones who work on sanctions. There are countries where you're not able to operate a business, and we pursued avenues, the main one being Iran, to get limited licenses to allow people to use GitHub. That was a big one, and it was actually kind of interesting because last year, I think around September, they extended those licenses, the things we had worked out over years, to other companies. GitHub had put in all this effort to make our services available, and then the United States recognized that having these platforms available within Iran is beneficial both to its security interests and to pro-democracy goals in general. So those kinds of arguments, thinking about what policymakers' goals are and how whatever you're doing fits within them, are a big one. But honestly, I commend you on anything related to climate change.
I actually got into this type of work because I studied geography and was really into open source mapping, and a lot of it was on climate change and sea level rise. I got out of it because it was really depressing, so I definitely commend your work. Do you have any other questions?

I can just say everything we do is on GitHub; GitHub powers everything that we do, so thanks for that. I think one of our challenges is that if we want to see the economic benefit of our work on climate change, we have to wait fifty years, but fifty years is not the time scale a politician is typically interested in thinking about. There are trillions of dollars at stake in climate change, but far in the future, so it's a little hard for us to make compelling economic arguments, and on national security I'm not totally sure what we can say.

Well, I'll tell you, one project I worked on was on sea level rise and defense installations, so that's a big one. There's a lot of money placed on our coasts in vulnerable areas, so I think there are some arguments there. Are there any other questions or comments? There's someone behind you.

Do you have a quick list for how you would recommend people talk to policymakers to explain why something like open source is important? A sort of hit list for: you're going to talk to a policymaker, here's what you need to have in your presentation or the document you give them.

Yeah, that's a great question. Obviously there are venues on the federal side where they have calls for comment that anyone can submit to, and there are developer groups like DEF CON and others that have done these submissions; again, OpenSSF is a big one there. So there are those kinds of calls for comment for directly interacting with policymakers.
Obviously the way we do it is when we're negotiating on legislation, but I imagine that for developers specifically trying to advocate, you have to have an ask, something you're going for. There are things like supporting OSPOs, or supporting STEM education in general, that I could see being a direct ask. But I would say, try to make the case. For example, the EU Copyright Directive: are any of you familiar with that one? That was a big one, back in 2019, before I joined the team, but I've heard tale of it. The Copyright Directive originally contained a proposal, which we got removed, to have software scanned using technical measures to detect copyright infringement. The thinking was: software can be copyrighted, so why not scan it too? But that doesn't make sense, because there's a lot of independent duplication in software (there are only so many ways to get to a certain answer), and it conflicts with how open source software is developed. That was a big one, and we actually mobilized developers, asked them to get in touch with us, and then routed that to policymakers. So there are a lot of different venues. I don't want to say don't directly contact your congressperson, but I do think working through these comment venues, or through civil society organizations that represent developer interests, is effective; those are people for whom this is their job 24/7, so I would definitely recommend it. For example, CISA,
CISA, the Cybersecurity and Infrastructure Security Agency, is the main software security organization within the US, and they put out requests for comment all the time. And you know who's always submitting to them? OpenSSF. If you want to get involved, that is a venue. Does that answer your question?

That's a great answer, thank you.

Okay, thank you. And again, on the Innovation Graph side, if you want to argue against blocking international collaboration (for example, nationalist arguments like "we're in competition with China, so we don't want them to collaborate"), you can make the case for why it's important for Chinese developers and American developers to be able to collaborate. So use the Innovation Graph, and other venues, for sure. Any other questions? Yay.

How big is your team?

Oh man, our team is pretty small. I don't know if I can share this, but we're actually getting a new member on Tuesday of next week. Are any of you familiar with Felix Reda? Felix Reda is a former MEP from the European Parliament; he was a member of the Pirate Party and was really involved in the Copyright Directive, so I'm very excited. Don't tweet about it yet, because we're announcing it on Tuesday. We also have Mike Linksvayer. Mike came over from Creative Commons, and he's a really cool guy; I'd definitely recommend looking into his work, he does a lot of thought leadership in general. And then we have a staff engineer who was a big part of building the website and does a lot of things. Not to take more time, but are any of you familiar with youtube-dl?
Okay, all right, so I'd definitely encourage you to look into it. youtube-dl was a big hullabaloo: we took down a popular project based on a DMCA request, and it was taken down a bit hastily. After we reviewed it, we realized there were ways we could have kept it up that wouldn't have been a violation of the DMCA, so we changed our processes and brought on a staff engineer; now all of our DMCA requests are reviewed by both lawyers and engineers. The DMCA is tricky because, again, we get into dual use. youtube-dl was interesting too because of the YouTube rolling cipher, and sharing a method for downloading YouTube videos is not an extremely controversial thing. One of the main points of contention with youtube-dl was that there was copyrighted content in the examples posted in the repository, and so working with maintainers on "here's how we can keep this up" is something we do try to do. The same goes for malware: we have a very supportive approach to dual use when it comes to security research, but we disallow malware on our platform that's actually being used in an active attack. There's a difference between sharing something so security researchers can study it and someone perpetrating an attack, so there are a lot of gray areas. Anyway, sorry, that was a bit of a tangent, but it's interesting. All right, I'll say thank you, and please get in touch with our policy team.

Oh yeah, the team is Mike, myself, incoming Felix, our engineer, and one other policy manager, so we're pretty small, honestly.

I was wondering, how close are you with Apache?
I would say Mike, because he's a developer and comes from that perspective, does more of the language-community side, but most of our interaction with the broader developer ecosystem is through the Linux Foundation, OSI, and groups like that that feed in. We also work with Red Hat; they have a policy team. I haven't interacted with Apache and don't know if they have a policy team, and when it comes to package repositories, some of them are very minimally maintained, so keeping things up is a challenge as well. All right, no more questions? Last chance? All right, thank you all.
Thanks. I guess I'm going to be our moderator, but I'll add anything to the conversation between the two of you where I have value to contribute. Thank you for coming, everybody, and audience at home. This is the panel with three different perspectives, or points of view, on open source program offices, from our different levels of experience and background in doing this. I'm Karsten Wade; I'm an open community architect and founder, essentially doing community management and full-lifecycle work for open source projects as a consultancy.

I'm Stephanie Lieggi. I am the head, or the principal investigator, for a project that is creating an open source program office at the University of California, Santa Cruz, as well as the lead on a project that is trying to spread the OSPO approach throughout the UC system. So it won't be just the one campus; we're hoping in the end to have all ten campuses following what we're calling the OSPO approach right now, with some level of an OSPO on each campus. I'm also the executive director of the Center for Research in Open Source Software, which is also at UC Santa Cruz.

Awesome. Yes, hello, thank you all so much for having me today; this is super exciting. I'm Brittany Istenes. I am an open source strategist for this company called Fannie Mae.
What that means is there are many different facets of open source, open source program offices, and all of the gray areas in between, and I put as much attention on them as I can, being completely distributed among all of these different initiatives. It's pretty interesting, and I enjoy it quite a bit. I'm also co-chair for two special interest groups at FINOS, the fintech open source foundation: the Open Source Readiness group and the InnerSource special interest group. And I'm a TODO Group steering committee member. Where do I find the time? I just thought about that. But yeah, that's a little bit about me.

Okay, good. And as panel leader, I'm going to take the opportunity to say no more about myself, so that's perfect. Let's go on. We'll just leave this slide up while we're talking, so we have our napkin notes to shift over. I thought I'd join you down here, because there's no reason to be standing up there. Grab the mic if you're going to talk. So, to start off, what we wanted to do was first give a quick definition, from our own perspectives, of what an open source program office is and what it means to be an OSPO. Why don't you take that one?

Really, you sure? So my perspective on an open source program office is very nuanced, and that's not necessarily the traditional way you would view what an open source program office does. I might as well just talk to the audience directly, right?

Right, because I'm just going to tell you that no two are alike, et cetera.
So you might as well tell us about yours.

Open source program offices exist within enterprises, foundations, academia, all of that. Essentially, the way I look at it, an open source program office has subject matter experts in the fields that cover all the different facets of open source: open source compliance, open source contributions, being a good corporate citizen, actually working with these communities, and staying secure in the software we develop while enhancing the developer experience. That's one of the ways I look at an open source program office, and I think they are mission-critical to many enterprises.

I come from the academic side, and we're very new to the whole idea of open source program offices. In academia, it really started in 2020, when the Sloan Foundation, being very generous, started giving us funding to do this. But a lot of the discussion started when looking at OSPOs outside of industry. OSPO++, which some people may have heard of, kind of started that discussion by asking: how do you become the center of excellence, or the center of gravity as we like to put it, for open source within a public, non-industry institution? So that whole non-industry group came together around what's unique about how we do it versus how it works in industry. But we also started to recognize that even within that group there were subgroups, because academia looks at things differently than government, and government differently than, say, NGOs and international governmental organizations.
So for me, when I look at what an OSPO is from a university's perspective, it's really about how we use open source to leverage the impact of the research we are doing, which I think is the important aspect of what university OSPOs in particular are doing. But at the same time, no one size fits all: if you've met one OSPO, you've met one OSPO. Even within our group, which is a very tight-knit group right now because there are only about twelve of us, we are all different, and it's really cool to see the different flavors each group brings. Within the UC system, a lot of it definitely has to do with leveraging research and transferring that knowledge to the wider population. I got my mic, thank you, okay.

Yeah, and my definition is always going to be very dry, right? A program office in any organization or institution is a construct, sort of a center of excellence, a place everybody knows they can go to to get certain things done, that helps drive things forward around a technology or a concept. Some big organizations have had program offices in that sense for a long time, but the idea hasn't reached across. Sometimes what happens is you finally get the right label for something that needed to be done, and I think that's what's happening: there have been groups within organizations for a long time doing things that are like OSPOs, or are the same thing.
And having a single thing to reference, just as we have Linux to reference even though it's not the only free and open source operating system, gives us one thing to talk about and circle around. The OSPO serves that purpose: people know they need something to center the conversation within the organization, and to be the contact point for people outside the organization. The OSPO can serve that purpose, but it doesn't have to have that name, and it's certainly going to have different forms. I always say it really has to start with assessing what your own needs are and what you're actually trying to accomplish, not force-fitting anything.

Actually, moving on to what we were thinking about for the next discussion, it's what Karsten was just saying: there are different kinds of organizations. When I look at it from the educational and university side, there are definitely offices and entities within universities doing things where I, as the academic OSPO, would say, oh, that seems like what we're doing, but it's not 100%. And it's funny how we came to being an OSPO: we were already the Center for Research in Open Source Software, but we weren't an OSPO. We were a research center within the engineering department, promoting open source software and open source communities around research being developed within UCSC, particularly within Baskin Engineering, where our department is. But we started doing stuff because open source is cool and fun, and we started meeting everybody at all the conferences because we had open source in our name.
So we started talking to people and realized, oh, there's all this other stuff we can do, and it doesn't really fit in a research center. We started doing mentorship programs, working with GSoC and all this stuff, and that didn't fit either. That's when we realized that maybe we needed to start looking at an open source program office as where we were going.

I like the point you made that they're not always called OSPOs within universities, but there are a lot of different aspects, and I think other institutions have that as well. What's interesting is that our open source program office is way different in why it was formed. Fannie Mae is a financial institution, so there is a lot of governance that needs to happen around all software development, across the complete software design life cycle. We all know that essentially 90% of the applications we use, everything we develop, is built on open source libraries. So we need to make sure there is a level, and it's not the most fun, but a very high level, of governance and regulation in the development we do. We also need to be very thorough and strategic about what we build into our applications, because of the rise in vulnerabilities. One of the big focuses of our OSPO is security: cybersecurity risk and all of those things, and that's where the OSPO is right now. We love and encourage community, and we work in the open and within our foundations and all of that, but it boils down to making sure that the software we design, software that touches people's lives, is secure and stable. So it's not like the research part, right?
It's more about regulation and understanding all of the compliance, which could sound a little boring, but it actually isn't; it's pretty interesting.

And Stephanie, I wanted to ask you a follow-up question. In terms of recognizing that you wanted to be an OSPO and what that meant, tell us a little bit about the Sloan Foundation grants they kicked off and that cohort of other OSPOs. One of the things I saw when meeting the other leaders of those OSPOs was that, again, it was something that came up from within: somebody was in the library, somebody was in another department, it was custom-built in each place, but there were commonalities and differences, with all of you acting as a meta-community on top of that.

Well, first off, it's a really great group, and what's nice is that we're learning a lot from each other because we're very different. We see common challenges, but at the same time our institutions are very different; even in the network we're trying to create within the UC system, all of the campuses have their own needs and are unique. Just because it's the same system doesn't mean they're the same. With regards to Sloan, it's kind of interesting, because CROSS was started because of Ceph, the open source storage system created at UC Santa Cruz. The creator of that project, Sage Weil, gave us money. Thank you, Sage. It got me a job; that's why I had a job. That started CROSS, and UC Santa Cruz had a very positive attitude about working in the open, because we've had a long history of it: the human genome browser was open source because of researchers at our institution.
So it was seen as a positive, and that was, I think, one of the reasons why, of all the UC campuses, it was Santa Cruz. Some people go, really, Santa Cruz? To be honest, we're in California, but we're not Berkeley, not UCLA. But Santa Cruz had this inclination to be open, which was great. So when we started, already being CROSS, we were on the radar of the people working at the Sloan Foundation, who really saw this approach, what I was talking about earlier, leveraging open source, open source communities, and the open source approach, as a great way of helping research have high impact. A lot of what we do is finding that high-impact research that isn't necessarily going to be commercialized in the traditional sense, where there's IP or a patent; it's the research that might have a huge social impact but may just fall by the wayside if somebody doesn't step in and figure out a way of maintaining it. That's a lot of what we're all trying to do. So Sloan was very proactive with a number of us who had a good starting point. CROSS was a good starting point on the UCSC side. Open@RIT: everybody who knows Stephen Jacobs knows he's been in this forever (I'm not going to date Stephen Jacobs, but he's been doing this for a while, and he's somebody to learn a lot from). Johns Hopkins had very interesting work they'd been doing, plus it's a well-known university. We were the first three that started it, and then three more came on: CMU, St. Louis, and Vermont. I should know all of these because I hang out with these folks a lot, and we actually joke that we tell each other's stories.
We can actually explain each other's OSPOs. We started as a core group, meeting and talking and being on all the panels together, and we each had our own successes and got to learn from each other. I steal stuff from Kendall from Vermont all the time, and they look at what we're doing with industry and some of our mentorship programs and integrate it into their work, so it's really nice to learn from each other and not have to constantly rebuild. We're very open about how we interact with each other. That success built another round: there was another call for proposals, and I think 40 to 50 universities were interested in starting OSPOs, so Sloan funded another six. So we now have this really great group of twelve that work together and support each other. And again, Sloan's idea is to create models that other universities can follow. There's no one size fits all, so having a lot of different models helps different universities figure out how they can create their own.

And it seemed to me that it was intentional on the part of Sloan's team to put you all together and have you act as a community, because that's something that arises naturally if you put a little bit of something interesting out there and get everybody to form around it, a little shiny something for all the crows to rush over to.

Yeah, the money didn't hurt.

Of course not, but making it a requirement helped.
Funding really helps, because it's really hard to get, and there's only so much funding they have, so it's really notable that we now have the twelve. I think what happened, and this was very smart of them, is that when the larger call came out, people started thinking about it and seeing the successes from the other universities. The very innovative projects were the ones that got the funding, but at the same time, a lot of the universities that didn't get it are still asking, okay, how do we do this? So it's not that they're moving on from it; we're seeing it slowly become more of a norm within university systems to have an OSPO or something like it. Another interesting thing is that you're also seeing it from the top down: HELIOS, the Higher Education Leadership Initiative for Open Scholarship (thank you, I should remember these), operates at the presidents' and chancellors' level, and they're having a lot of the same discussions and looking at these policies. They're not specifically about OSPOs, but a lot of it is trickling down in a positive way to the bottom-up OSPO approach.

And there's another aspect too, which is NSF grant funding. In particular, the NSF started the POSE grants, Pathways to Enable Open-Source Ecosystems, around building ecosystems around an open source project. But more broadly, the NSF is beginning to apply the open requirement to research: these are federal dollars, US dollars, and the results need to be not just public domain but open source, open research, open data, open scholarship, all of those aspects.
And so that's the other springboard: every institution that does research, R1, R2, the various levels, anybody who's getting NSF dollars, is going to need some concept of this. And this also gets to the supply chain, because supply chain is much more than security; it's about bug fixes, it's about getting people's opinions. Go ahead.

Yeah, but I was going to say: is the supply chain really more than security? We have to look at inventory management too. This is the fun stuff, right? Inventory management, woo! But we do have to look at all of those facets. Just the supply chain alone, just knowing where your things live, is one facet of it, but it falls in line with security. Think about it: whatever company or group you work for, open up any of your projects right now, and say all of a sudden you get a zero-day vulnerability. Are you going to know exactly where it is? Probably not, right? You can analyze your supply chain, if you're fortunate enough to have that visibility inside your project, and actually know where to look: okay, I found it. Well, what do I remediate it with? That's where it happens.

And you can think of questions of sustainability: any time an organization wants to deal with sustainability, that's a kind of supply chain discussion. Any time you want to carry an idea, a concept, a complaint, or a filed bug through, you run into the same dynamics of being able to reach through the chain and knowing who your intermediaries are.
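The inventory point above can be sketched in a few lines: keep a map of projects to their declared dependencies, invert it, and when a vulnerability is announced in a package you can immediately list the affected projects. This is a minimal illustration of the idea, not any tool the speakers actually use; all project and package names here are made up.

```python
# Minimal dependency-inventory sketch: answer "a zero-day just landed
# in package X -- which of our projects ship it?"
from typing import Dict, List


def build_inventory(projects: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a project -> dependencies map into dependency -> projects."""
    inventory: Dict[str, List[str]] = {}
    for project, deps in projects.items():
        for dep in deps:
            inventory.setdefault(dep, []).append(project)
    return inventory


def affected_projects(inventory: Dict[str, List[str]], package: str) -> List[str]:
    """List projects that depend on a newly vulnerable package."""
    return sorted(inventory.get(package, []))


# Hypothetical portfolio; in practice this would come from lockfiles
# or an SBOM rather than being typed in by hand.
projects = {
    "payments-api": ["pandas", "pyyaml", "requests"],
    "reporting-job": ["pandas", "numpy"],
    "web-frontend": ["requests"],
}
inv = build_inventory(projects)
print(affected_projects(inv, "pandas"))  # → ['payments-api', 'reporting-job']
```

In a real OSPO setting the `projects` map would be generated from lockfiles or SBOMs rather than maintained by hand, but the inversion is the same: without it, a zero-day announcement means grepping every repository; with it, the blast radius is one lookup.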
Absolutely. And going back to what you were talking about, being able to spin up all of these different projects within the university circle and then leverage those patterns so your groups can be successful and you have that widespread reach. So here's something we're doing on the opposite side. You're open sourcing these wonderful projects, you're building community around them, and that's awesome, right? That's the goal. From the financial, regulated side of the house, what we were able to do is release a project called the Clean Dependency Project, which was a huge, momentous thing: a highly regulated bank releasing an open source project, with the goal, the common shared interest, of finding patches for very common vulnerabilities, ones the maintainers don't necessarily consider vulnerabilities but that are affecting thousands of assets within any particular space, any particular project, so that you can roll out a patch. That's the research we did to be able to do it, and now it's an open source project. We patched a Python pickle, the pandas pickle Python patch? I just like saying it, I'm sorry, I just like saying it. And we released two significant patches for SnakeYAML in the open that anybody can consume. You would think an industry would say, no, this is ours, we're not telling you. No: what is good for the goose is good for the gander. So we were able to release that, and so many companies right now are leveraging these three simple patches to remediate thousands of vulnerabilities. That's what we were able to do from the regulated side. So it's different but very similar, and building that community.
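The pickle patches mentioned here matter because Python's pickle will import and call arbitrary globals during deserialization, which is why loading untrusted pickles is dangerous. One common mitigation, adapted from the RestrictedUnpickler example in the standard library's pickle documentation (the SAFE allow-list below is illustrative, not the project's actual patch), is to refuse to resolve any global that isn't explicitly allowed:

```python
import builtins
import io
import pickle

# Illustrative allow-list: only a few harmless builtins may be resolved.
SAFE = {"set", "frozenset", "list", "dict"}

class RestrictedUnpickler(pickle.Unpickler):
    # A crafted pickle payload normally gains code execution by asking
    # the unpickler to resolve something like os.system; overriding
    # find_class lets us veto every such lookup.
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def restricted_loads(data: bytes):
    """Drop-in replacement for pickle.loads with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers of plain data unpickle fine; payloads that reference
# disallowed globals raise UnpicklingError instead of executing.
print(restricted_loads(pickle.dumps([1, 2, 3])))  # [1, 2, 3]
```

This is a sketch of the general technique, not the Clean Dependency Project's implementation; their actual patches would have to be read from the released project itself.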
And I'm curious what else you're finding. I know you're involved in the FINOS community, and in the dynamic of the whole financial technology sector, without naming names of course, are there many other OSPOs coming up? Is that a way people are really solving problems there, or how are they doing it? Absolutely, there are a lot of open source program offices, or groups under different names, people who are well-versed in open source, coming up in the financial industry, because as mentioned before, there are a lot of common problems. At the end of the day, we need to keep all of this private information, your money, and all these things secure, right? That's what it boils down to; it is what it is. But you're seeing people who didn't know what they were leveraging. You have these old institutions that have been around for over 100 years starting to get caught up to speed, and so you need subject matter experts in governance, regulation, and all these things within open source who are also passionate about open source. You can't just come in and be the boring compliance person. You need to actually be passionate, because the community and these projects are what sustain your business. And so we're finding more and more folks like that coming together and working on these things, as well as on their own job opportunities and things like that.
One of the aspects that I find, and I can think of one non-profit organization I was working with on this, one of the most difficult things they were dealing with, with an internal group doing OSPO-like and community-facing work while trying to integrate with external contributor communities, is the dynamic of what I refer to as the human API. The individual people you take on absorb all this overhead to get things done within your organization; you basically take all that ownership within that team. And then any time a community collaboration wants to come in from outside, it has to come up against those processes. I forget the name of the communication theory, but it's the dynamic where, when you create a product or an open source project, at the beginning the communication patterns match the internal patterns of the organization that created it, essentially Conway's law. And so these learning moments come up over and over again. What I've been noticing a lot is that it can have a big human toll, that human requirement. And one of the biggest ways to make it a success, I think, and there are people here who could give their opinion on this or not, is to push as much of your work as you can outside the walls you have to work within, so you can collaborate with people. You see uses of things like Google Docs a lot now, which, although it's not an open source tool, is open and accessible to people outside your walls, so you can all share and do a thing together, right?
And that used to be the way we did things back in the old days, although in the old days we would use other techniques, IRC, direct messaging, other ways to share files, and it wasn't necessarily as smooth and simple as things are now. But yeah, are you ready to take one on? I just had to make a comment about Google Docs: unless you work in a bank. Right, that's so funny, yeah. Two laptops. Right, and so the human there is a very literal translation layer; your job is to be those different layers in between things. And even when you have those accesses, I found all the time I would use my external personal Gmail account for things, because I could do stuff I couldn't with my corporate account, even though it was technically allowed. You may run into that too. I wouldn't mix the two. No, keep them separate; that's just my opinion, though. That's actually the true origin of my nickname Quaid: it's K-Wade pronounced out loud, Quaid. When I look at a command line and I see kwade, K-W-A-D-E, I know that's my corporate account somewhere, and when I see quaid, Q-U-A-I-D, I know that I'm in open source community space, so I always use one versus the other. So that's a tip to you, everybody, on why we use nicknames. So what is our next question, where did we get to? Yeah, something to think about as we move through here. So we have three different levels of experience in what we do, right? You can obviously tell we have three different perspectives on open source program offices, but what it boils down to is that we have similar passions in what we want to do and achieve. It's not just about the almighty dollar, you know.
It's something that we just believe in. And so I think one good thing to talk about would be why we chose to get involved in the foundations that we did, their levels of importance, and how they impact each of our respective industries. Oh yeah, I got the mic. I like to share things. So I've gotten really heavily involved in financial open source, right? This is where I spend the bulk of my time. It's a great community. You're leveraging so many different people who are finding similar patterns in the work they're doing, and we're trying to problem-solve within our respective spaces, which is absolutely fantastic. And as we were saying, not all of these groups are called OSPOs. The Open Source Readiness group is a fantastic space where we create, together, bodies of knowledge that help make development, regulation, and compliance easier in these respective spaces. That's one of the things I find really interesting, because at the end of the day, I want things to be easy, right? It's never going to be easy, but if we work together in this particular way, we can make these patterns common, make them flow together. And then also, I just love InnerSource. If you work in a regulated industry, one of the ways you can develop openly and honestly behind your company's firewall is InnerSource. So I spend a lot of time in the InnerSource Commons community, as well as the FINOS InnerSource community. And if you want to learn generally about OSPOs and join some really cool calls, the TODO Group, Talk Openly, Develop Openly, is a great space to get started. It really is, and everyone's really welcoming. Yeah. I want to add one thing about the InnerSource Commons group.
It's sort of under the radar; not everybody knows about it yet. One of my favorite things about what they've done, correct me if I'm wrong on this, in terms of creating patterns and anti-patterns, lessons to live by, is that groups involved in InnerSource are able to anonymize their story of some horrible thing that happened inside their company that they can't tell the outside world. If you anonymize it in the right way, it's a little Chatham House Rule kind of thing. And you get these stories, because that's one of the great things about open source: we can actually talk about all of the horrible shit that happens, because it's right there and we can really point to it. When it comes to the inside, the "you've got that running in Excel" stories, not everybody's going to be comfortable out here, so it's good to have different ways to share. Yeah, the Chatham House Rule is really important. It's great. Having come from a former career where we worked with track one-and-a-half dialogue, where you actually had governments talking with a bunch of NGOs in the room, yeah, the Chatham House Rule is really good. So, from my side: I've always been in academia; even when I was working on different issues, I was doing it from a think-tank perspective attached to an academic institution. So it was kind of natural for me to stick with this. And I came to open source knowing nothing about open source, except for open source intelligence, which is what I used to work on, totally different, as far as I knew at the time. But I had seen what open source could do in that vein, and looking at the analysis and the type of work I was doing, I saw how powerful open source was.
And, not to give my whole life story, but for reasons that were really important at the time, this position came up and I swapped universities. What we were seeing, as I alluded to earlier, was a real issue within academia, and I think research anywhere, but particularly academia: you have PhD students working diligently on their dissertations, then finding jobs, and their dissertation topics get set aside. All this work they've done is not curated or kept in any form that can be replicated or built upon; it's not going to go forward in and of itself. So the idea a lot of academics are seeing, I think, and why I find it important that we have academic OSPOs, is being able to promote that work and not let that happen, because it's a lot of reinventing the wheel. It's not just bad for us from an intellectual perspective; it's also a financial loss. And this is one of the things, when you were talking about the NSF, the National Science Foundation, getting into it: I think a lot of that is a recognition that we are losing money if people keep having to do the same experiments over and over again, because the work isn't published in a place where it can be found and built upon. So that's a lot of what I feel is going on within the academic space as well. And what's a really funny thing about your journey to open source? I used to teach kids, right? I used to be a school teacher, and I just somehow fell in, and that's what's so accepting and amazing about open source software: it can lead you in so many different directions, right? So yes, there are parallels with academia, though my kids were a little younger; they're still cool. We launched this, we put that link up for InnerSource Commons. Oh yeah, I forgot about that. Could you change it on the slide?
Yeah, but you'd have to reload the slide, I think. So what was the question? The foundations I got involved in. I'm not a joiner, I guess; I only go so deep into the things I hang out with. I worked for a couple of decades, 21 and a half years, something like that, for an open source software company many people are familiar with called Red Hat. And in that time, I got a chance to help capture a lot of best practices, but also to hear a lot of stories from people as they came through the company: oh, okay, what do you have going on? Really? Okay, good. What's that like? Thank you, one-minute sign. And a chance to have a lot of conversations over the years with people, constantly and regularly, in one space. What's great in my new position is being able to actually turn those conversations into something more, whether that's somebody becoming a customer or a partner, or a project that I get involved in. So I'm working with the Kwaai foundation right now. It's a nonprofit, and it's about personal AI, because it's the right time to work on that, they can use community management help, and it's good to do, right? And of course, theopensourceway.org is a community of practice that awakens every half decade, updates a book, and then goes back to taking care of its own things. The thing that I've been most passionate about the last couple of years, though, just in terms of my personal stuff, what do I really want to spend my time on, is that I got involved in the inclusive naming community, inclusivenaming.org, and all I do is work on words and help make sure that we get this list released.
I'm not trying to lead the project; there are so many other people who do that. I'm just a word nerd, and it's fun to do. Simple. So I'm an example of someone who spends my time on something I care about more than something else, because I also had the advantage of being on Red Hat's OSPO team for a long time, where people like Brian Proffitt were the natural ones to go off and work on the CHAOSS community, so basically Brian had that covered, right? And that's an interesting dynamic that can come up, even in a relatively small OSPO, where people fraction off a little bit and stovepipe and don't cross-share information with each other, so it has a bit of a challenge with that. But what I wanted to say is that I've become more passionate about getting community management people together. OSPO people are part of that. Community managers, whether you're working on a product or supporting forums for somebody's favorite thing; there are product communities around everything from gaming computers on through communities of interest and communities of practice. I think all this stuff is a continuum. And this gets back around to why we suggested this session and came here. I met Brittany at the end of last year, and I've known Stephanie for a little while, and with Stephanie, and now with you as well, I kept turning around and talking to friends and they'd say, oh yeah, I know Stephanie from this or from that. Within a very small amount of time you'd found yourself completely within the right places within this world, right?
And seeing that happen over and over again, that's a natural thing that arises as well. So that was what I really wanted to have a chance at, because in essence I expected we'd say more of the same things than different things across these experience levels, but I also appreciate that we have a wide range of things we know about. Yeah, before I give it back to Stephanie, I just want to say thank you so much for celebrating community management. Community management is, in my mind, a really misunderstood role that doesn't get enough credit, right? Community managers, especially in open source, have their hands in so many different things, and they should be celebrated. It's not just, oh, I'm here, oh, you're a maintainer of a project, cool, fist bump. No, it's not like that; it's actually a lot of work. So I think more people should be normalizing it in enterprises and hiring more community managers. And I'm going to put in a plug, because last year I got involved in the CMX community, which is a community managers' community, though more oriented toward people who've been working on the product side. What's happened is they've convinced the people in the C-suite that they matter; they have representation at that level because they can drive metrics about how they help. But ultimately they're doing the very same stuff that we're doing fully on the open side; it's the same stuff with different shifts around other things. And they're also at the point of wanting the self-reflection that open source communities have had for a while, about how to take care of ourselves, check our own practices, and teach each other and so forth.
So it's a great time to bring together these different groups that thought they were different. Oh, here's another one I learned about recently: HOAs. The people who are community managers there, when I listened to what their life is like, I thought, oh my God, it's just like our job. They have to deal with a board of directors who are volunteers, homeowners who have a lot of expertise and opinions and are in many cases retired. The managers are not paid very well, they come from non-traditional backgrounds, they don't necessarily have a degree in anything community-management related, and their job is to be the interface with all the people, to learn all this stuff and then make sure it happens. It's the same stuff, just a little bit different. So it feels like, yeah, we should be celebrating and getting these groups to have a greater understanding of the thing that we do and how similar it is. Yeah, mission in mind. I was going to ask one question of the group, and I think then we're going to open it up for questions. Okay. So, like I said, I come from a non-traditional background. I worked in non-proliferation for decades and decades, and as part of that I worked a lot with export control.
Hey, compliance people are awesome, by the way, even though we give them a hard time; compliance people are the best. And what I found really interesting when I was in that field was that compliance people work together across the industry even when they're in competing organizations, and that really impressed me, because having come from academia, where everybody can often be very competitive, that level of collaboration really struck me. And now that I'm in the OSPO and open source field, I've noticed it amongst OSPOs as well. The TODO Group has people from competing companies, and I think that's very impressive. So I'm trying to figure out, if y'all have a view on this: what is it about OSPOs and the people who work in them that makes that happen? Why does that seem to be the norm? I have my own ideas about why that's important in an environment like this, but I'd like to hear it, especially from the financial area and from what you've worked on as well: why does it seem like those working in OSPOs are so intent on collaborating with one another? Yeah, my short answer; you can pass it over, I've got a mic, I'm already there. I was just going to say that it's not that everybody in an OSPO is a connector person, just like not everybody's an extrovert who gets on stage and talks, right? But organizationally, we need functions that create connections and network people together.
And so there's definitely a draw toward that. The feeling in my heart is that it's the sort of thing where I'll get up on stage and talk to people because it's important, not because it excites me or makes me comfortable, right? So, does that make sense? The connector-person idea, is that a good answer for what you were asking? Yeah, I would say so; I think that makes sense. So you don't have to be a connector person, but a connector group, and then people want to be doing the job of helping connect things together and make them happen. Yeah. And also, did I give compliance people a hard time? Okay, good, I was like, no, I love compliance. Right, you were saying other people give them crap, and we're saying, no, we're with you. Yeah, when you say compliance, people usually tune out, pop open their laptops, and start looking at their phones. It is what it is. But I care about it. And no, I'd say anybody who works in an open source program office, yes, you can be introverted, you can be extroverted, you can be all of these things. But what I find, as someone who started as a school teacher and just kind of fell into technology, is that what's so interesting about open source is that the people who fall into it are the ones who believe in the mission, right? Understanding that software isn't going anywhere, hardware isn't going anywhere, and there are all these different connections happening, all these people touching all these different facets of everything we run in our day-to-day lives. At the end of the day, I think people who work in the open, and especially in the traditional space that governs and works within open source, are actually passionate about it.
It's not about the bottom dollar; it's about doing something that's cool. And I think that's the connection point of it all. Yeah. Because it matters. It matters, it's really important. You know, there's a phone there with a bunch of applications on it that all run in the open. We look at this thing all the time, this slippery germ brick, but a lot of what runs on it is in the open. Well, for me, what matters is that people have a chance to get access to what matters to them, right? That's been one of the highest things that matters to me, sort of meta-mattering or something. And in academia, you have people all across the world who can learn how to code, who can make things more accessible through code, and that's in the open. I think when people believe in open source, there's that common understanding that it does just make things a lot better, and we can keep doing cool stuff. Well, let's jump to some questions, if there are any from the floor, or any questions we wanted to ask ourselves. Over here, yeah. Sure, grab a mic. [Audience member, partly inaudible:] I spent years on Wall Street, and in all that time, in all the institutions, I watched things come and go and die; use of open source was almost non-existent. And I was wondering, does Fannie Mae actually use open source? Yes. So for those out there who couldn't hear without the microphone, the question is: from the financial, regulated industry, we've seen open source software grow and die on the table, right? And that happens a lot, it really does. And now, I would say not only does Fannie Mae use open source software, I would say everyone uses open source software. And then there are paid vendor solutions.
Oh no, building everything yourself is not even possible. Yeah, that's interesting. I think our time's almost up. We also don't mix our own custom pavement blends when we want to get our driveways done, right? With a vendor-supported solution, you think: I can take this package, I can pull it in, I can develop with it, it's going to be stable, and if something goes wrong, that's on them, right? That's one way a lot of different spaces look at it. How about this: in the state of California, we have what's called the cottage food law. I can have my kitchen at home certified at one of two levels that will allow me to make and sell all kinds of things out of it, but what I can't do is anything that's going to kill people, right? It has to be shelf-stable, and I have to be able to make it shelf-stable in a way that meets the rules; at a certain point, if I'm going to be canning something, I now need commercial kitchen space, right? So there's a line where those regulations come in, because what we've learned in general is that you can probably make some jam at home and it's going to be just fine, but you can't do canned meat at home. So we let you do canned jam in the cottage industry, but you can't do certain things that are more dangerous, right? And anybody can take a recipe and make a muffin, right? But if what you want to do is make enough muffins to feed all the school kids in all the counties across the whole Bay Area, now you're in a different territory. That's what I mean about the difference between making your own versus not. There's a point where saying, we're going to take on the task of making all of these muffins, of becoming a bakery, may not actually be what your role is.
And there's somebody who does a really good job with that, who can make it in a way that's safe and consistent and will supply you with your deadly muffins. Yeah, and I can't speak for every single financial institution out there. Who knows, maybe there is a dinosaur out there somewhere that's still building all its own solutions and not putting them out there. But to be speculative, I highly doubt it. [Audience member, partly inaudible:] I worked on the sell side, where we could pretty much never release anything. The buy side is a little more flexible, but it's still limited. In fact, on the buy side it wasn't even the fear factor; it was about whatever relationship was in place, almost all of it. And there are products that actually could and would have really affected the industry. Well, Dwayne, did you have a comment? We're going to open the floor, and actually I'd like you to make a comment, not just a question, if you'd like to. I was just going to ask you about Fannie Mae. Well, so wait a minute, okay, hold on. You're completely off mic right now, so I'm going to hand this to you. Yeah, thank y'all. Hello. So I can't speak to any other companies, because I don't work for them, right? I know which companies I work with in the financial open source group and the FINOS Open Source Readiness group, but I can't speak for them. Yeah, I was just curious. It's interesting.
Yeah, I mean, I'd suggest they come and join us at FINOS and learn how it's not that scary. I guess I did have a comment then. My recommendation, if you want to dig into the space, would be to go to GitHub and search for financial services companies, because there is a surprisingly large number of them that have released software. No, I'm encouraging you to go look. Right, yeah. And I've not heard anything. Justin? I understand what you're saying, but my summary comment on the whole thing, because clearly Brittany can't talk beyond her own company, and I can't speak to it either, is that if you do look on GitHub for some of those financial services companies, you'll see which ones are using open source, because you can see them active on GitHub and releasing software. I wanted to push another question to the panel, though. You had set up the discussion around your three levels of experience working with open source program offices. So I want to suggest, starting from Karsten and going to the left: what is the insight, looking back on your 20 years, that you want to pass to the person on your right, that you wish someone had passed to you when you were at Stephanie's point in her career? Okay, good, thank you. Two key things come to mind. The first is something I've sort of whispered a few times, which is that it's more than just "one size doesn't fit all"; you really need to look in and do the individual work. I think the work you're doing with UCSC, and with the UCs in general, is about dealing with them as individual groups, knowing that even though the first two letters of their acronyms are the same, the last ones are not. And if you know the difference between San Diego and San Francisco and Santa Cruz, you know the schools are different.
And I think the other thing is to realize that everything happens in cycles that are very similar but differ depending on which organization or industry you're in. Businesses often run quarter over quarter, on the business week, five days a week. But there's also how long it takes to get an initiative going within a company, from the moment you start to the moment it goes. For example, I've noticed how many friends have started an open source program office or something like that at a company, and it's not until 18 to 24 months in that they have enough traction and have made enough relationships to actually start making something happen, which can be really frustrating if you think you're going to do that from the beginning. So part of this is that all of these different groups have different timelines as well. Academia and research might run in four-year and eight-year cycles, but might also have to tie into funding grant sources and their timing. Governments have these three, five, ten-year arcs. But I've definitely noticed that almost anything worth doing that's going to take time and energy, like "let's create a whole new kind of platform," takes about ten years from the moment you scratch it on a napkin to the moment you're looking around at a conference of 10,000 people doing the thing. Maybe eight years if you're really fortunate and can accelerate it.
So just know, when you want to dig in on something, that you don't have to be there for all of those steps along the way — but you do have to be thinking about where you are in those steps, creating content and material, carrying people along, and knowing that there's a general arc that happens. And there are a lot of us out here now: anybody who's been in something like this for eight to ten years is a very experienced person who can help you get that broad view and understanding. Our industry has been around long enough that we can actually start to have that five, ten, fifteen, and twenty years of experience. — Yeah, I'd say so. I have not been in open source as long as Carsten, so I would take a lot of what he says, for sure. But for what we were doing, having that pilot was super important. What we did was technically a pilot project, and all of the campuses we're working with are now looking to us, and the idea is: great, you don't have to redo those last two years — we're here, we can help you move forward. And I think that's true for a lot of projects and a lot of people who are thinking about doing open source. Thinking back, even when we just started as CROSS — and we didn't realize we were kind of an OSPO — we didn't start out thinking campus-wide, let alone system-wide. We didn't think, oh, I need to go outside of engineering.
So starting that discussion and understanding who your stakeholders are and what the landscape is, is a really, really good way to start out — and we're just getting to it now. Really knowing what everybody on campus who touches open source is doing is important, because it creates more of a community — a community of practice — that you can really work with. And I think that speaks to what you were saying about the financial community: having that community of practice that helps you do the work you're trying to do is really important. — Stephanie's note reminded me of something I learned from her over the last couple of years, which is that the lessons go backwards as well. For a long time after I first engaged with CROSS, I treated it like it could do things across the whole system, and I didn't understand for the first few years that I was talking the wrong language. I learned a lot last year when I actually got to focus on it seriously and try to understand it: oh, it's not the same kind of thing, and we have to rethink, with each kind of industry, to some degree, how this stuff works. So thank you — the lessons go in more than one direction. — So I'm the newbie of the group: I've only been in open source six years, that's it. What I've learned, and how I've been able to garner as much information as I have, is that I listen to others. I pay attention. I threw myself into the community, basically headfirst. I actually believe in the mission, and I sought help when I needed help.
And I listen — the biggest thing is that I listen, and I take in as much as I can. And I don't give up any of my passion for it, because it's just a cool job, and I like it. I'm completely humbled to have been in this for such a short amount of time and to sit here with these two wonderful people. It's just awesome. So thank you. — We have one more minute, so one last question. — I'm curious: what are the biggest security concerns we're seeing in the open source space right now? — That's a longer discussion — that's a whole different talk. I'll see you next month at Open Source Summit. I'd say one of the biggest security concerns I see right now is that we cannot identify where everything is, even with your software bills of materials and all of that. We need to be able not only to identify where everything is, but to start pinpointing: what can we remediate it with? That's one of the biggest trends I'm seeing, and it's why we started the Clean Dependency Project — because a lot of the time, with these perceived vulnerabilities, the maintainer says "no, it's not a vulnerability," and you're left thinking, now I've got to write a wrapper for this, put a patch on it, and then consume it. So that's one of the biggest ones I can see. And then also, people just not upgrading their versions when they should be.
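The point about SBOMs not being enough on their own can be made concrete. Below is a minimal sketch — assuming a CycloneDX-style JSON SBOM, with `flag_unpinned_components` as an illustrative name rather than part of any real tool — of the kind of inventory pass being described: you can list components, but anything without a pinned version can't even be matched against an advisory, let alone remediated.

```python
import json

def flag_unpinned_components(sbom_path):
    """Given a CycloneDX-style JSON SBOM, return the names of components
    that have no version recorded -- components you cannot match against
    vulnerability advisories until you pin them down."""
    with open(sbom_path) as f:
        sbom = json.load(f)
    flagged = []
    for comp in sbom.get("components", []):
        # A missing or empty "version" field means this dependency
        # cannot be checked against a vulnerability database.
        if not comp.get("version"):
            flagged.append(comp.get("name", "<unnamed>"))
    return flagged
```

This only surfaces the gap the panel describes; deciding what to remediate with still requires human judgment about each flagged component.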
Honestly, it's about giving people — our technologists and our engineers — the time to do things the right way, as opposed to consistently shoving everything out the door to develop faster, when they need to develop smarter. It's not their fault. — And a quick one from the academic side, before I let Carsten have the last word: projects that somebody did that were great, and then they graduated and moved on, and people are still using them and nobody's maintaining them. Also, education-wise, I'm not seeing this enough even in the classes we teach, where we actually take it seriously. People coming out of CS, or any area where they're going to be creating software, need to have this as part of their coursework and understanding, and I don't think it's there yet. — For me — and sorry to jump in on it — what I was thinking about is the concern about nation-states, and nation-states' own opinions of all this, because the internet is an international thing, but countries can try to close off borders and control the dynamics going on there. And I'm not making this up — thank you, Aeva Black, for making me think about this. The dynamic is that our supply chain concern is framed around the individual who's been working on a piece of software, and that's the wrong kind of dynamic, right?
The reason open source works at all, and the reason secure open source is secure, is because of the process, not the people, and we need to be focused always on improving that process. And it's a fundamental value that we are able to have pseudonymity — not anonymity, but pseudonymity — in open source, so that if a person has any reason to mask their own identity, they can build trust under a pseudonym and be a valuable member of the world community, whatever their reason for doing so. That kind of thing has me concerned, because nation-states can also try to use software to do things — and they're doing that already — but it's really the dynamic of being able to make actual physical threats against people, and of physically cutting off entire sections of the world from access to information, ideas, and the ability to do things. And then, poorly written code from AI. — Okay, that brings us to the end of the session, right on time. If you wanted to reach us, we all put our LinkedIn pages up here — which are impossible to actually read, I'm sorry — but basically our names can be searched on LinkedIn, and that's the best way to get in contact. And we're going over to SidePie for pizza tonight, if anybody wants to come do dinner up in Altadena — we'll be trying to get there by 6:30 or so; it's about a ten-minute drive. 6:45, okay. Thanks, everybody. Thank you. Thank you, y'all.