Across the pond, the Greater London Historic Environment Record uses Arches both to house its historic inventory data and to power its archaeological consultations process. Across the other pond, Auckland Council in New Zealand is in the process of implementing Arches to manage and publicly share information about heritage places in Auckland. In fact, Arches has been used worldwide by nearly 100 organizations in the roughly ten years since we launched version one. That shows the flexibility of the platform, that it can be used by all of these different organizations around the world, but it also speaks to its usefulness.

So let's learn a little more about the platform itself. Arches is software purpose-built for organizations worldwide to independently manage their cultural heritage data for the long term. It was originally developed by the Getty Conservation Institute and World Monuments Fund for built cultural heritage inventories, but the platform has expanded to encompass much more. At a basic level, Arches operates under an open-source license and is freely available, meaning there are no licensing fees and you can have an unlimited number of users; that obviously depends on how you implement it. It's enterprise-level software that you host on your own server and deploy independently to your end users. And that nearly-100-organizations figure is just the implementations we know of, because we don't have anything tracking usage. Arches is also supported by a growing community of domain specialists, at this time mostly cultural heritage data specialists and technology professionals.

Beyond that, Arches includes many different features and capabilities. This graphic highlights some of them, and I am not going to go through all of them, but do know that Arches is an all-in-one system that allows you to manage your data, manage work processes involving that data, and publish it with advanced discovery and visualization tools to whatever audience you want. The features I will be going over are the ones I believe matter most for managing data in a government environment: number one, the ability of Arches to control access to data; two, its flexibility to accommodate different use cases, both within and outside of cultural heritage; three, the platform's ability to generate data that is interoperable and has longevity independent of the software; and last, that we have aimed to build a platform that can be implemented by the majority of organizations for all of their end users.

In regards to controlled access, Arches includes permissions controls that essentially govern who can see and do what to which data. These permissions controls are configurable in the administrator interfaces, both the Django admin and the Arches UI. Arches also just launched the capability for single sign-on with version 7.3; Rob is going to talk more about that later. Another important aspect of Arches for governments is its flexibility. Different governments have different laws, and that can affect how important data is managed. For example, I mentioned that LA and San Francisco both use Arches, but their data models, or in Arches parlance resource models, are somewhat different due to differing historic preservation programs. And Arches can accommodate this in two ways.
Number one, Arches is configurable via an administrator interface called the Arches Designer, which defines the database structure and dynamically creates corresponding data entry cards without any coding. And number two, Arches has a modular architecture to facilitate the customization and extension of the software. So Arches can be used for many different types of data and purposes: for example, monitoring the real-time environmental conditions of the Mogao Grottoes in China; creating a repository of 3D models of historic sites; using satellite imagery to monitor threats to archaeological sites in the Middle East and North Africa; recording information on both tangible and intangible heritage, as well as globally significant biodiversity; serving as the hub of a citizen science crowdsourcing project to monitor the effects of climate change on archaeology in Florida; and even managing and visualizing scientific data as it pertains to heritage science. That particular project exercises both the flexibility of configuration, because its data models are very complex and different, and the modularity for extending Arches, as we have had to build different kinds of extensions for it. Again, Arches is flexible because it is important to be able to accommodate all the data you need to.

Equally important is that the data created and managed using Arches outlives the software itself and is available for future generations, future decision makers, whoever needs it. Ultimately, it's the data that matters. So Arches has the ability to create findable, accessible, interoperable and reusable, or FAIR, data. Some of the key features supporting this are, number one, the use and management of controlled vocabularies. Number two, it can encode data with semantic metadata, and the source of that metadata can be any ontology you choose; at the moment, Arches comes preloaded with the CIDOC CRM, which is an event-based ontology used in the cultural heritage world. Number three, Arches supports the export of data in JSON-LD. Here's an example of how the Swedish Institute at Athens is organizing their actual archaeological field data using the CIDOC CRM ontology, and all of the data in that instance of Arches will be encoded with the semantic metadata you see on the screen. So anyone entering data into their instance of Arches will automatically be generating rich, self-describing data.

The last key feature I think is important is inclusivity. Generally, when I talk about inclusivity in regards to Arches, I'm talking about the platform being available and usable by different kinds of organizations, but that also extends to the platform being inclusive of individual end users. For example, we are currently working on creating accessibility controls for organizations to implement for their end users. The same goes for languages: more than 30 languages are currently represented in different Arches implementations. As of Arches version 7, the platform is fully internationalized, courtesy of funding from the Arcadia Fund. This gives organizations the ability to translate and localize their Arches implementation regardless of the number of languages or the directionality of the language scripts. So a government, or any organization, can accommodate whatever languages it needs to.
Arches can be used regardless of where in the world you are, even remote places with penguins: for example, an implementation is in progress for South Georgia Island's cultural heritage sites. The point is that an end user can access an Arches-powered system anywhere they have a network connection, depending on permissions controls, of course. And Arches can be used regardless of the scale of your project. The Armed Forces Retirement Home in DC manages the historic resources of a small historic government campus, along with its conservation management. And this implementation of Arches is for Cantón Nabón, a rural area of the Ecuadorian Highlands with a large set of cultural and natural heritage assets; they've implemented it in Spanish. So Arches can be implemented regardless of what language you use, where you're located, or the scale of your project. I've touched on flexibility and customizability, controlled access, and the FAIR aspects of Arches. Now we're going to break down some of these concepts a little more on the technical side, and to do that is my colleague Rob Gaston, senior developer at Farallon Geographics, the original and currently the main developers of the Arches platform.

Thanks, Annabelle. Sorry, I'm going to put this here because I need it. Like Annabelle mentioned, I'm going to go through a little more of the technical side of what Arches is, what applications in Arches look like, and how you can customize them and use them to manage data. To start with, I'm showing here a little diagram of what the Arches application looks like at a really high level. Arches uses what I would call a typical web application architecture built on open source tools. We use the Python-based web framework Django to create the UI as well as the APIs that Arches provides. All of the data are stored in PostgreSQL on the back end; we use the PostGIS extension in PostgreSQL pretty heavily, and that's how we achieve high-quality geospatial capabilities. Data are also indexed into Elasticsearch from the Django application, and those indexes drive Arches' built-in search interfaces as well as services in those APIs I mentioned before.

So Annabelle had mentioned Arches resource models. Arches is, like Annabelle suggested, really a platform for building applications, so every Arches application can collect data into its own custom business data models that are defined locally to that application. Those business data models in an Arches application are what we call resource models. What you're looking at here is the index page of an Arches application and the resource models that have been designed in the Arches Designer, which is that admin UI Annabelle mentioned before. Arches resource models are hierarchical semantic models, meaning they can be based on a defined ontology like the CIDOC CRM that Annabelle mentioned earlier, which is used heavily in the cultural heritage space.
And these models are actually created interactively in the user interface, in order to provide a way for domain experts, who aren't necessarily experts in relational database tools or the like, to design semantic business models that meet their requirements for a particular data management effort, and to design their data model and related user interfaces without the need to write a lot of SQL or application code. To achieve this, we use a schemaless back end in PostgreSQL. So this is the Arches Designer UI, and there's a little video here of it in action. Arches resources are designed in the Designer as semantic graphs of nodes and edges, and this graph structure is how we support enforcing ontologies; it allows for design flexibility, complex composition of your data models, and interoperability with other systems, supported primarily by the JSON-LD export and APIs that allow Arches data to be represented outside the system. Resource instances of these models can then be freely semantically related to one another, again enforced by that ontology. And the whole point of this is that semantically related data with an enforced ontology, plus support for JSON-LD, or linked open data, allows Arches to represent data in a way that is expressive, interoperable, and can outlive the system. That is, those data can be understood without being presented in the context of the Arches application.

So let's look at that a little closer. Here's a really simple resource model that's been defined in the Arches Designer. The section we're looking at here defines the semantic hierarchy, where a location has a geospatial feature with an associated accuracy node, which is a number in meters, represented by that M in parentheses next to the name "accuracy" in this design. This is meant to represent a place, and a place also has a name in addition to the location. This is an example of how you might modify a model in a simple, contrived way: on the left you can see the place with the accuracy hard-coded in meters; on the right we've added a node to represent the unit, so you could assign the unit as meters or centimeters and so on. This is just meant to show how you can edit resource models in the Designer, and here we can see what that actually looks like from the perspective of a business user in terms of data entry, and what the data look like stored on the back end down below. This is a view of the resource editor, which is the tool in Arches that you use to actually manage your business data. You can see the differences in those models I described on the previous slide reflected in the data entry forms that Arches automatically creates from the models as defined: on the left the accuracy is just hard-coded in meters, and on the right you have the unit field populated with meters. And down below you can see the JSON document that we're actually storing in Arches.
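To make that concrete, here is a hand-written sketch of roughly what one of those stored documents looks like, based on the description above; the UUIDs and values are invented for illustration, not taken from a real Arches instance:

```json
{
  "tileid": "9a1b2c3d-0000-4aaa-bbbb-111122223333",
  "resourceinstance_id": "4d5e6f7a-0000-4ccc-dddd-444455556666",
  "data": {
    "1a2b3c4d-0000-4eee-ffff-777788889999": {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "geometry": { "type": "Point", "coordinates": [-122.43, 37.77] },
          "properties": {}
        }
      ]
    },
    "5e6f7a8b-0000-4aaa-cccc-000011112222": 10
  }
}
```

The keys under "data" are the node UUIDs from the resource model, which is why the next bit of the talk points out that the keys are UUIDs: the first hypothetical node here holds the GeoJSON feature collection for the location, and the second holds the accuracy value in meters.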
So when I mentioned that Arches uses a schemaless back end in PostgreSQL, what I'm talking about is using JSONB. The schema for those JSONB documents is defined by the models designed in the Arches Designer, and we use JSONB to store data in a schemaless way. You can see on the left the JSON document without the accuracy property included, and on the right you can see it included, under the circle there. You might note that the keys here are UUIDs, and you'd be right to note that. There's also a geospatial component here: there's a map with a point drawn on it, and it might be kind of hard to see the little blue point towards the center of the map, but those point data are actually represented in our JSON documents as GeoJSON feature collections. We'll get a little more into how we use some of the GIS features, but we don't actually store those as PostGIS geometries natively, though we do index them into a separate table in Postgres that stores them as actual PostGIS geometries, in order to use geospatial indexes.

So now I'm going to get a bit more into how you can actually customize the Arches system beyond the built-in Designer tools. I'll talk about Arches data types first. Arches data types are not the same as data types in PostgreSQL, even though we use PostgreSQL to store our data. All the data in Arches are stored as JSONB in PostgreSQL, like I mentioned, and Arches data types define the JSON structures and the business rules and logic for things like validation or indexing on the back end for specific use cases. Data types and related UI components in Arches are modular: Arches ships with data types you might recognize, like literals and things like that, but you can also design custom data types to support very specific business requirements.

An example of that in use is this custom data type that was created by Historic England. Annabelle mentioned Greater London, and Historic England is implementing Arches to manage their cultural heritage data. The British National Grid is a local geospatial reference system that subdivides Great Britain into 100 kilometer by 100 kilometer squares; you can see a little view of that there. It's useful for locating things in Great Britain, but it's also important for doing geospatial analysis, and so Historic England wanted to attribute all of their records with British National Grid reference information. To achieve this, they implemented a custom data type in their Arches system. This is a view of the actual UI that they built to manage that data type. The custom data type is really the back end logic: it provides handling of values for things like validation, indexing, and search. There are complementary front end components that can be built for collecting data, reporting on data, or visualization. So here we're looking at what in Arches parlance we call a widget, to support the British National Grid reference data type. Widgets in Arches are the UI components meant to handle the entry, reporting, and visualization of a single value for a given data type. In this particular view the form only has one value in it, but the widget is really just about managing this single British National Grid reference value. But there are a number of other kinds of UI components beyond those sort of atomic, widget-level components that Arches supports.
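To give a feel for what "the back end logic" of a custom data type involves, here is a minimal sketch in the style of the custom-datatype pattern from the Arches documentation; the class, the regex, and the details dict are illustrative assumptions modeled on the British National Grid example, not Historic England's actual code:

```python
# A minimal sketch of an Arches custom data type, loosely modeled on the
# British National Grid example. Module path and method signatures follow
# the pattern in the Arches docs; treat the specifics as illustrative.
import re

from arches.app.datatypes.base import BaseDataType

details = {
    "datatype": "bng-reference",          # hypothetical name
    "iconclass": "fa fa-map-marker",
    "modulename": "datatypes.py",
    "classname": "BNGReferenceDataType",
    "defaultwidget": None,
    "defaultconfig": None,
    "isgeometric": False,
    "issearchable": True,
}


class BNGReferenceDataType(BaseDataType):
    """Validates and indexes a grid reference like 'TQ 30 80'."""

    GRID_REF = re.compile(r"^[A-Z]{2}\s*\d{2,10}$")  # crude illustrative check

    def validate(self, value, row_number=None, source=None, node=None,
                 nodeid=None, strict=False, **kwargs):
        errors = []
        if value is not None and not self.GRID_REF.match(str(value).strip().upper()):
            errors.append({
                "type": "ERROR",
                "message": f"{value} is not a valid BNG grid reference",
            })
        return errors

    def append_to_document(self, document, nodevalue, nodeid, tile, provisional=False):
        # Index the raw reference string into Elasticsearch so it is searchable.
        document["strings"].append(
            {"string": nodevalue, "nodegroup_id": tile.nodegroup_id}
        )

    def transform_export_values(self, value, *args, **kwargs):
        # Export the stored string as-is.
        return value
```

Per the Arches docs, a data type like this is then registered with the "datatype register" management command and paired with a front end widget, like the BNG widget shown here, for data entry.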
In all cases Arches includes some generic default components, but they're meant to be modular: they can be developed for very specific local requirements or for broadly applicable use cases, and when appropriate these components, data types, and even resource models are meant to be shared back to the Arches open source community. So the business data in Arches, again, are stored in PostgreSQL as hierarchically related JSON documents. As we saw with the British National Grid reference widget, widgets in Arches work with a single value, that is, a single key in one of these JSON documents; for example, you see on the left the top key with the string "Half Moon Bay" in it. We have another kind of UI component called cards in Arches, and those are meant to support entry and reporting for groups of values, that is, an entire JSON document in this hierarchically related set of documents. The generic cards in Arches are really meant to support different kinds of form layouts. In this example video, the form has a map in it, and if you change the card from the default, which just lays out the widgets sequentially, to this map card, you get a larger map view that's maybe a little better for geospatial data management in certain cases. By default the Arches cards are very generic, meant to just render all of the widgets present inside of that document. But Arches cards can also be quite custom to a set of local requirements and for managing related values.

Annabelle mentioned before the Armed Forces Retirement Home project in Washington DC. This is a custom card that was built for their Arches system, and what it allows them to do is relate potentially impacted resources in the field to management activities based on a geospatial location. At the start of this video the user defines the location of some kind of management activity: if something is happening in the field, what other resources are there? Then they can quickly add those to another value in this card via that geospatial query for other resources that happen to be in the same vicinity.

Plugins are another type of custom UI component in Arches. Plugins are really meant to allow for the creation of entirely new pages in your Arches application, and frequently they are a composition of many of the other Arches UI components I was describing before. This is a custom interface developed for a local utility up near where I live in the East Bay; they are using Arches to track business sewer services and the related charges. It's kind of a left-field use case given the cultural heritage history of Arches, but their requirement was to be able to calculate fees from a bunch of these sewer outflow and charge records they were creating, calculate the fees for a given fiscal year for a particular business, and be able to modify certain values within those data in order to see how those affect the final calculated fees. Another common way we've used plugins in the past is to build dashboards, because plugins are quite open-ended in their nature. This is an example of a dashboard that we use to aggregate some data in that same implementation I was just describing on the previous slide. Here we're using Kibana, which is a visualization layer for Elasticsearch.
It allows us to build this dashboard directly against our Elasticsearch indexes and just render it in Arches as a plugin. So what you're seeing is mostly Kibana, rendered in an Arches plugin and using the Arches data via Elasticsearch. Another sort of special class of plugin in Arches is workflows. Arches provides some tools in the platform to build your own custom workflows; they're essentially wizards. They carry a user through a series of ordered Arches data entry forms, usually a series of related cards, and maybe some related UI if there are other interactions with external APIs that need to happen along the way. This is an example of a workflow that walks a user through collecting a consultation for Historic England.

Plugins are one aspect of Arches that you can lock down by users, roles, and groups, and Arches uses the default Django authentication and admin apps to manage users and related data. In addition to the default authentication middleware in Django, we've just recently, as Annabelle mentioned, added support for single sign-on via any identity provider that supports OAuth2; you see me authenticating here with Microsoft. Because we used OAuth2 as the standard, you can use pretty much any identity provider that supports it: Microsoft, Google, Okta, some bespoke thing. Since Arches data models are themselves created as rows in Postgres, they're not actually database entities; they're managed as records. So in order to achieve the kind of permissions Annabelle described earlier, where you can control at a pretty granular level who has access to what data and can do what, we needed a way to manage permissions at that object, or row, level. We used a Django extension called django-guardian and added some custom UI, which you can see here in the Arches Designer, that allows you to manage these permissions and actually set row-level access. And as a result of the extensibility of Arches, the widgets, cards, plugins, all of those custom components I was talking about, we use django-guardian to manage permissions for those as well.

So, as has been mentioned, the semantic nature of the Arches resource models and the standards-based nature of Arches mean we have this potential for interoperability and the reuse of Arches data beyond the life of the system, and Arches provides data export tools and APIs to facilitate that interoperability and reusability. This is an example API that Arches provides, the JSON-LD API. It's probably hard to read there, but the data point to URIs, and that provides the linked open data aspect of this. For all of these values in here, though you may just see URIs, you can get to descriptions of all of these things, and that's how you have a self-describing representation of your semantic business data. We support requesting individual records in this JSON-LD format, or exporting entire data sets in it. That's really meant to facilitate interoperability with linked data systems, but also that capacity for these data to live outside of the system: given the self-describing nature of one of these JSON-LD documents, you don't need Arches to understand or interpret these data.
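As a quick illustration of that API, here is a sketch of pulling a single record as JSON-LD; the host and resource ID are placeholders, and the exact URL pattern and parameters can vary by Arches version, so treat this as a sketch rather than a guaranteed endpoint:

```python
# Sketch: fetching a single record from an Arches JSON-LD API.
# Host and resource ID are hypothetical; check the API docs for
# your Arches version for the exact route and parameters.
import requests

ARCHES_HOST = "https://arches.example.org"            # hypothetical instance
RESOURCE_ID = "11111111-2222-3333-4444-555555555555"  # hypothetical record

resp = requests.get(
    f"{ARCHES_HOST}/resources/{RESOURCE_ID}",
    params={"format": "json-ld"},
    headers={"Accept": "application/ld+json"},
    timeout=30,
)
resp.raise_for_status()
doc = resp.json()

# Values resolve to URIs (for example, CIDOC CRM class URIs), so the
# document describes itself without needing Arches to interpret it.
print(doc.get("@type"))
```

Because every class and concept in the response resolves to an ontology URI, the document can be interpreted without Arches itself, which is the "outlives the system" point being made here.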
The Arches API also provides standard mapping services. I mentioned PostGIS: we use Django to return vector tiles, Mapbox Vector Tile representations of the geospatial data in Arches, and that allows for high performance when trying to map large numbers of vectors. That's why it's important for us to actually index these data as PostGIS geometries and use geospatial indexing; ultimately we're able to use PostGIS to cast these to Mapbox vector tiles. If you don't know anything about Mapbox vector tiles, that's fine; the point is that it's a standard way for web mapping and GIS software to interact with geospatial data in a vector format, and we provide those services in Arches. We also provide other kinds of common geospatial formats, like GeoJSON services for Arches data. Those services are primarily what allow us to integrate with tools like desktop GIS. So this is a quick video, hopefully quick, of how we're actually integrating some Arches data into Esri ArcGIS Pro. We actually built a plugin for ArcGIS Pro that allows you to do some of the data editing. Initially here you can see we can overlay data from Arches using the Arches APIs, and then a user can actually select data and edit the geometries using their preferred GIS tools. If you know GIS people, they have their preferred tools and they don't want to use yours. And then it links in with Arches, so you can actually manage the Arches business data as well, directly in the ArcGIS Pro interface. So you'll see here now it's going to bring up the Arches editor. Yes? Yes, yes. He was asking if we expose the data through SPARQL, and yes is, I think, the answer to that; I'm not the expert on that particular topic, but yes, that is made available. Sorry. Yeah, as you can see, it brings up the Arches editor interface inside of ArcGIS Pro, so you can edit the data there and finish the job without actually leaving your desktop GIS. So that's a bit about how Arches works, and I'm going to pass it back to Annabelle to talk a bit about the open source community and how you might be able to contribute.

Thanks, Rob. I think in the particular case of that plugin for ArcGIS Pro, that highlights one of the pros of developing in an open source environment. That was actually a requirement for Historic England's Greater London Historic Environment Record project, and they funded the development of that particular plugin for that project, but now that functionality is available to the entire Arches community. And as Rob mentioned, everything within Arches is open, not just the core code. We didn't really go into this, but there is the core code, and then there are also projects, or packages, that sit on top of that core code and hold all the customizations for your resource models. I mentioned that LA and San Francisco have different resource models, so they have different schemas. If you're in Northern California and you want to use what San Francisco is doing, you can use their package or project as a starting point to configure Arches; if you're in LA or Southern California, you might want to use LA's package. That's one of the beauties of the open source nature of Arches, and one of the reasons why the Getty Conservation Institute and World Monuments Fund decided, ten years ago, to go with open source as the development style.
So our open source license is the GNU Affero General Public License, version 3, AGPL v3, and again, software improvements are shared throughout our community and not limited to a central organization or company. I mentioned that Farallon Geographics, where Rob is based, are the main developers at this point, and the Getty Conservation Institute has been the main driver of development so far, but we're in the process of defining a more structured but open governance model, so that the project isn't just dependent on our funding. And we're already seeing funding from different places: as I mentioned, our internationalization features were funded by the Arcadia Fund in London, and the SSO, the single sign-on, was funded by Auckland Council because it was a business requirement for them, and now that's available to the entire Arches community. Our code is on GitHub; that's the link, and it's also at the end, so you don't have to copy it down now.

I showed this slide earlier, but here it is again, because from an open source perspective, the more stakeholders and users, the stronger and more sustainable the software is. Fostering the community is as important a part of the Arches project as the development of the software, and that's pretty much what I do for the project: in addition to training and data strategy, I work on developing this community and giving presentations like this. Ultimately, open source equals a community focus, as it is beneficial for community members to collaborate. An example of this is the US Arches user group. We have a couple of different user groups already, and the US group meets bimonthly to share experiences with Arches and potentially pool resources to fund functionality that benefits each member and the group at large. We actually just had a meeting yesterday, where San Francisco shared how they're using a mobile version of the workflows to collect data in the field, which was really interesting.

The open source community is composed of many different kinds of people and organizations. On the organizational side: governments, universities, non-governmental organizations, community groups. There are also domain professionals, currently mainly cultural heritage experts, but we're also incorporating scientists with the Arches for Science development, and other types of professionals: software developers, of course; IT service providers, who provide services to those who might want to implement Arches; and knowledge organization specialists. So regardless of who you are and where you fall in this grouping, you can implement Arches for your organization and be part of the Arches community. The first step would be to check out our website at www.archesproject.org. There's a lot of information there; the main comment I normally get about the website is that there's too much information, but that's probably better than not enough. You can also join our community forum, which is based on the Discourse platform, at community.archesproject.org. If you have any questions about implementation, or any other general questions about Arches, feel free to post, or you can also just lurk. That's cool too. And that's the end of our presentation.
Now, there are a lot of things we could have covered regarding Arches, but this is kind of a new type of presentation for us, since we don't normally present to a purely or mainly technical audience, so there's a lot of stuff I might have left out that I would normally cover for a cultural heritage and government audience. So if there are any questions for me or for Rob, please feel free to ask. Any questions?

Yeah, so I just mentioned the US user group. That's not local; it's US-wide, so it's via Zoom, but it's bimonthly. And it's definitely not just typing; people do type, though, and it does spur some conversation in the community. But we're open to supporting that, absolutely. We're also trying to put together a developer conference that would be in person, probably in the Bay Area, for later this year, so we're hoping to get together in person, but no, not currently anything regular. Yep. Yeah. We're presenting at another conference next week; you're probably not going to go to Amsterdam, but I'm presenting there. I think the next one in California is at the California Preservation Foundation, and that's actually an interesting one because it's going to be less technical, but I am presenting with the city of LA and the city of San Francisco on how they're using Arches, and as I mentioned, they're using it slightly differently, so that will be a case study, and then you'll also be able to go out into San Francisco and see the workflows in action. Yeah. Yes.

Yeah, any map services? Yeah, so similar to how I described all the different pluggable components, any mapping services that you're using, for say base maps or whatever, are definable at the project level. So if what you're referring to is, can I show base maps from OpenStreetMap? Certainly, yeah. In all the examples we were showing Mapbox base map services, but those use OpenStreetMap data plus some Mapbox stuff. If what you're referring to is more actually integrating with the vector data, I'm sure you could. I've not seen anyone try to pull their vector data into Arches in, say, one of those custom cards, but you could certainly pull OpenStreetMap data into Arches via a custom card that integrates with some service, and you could certainly host OpenStreetMap data in your Arches database, right? So you could write that service locally in Arches as well. It'd be a bit of customization, but it's an interesting idea. Was that sort of what you were referring to? Acknowledged, yeah, yeah.

Mostly that comes down to alignment, in terms of cleanup and things like that. Typically, if precision is a concern, those data are going to be edited via something like the last slide I showed, where they're actually editing the data in ArcGIS. So the data actually loaded into Arches would look quite a bit different from what I showed there, which was just someone plunking a point down on a map; the precision there was just coming out of however Mapbox translates the pixel coordinates of where you put the point. But you can customize the precision of the data that are collected at the data type level.
So if you know you only want to collect data to a precision of, say, five decimal places, you can apply that, but in that example, obviously, it was just left to accept exactly what was coming out of Mapbox on the front end. Yeah, sorry. It certainly could be used in other Django projects. We haven't packaged it up as an app or middleware; it's an interesting idea. I can happily show you where the code is on GitHub, and there's definitely an interesting thought there, but it hasn't been built that way, though it's open source code, and theoretically we could repurpose some of it in some way. Sorry, yes. Yeah, do you want to take a stab at that?

Yeah, so, I talked about being able to load different ontologies, and you can create an uber-ontology where you combine them in the way that you'd want. So if you have the cultural heritage ontology to describe the cultural heritage business data, you can have that, but you can also include other types of ontologies, in a way that complies with all the ontology rules but also makes sense for your data. So if there are ontologies you subscribe to for 3D data or geospatial data, you can apply those as well.

Okay, yeah, so the city of LA, that's the Office of Historic Resources, and the users at the city of LA are the entire LA city planning department, because they use Arches, or HistoricPlacesLA, to make sure that when they're okaying permits and demolition permits, the building isn't historic, things like that. Is that what you're asking? Okay. You know, that's actually interesting, because there is a project, not necessarily involved with historic preservation, that some people would like to do in regards to public art, and obviously that's under the domain of cultural affairs as opposed to historic preservation. That's something I would like to get going, but in the city there are kind of silos, and all cities have silos, so if we could get that going, that would be awesome. Yes?

Yeah, yeah. So there's another kind of customizable piece in Arches that I didn't get into here that we call functions, and that's precisely meant to serve that purpose: they're hooks you can use to respond to those kinds of events and do some sort of thing in response, right? An example would be: you define a geospatial location of a thing, and you associate this resource with some administrative area it falls within. Typically, the way that's implemented in Arches is via functions. And again, the functions that have been implemented to date run from something very specific and local to a set of requirements in a place like Los Angeles, to something more broadly applicable, like, say, auto-generation of some keys or whatever. Does that answer your question? Yeah, yeah. You can ask them on our forum at community.archesproject.org. Any other questions? I think we have about four more minutes. Hey, thank you, everyone.

At the moment, no one has done that, but that has been a use case that has been discussed; it's more on the policy side, and getting people to cooperate, that is the question. I know at Yale they're doing a project right now to kind of include Arches with other data sets they have, like in their museum collections and things like that.
So there's only one Arches instance in that, but it's a similar system to what you're describing, across these other systems. So, similar, but not quite that. Thank you.

Hello, do I have it right? I believe so. Sure, you can't walk with the mic. Ready to go? Sure. All right, so thank you all for coming here to the open government track here at SCaLE. Our next speaker is Remy DeCausemaker, who is the open source advocate for CMS.gov, which runs the Medicare and Medicaid repositories of information. So let's give a warm welcome to Remy.

Thank you, everyone, for being here today. I know there are a lot of really exciting talks you could be at, so I super appreciate you being here in the open government track. It's an important thing to me, and clearly it's an important thing to you. So I'm looking forward to sharing some stories and answering your questions as we get a little further along. But I'm going to do something a little different today, and I hope that you'll roll with it. This is going to be my visual aid; I'm going to loop it in the background and I'll explain it in a little while. So to get started, my name is Remy DeCausemaker, my pronouns are he/him, and I'm the open source lead at the Digital Service at the Centers for Medicare and Medicaid Services. And before I get started, of course, being a government employee comes with having a lot of disclaimers. The first one is that the views presented here today are those of the speaker and do not necessarily represent the views of CMS, its components, or any other components of the United States federal government. And the second one, which is a classic, is that I'm not a lawyer or an accountant, and this does not constitute legal or financial advice.

So with all the disclaimers out of the way, let's talk a little bit about what the Digital Service is. The Digital Service is a group of folks who work to transform the US healthcare system by improving the design of healthcare experiences, delivering value to government, providers, and patients, modernizing systems, and participating in policy development. We accomplish this by deploying small, responsive groups of designers, engineers, product managers, and others within CMS on a tour of duty, where those folks work alongside dedicated civil servants, and these multidisciplinary teams incorporate best practices and new approaches to support government modernization and solve some of the most complex problems facing the healthcare system today.

And the video you're watching in the background here is built by a tool called Gource. It's like "source", but with a G. This Gource visualization is of the CMS open API source code repositories that were created by developers at CMS and are maintained by our Office of Enterprise Data and Analytics, or OEDA as we call them. The software projects themselves are displayed as animated trees, where the directories appear as branches, the files appear as these leaves of bubbles, and developers are seen working on the trees at the time they were committing. In the top, it's a little faded in here; if we could turn down the lights, that might help folks see a little better. But at the top of the screen you can see the date when the developers were doing the work, and the red lasers are deletes, the green lasers are additions, and the orange lasers are edits. And this is generated from the Git logs of each of the repositories themselves.
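For reference, the custom log format that Gource consumes, and that those Git-log-derived files contain, is just pipe-delimited plain text: a Unix timestamp, an author, A/M/D for add/modify/delete, a path, and optionally a hex color. These lines are illustrative, not from the actual CMS logs:

```
1454568000|alice|A|/bluebutton/README.md|205493
1454654400|bob|M|/bluebutton/src/app.py|205493
1454740800|alice|D|/bfd-server/old/config.yml|112e51
```

The red, green, and orange lasers map to those D, A, and M entries respectively.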
So we're sort of doing some time traveling, playing back the history of the source code repositories from about 2016 up until March of this year. And I've color-coded each repo so that the separate projects are a little easier to keep track of. Another feature is that on the left you can see that ranked list: it's all of the various file types appearing across all of these repositories. As it ticks up, each project is color-coded, so, you know, there are 700-something Java files and a thousand Go files, and whichever types have the most get to the top of the list, and it moves dynamically as the visualization plays back. I'm going to play this on a loop, so don't worry, you're not going to miss it; it will start over again, and I'm going to just leave it up in the background. I will also get into some of the technical details of it, because we are at a technical conference, so if you'll permit me, I'll dive into the configs a little bit and talk about the render pipeline. That's a little bit later, though.

So, to give an overview of the basic process: the first thing we did was install some dependencies, which are standalone open source packages; I'll go into some of those as we read through the repo in a bit. Then you create a repos directory; for this particular visualization I'm only visualizing about five repositories. Then you clone the repos into that directory. You generate logs from the Git history. You then use some fancy regular expressions to insert parent names into directories and colorize them, then you concatenate and sort all of the logs together into one gigantic compiled log, and then you run the Gource tool on it and it generates this cool video. You can do things like add captions and tweak all these different bells and whistles to make it zoom in or out, or go slow or fast, and at the end, using ffmpeg, you can export it to a video. And this video is live on the CMS YouTube channel. It's definitely something I'm pretty happy about, seeing weird hacker data visualization stuff showing up on a government channel under a Creative Commons license. Definitely a cool thing to release here at SCaLE, so that's exciting; I was happy to work with our social media team to make that happen.

So that's the process, and we'll dig into the config and the rendering pipelines in a little bit, but let's talk about some of the projects that are up here on the screen. For all of the repos you see right now, you can go to developer.cms.gov. That's the place where we have the collection of all of our APIs, data sets, frameworks, style guides, and everything used to develop applications that help people get the services and benefits that rely on the data underneath them. Where those have repos, they link to GitHub, and our GitHub organization is CMSgov, where you can find a lot of these public repositories as well. You can actually find the repo for this visualization at the bottom of the screen; maybe folks in the back can't read it: github.com/decause-gov/cms-gource. I'll be going into that in a little while, and that as well is released CC0. So open source, public domain, take it and run as you will. So, the five repos that we're looking at right now. So on your screen, we just started over, so this is a good one.
So the top one is the Blue Button project, and the bottom one is the Beneficiary FHIR Data server, or BFD server. BFD server is an internal back end for Medicare beneficiaries' demographics, enrollment, and claims data, made available in FHIR format. FHIR stands for Fast Healthcare Interoperability Resources; v1 development began back in 2012, the current version is up to v4 now, and it is an open standard that defines how healthcare information can be exchanged between different computer systems, regardless of how it is stored in those systems. So it allows healthcare information, including clinical and administrative data, to be available securely to those who need to access it, and to those who have a right to access it. The standards development organization that oversees it is Health Level Seven, HL7, at hl7.org, and they take a collaborative approach to developing and upgrading FHIR: there are public comment periods, it's done RFC-style, people can chime in and make proposals. It is run very much like an open source community. I'm still learning about all of the open standards work that happens in the healthcare field. FHIR is a RESTful resource, meaning the server will respond with a representation of the resource, most often in HTML or XML or JSON, and that resource contains hypermedia links that can be followed to make the state of the system change. So, TL;DR: it allows clinical and administrative data about healthcare records to be encoded into structures and then wrapped in a RESTful API. And BFD, down in that bottom left corner, powers the claims data going into a lot of these other repositories that are about to start popping up on the screen.

So Blue Button is at the top, and then, boom, we just saw the BCDA repo hit the deck. BCDA stands for Beneficiary Claims Data API, and it enables accountable care organizations, or ACOs, to retrieve Medicare Part A, B, and D claims data for beneficiaries. They use that data to provide quality metrics and participate in other regulatory regimes that CMS has out there, along with other parts of Health and Human Services. And then you just saw a new repo, the yellow one in the middle: that's DPC, Data at the Point of Care, an API that makes a patient's Medicare claims data available to healthcare providers for treatment services. So that's a doctor-to-doctor API, for when we want healthcare providers to get access to data. And then AB2D, and that was a huge commit to Beneficiary FHIR, but AB2D, I don't know if it's on the screen yet, but once it shows up: it provides prescription drug sponsors with secure Medicare Parts A and B claims for enrollees. So it specifically provides A and B Medicare claims to the prescription drug folks. And at the beginning, we saw the Blue Button API, which was the first repository being visualized. Blue Button delivers Medicare Part A, B, and D data for over 60 million people with Medicare in the United States, and it is probably our most widely used and longest-lived open API project. So, to go through a brief timeline of Blue Button: it was started in May of 2015, back when, I believe, we had our first CTO and our first CIO of the federal government; it was Aneesh Chopra and Todd Park.
And Todd Park ended up being the CTO of the Department of Health and Human Services, one of the first ones. They had this idea of unlocking the power of data to make interoperable app development possible, and that vision started back then and is still under development today. In 2018 we saw the public launch of the Blue Button API, and back then, I believe, it was PDFs: you could make a request, but it would come back in a human-readable but not easily machine-readable format. So over the course of the years, between 2018 and 2020, Blue Button 2.0 was built so that we could have machine-readable formats, where people could get their own data about their Medicare claims and approve third parties to get access to that data. And places like Apple's App Store and other parts of the ecosystem have adopted the standard and use it to deliver health apps. So this is definitely something we can point to and show that open source and open standards are making a big difference.

So how do we enable data exchange? Collaborating with industry, implementing programs, developing policies. There's always a human-centered design approach at the core of what the Digital Service does. We work with user experience designers, and you can look at playbook.cio.gov to get an idea of what that human-centered design process looks like: talking with users and trying to bring more of a product-minded, agile approach to a place where, in the government, that maybe hasn't been the standard way to deliver software and do work in the past. So, since 2018 there's been a group called CARIN, C-A-R-I-N, which is a multi-sector group of stakeholders from hospitals, physicians, clinicians, and millions of patients and other consumers and caregivers. They worked alongside CMS developers on the Blue Button 2.0 API; getting those A, B, and D claims and enrollment data for each person, one at a time, with their authorization, was 2018. And then in 2020, CMS published the Interoperability and Patient Access Final Rule, which is regulation CMS-9115-F if you want to look it up on regulations.gov. It puts patients first by establishing policies that break down barriers in the nation's healthcare system, enabling better patient access to their information, improving interoperability, and unleashing innovation, while reducing burdens on payers and providers.

So, to give a summary: the final rule established policies implementing data exchange. That's what these APIs are all about and what they enable, more or less. You've got the Beneficiary FHIR Data server, which enables three different lenses into the same data set, used in three different situations: one for providers who want to share data with each other; one for prescription drug providers who want access to A and B claims; and the other is more of an open, individual one, for a person who wants to grant third-party access to their claims data, which can then be used in lots of different ways. So that's the ecosystem of APIs we're looking at. When I started looking into doing this presentation, I thought I was going to visualize all of the stuff on CMS.gov, and this is already complicated enough to look at, and there are only five projects.
We have over 220 repos across our entire GitHub, and at that point it's already zoomed out to where you can't really read the names; that one looks almost like a colony of ants. So I had limited it to this, but using the tools published on GitHub, you can edit it and make it work for whatever needs you have. So now I'm going to do the thing where I stop the video and dive into the repo. My resolution is not great; let's see if I can, that's a little better. So: github.com/decause-gov/cms-gource. We've got the link to the YouTube video here, so if you can remember the GitHub repo, you can find the video as well. And like I said, the way you use it is by installing the deps, creating a directory, cloning repos, generating logs, inserting names, colorizing the logs, concatenating everything, running Gource, and exporting the video.

So, I used brew to install Gource and ffmpeg, but they are standard tools that are definitely available in Linux distributions and package managers; I've run the Gource pipeline on Fedora and other places too. It is an excellent tool, highly recommended. The tools I used for some of the command line and text editing magic were Vim and sed, but any text manipulation tools capable of regexes and find-and-replace will probably do the trick for you; open source has lots of tools that do these kinds of things, and IDEs these days have that kind of capability too. So: you start off by making a directory called repos and go into it. This could have been one bash script, but I did it the hard way so everyone could see what each of the steps is. You clone each of the individual repositories; these are the addresses for the ones we saw on the screen, but you can substitute whatever repos you would like. Then you use the Gource command to output a custom log, telling it what you want the log to be called and which repo to associate with it, and you do that for each of the repos. Then you run this slick little sed command that inserts the name of the project, which is the part of the regex I've highlighted here, into the log. That's what keeps all the repos from living in the same tree: it separates them out, one tree per repo, so you can view multiple repositories in one visualization, because Gource is really meant to be a single-repository visualizer. After you've added all of the names, and the order is important, I found out: you want to add the names before you try to colorize, or else it'll do weird stuff. For colorizing, I used Vim, and all I did was add a hex color code to the very end of each line in the log; you can use any standard color available in hex. I looked at our CMS branding guidelines and tried to use our official colors so it would look nice on our background, and I think it worked out pretty well. I love Vim for being able to do this kind of stuff, but you can of course use whatever text editor you like. Then you concatenate all of the logs, in this case from the five repos, pipe that through sort, and push it out to one giant log, and that is the log you use to generate the Gource visualization.
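Condensed into shell, the log-prep stage being described looks roughly like this; the repo list is abbreviated, the sed patterns are schematic stand-ins for the ones in the actual cms-gource repo, and the color value is just an example hex code:

```bash
#!/usr/bin/env bash
# Sketch of the log-prep steps (see the cms-gource repo for the real commands).
mkdir repos && cd repos

# 1. Clone each repo you want in the visualization (list abbreviated).
git clone https://github.com/CMSgov/bluebutton-web-server.git
git clone https://github.com/CMSgov/beneficiary-fhir-data.git

# 2. Generate a Gource custom log for each repo.
gource --output-custom-log bluebutton.txt bluebutton-web-server
gource --output-custom-log bfd.txt beneficiary-fhir-data

# 3. Prefix each path with a project name so every repo gets its own tree
#    (names first!), then append a hex color to the end of each line.
sed -i 's#|/#|/bluebutton/#' bluebutton.txt
sed -i 's#$#|205493#'        bluebutton.txt
sed -i 's#|/#|/bfd-server/#' bfd.txt
sed -i 's#$#|112e51#'        bfd.txt

# 4. Concatenate and sort by timestamp into one combined log.
cat bluebutton.txt bfd.txt | sort -n > combined.txt
```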
The render pipeline I use for that is pretty simple. I like to time it so I know how long it takes, because rendering is one of those things where you can go do other stuff while you wait for it to complete. You load in a custom config, and we'll dive into cms.config in just a second. -f means full screen. Then you pass in the resolution; 1080p was the resolution the conference recommended for videos and presentations. And then you pass it the log. It worked great: it took about six minutes to render and display on the screen. And then at the end of it I've got two gnarly ffmpeg pipelines, and these took a little bit of playing around with, but you basically run the exact same Gource render command, then add a -o and pipe it to standard out, and pass that to ffmpeg, and then you have a bunch of different options. These are all available in the Gource source code repository; it'll show you various ways you can render out. I have one example for WebM, and I also have an H.264 pipeline as well for doing MP4. So really, those are all the steps and how you put it together.

Of course, as mentioned before, I'm really excited to provide this under a Creative Commons license, CC0, universal public domain all over the world, which is a cool way to contribute back to the community and share it with everybody, especially when you get to do that in your day job. So a little bit of an acknowledgement: thanking the GSA, 18F, the Consumer Financial Protection Bureau, the Office of Management and Budget, and a lot of the other folks who, since 2012 or 2015, did a lot of the work enabling federal government employees like myself to contribute to and release open source software. It is a very different world now than ten years ago, and I think it's going to be an even more different world ten years from today. And I'm really excited to be a part of helping define what that looks like. My role in doing that is to enable folks like you to participate; this is by-the-people, for-the-people type stuff, and the more participants we have contributing, the better we can represent the population of the country and the needs that we have, the fewer times we need to recreate the wheel, and the more times we can capture value and not see it lost to duplicate work. There's nothing I hate more than seeing the same problem get solved over and over again with a different budget. So that was the readme.
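Pulling the render-and-export stage just described into one place, a sketch; the flags come from the Gource and ffmpeg documentation (Gource's wiki shows essentially this ffmpeg invocation), and the config filename matches the cms.config he dives into next:

```bash
#!/usr/bin/env bash
# Render interactively first (timed, full screen, 1080p), then export to video.
time gource --load-config cms.config -f -1920x1080 combined.txt

# For export: -o - streams raw PPM frames to stdout, which ffmpeg encodes.
gource --load-config cms.config -1920x1080 -o - combined.txt |
  ffmpeg -y -r 30 -f image2pipe -vcodec ppm -i - \
         -vcodec libx264 -preset medium -pix_fmt yuv420p -crf 18 \
         cms-gource.mp4
# Swap libx264/.mp4 for libvpx/.webm for the WebM variant.
```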
That file idle time is how long before the file bubbles disappear; I wanted every file that's been edited to stay on the screen. But like I was saying before, when you get into hundreds or thousands of repos, sometimes it's a good idea to have them fade out so that you only see whatever is most recent and can focus on other pieces — that's why these configs are here. And for production, you wanna hide things: file names; the root of the tree, so that each repo appears as its own tree; the bloom effect, which is this cool glow that makes things look pretty, but with so many things on the screen it's just really distracting, a giant shining light you can't see through, so I had to turn it off; and the mouse, so it doesn't show up when we render the video. When I'm doing my tweaking I keep the progress bar, and then I hide it so you don't accidentally skip ahead. In this version I have it stopping at the end, but you can have it loop. I didn't have to use YouTube to loop the video — I could have run it locally — but I wanted to use two laptops so I could have my speaker notes too. Multi-sampling is for anti-aliasing, to make things look a little better. No maximum number of files on the screen. No lag from the time someone commits a file to the time you see it show up — sometimes there's a delay. The output frame rate is just 30 frames per second so that it looks good. I turned the user scale up so you could see the little people, because once you zoom out really far you can't see them very well. Full screen, of course. And then seconds per day is 0.15, so that's again how we skip through; but if you wanna do one second per day, or 10 seconds per day, you can really stretch things out and craft different configs that visualize different kinds of software projects. I've used this in the past to look at the history of projects that were short-lived, over a weekend say — you still wanna see everything, but real time is gonna be too slow. So if you set seconds per day to whatever you want, and set the auto-skip to match, you can really dial in the range to serve your purposes. The bloom intensity I had set even though I didn't use it. And then you can pass it a custom background image — I named my file this long ridiculous thing, but you can just pass it a PNG, a JPEG, or whatever you want. You can set the font size for the title and the key. The camera mode has two different options: overview, where you look at everything on the screen, or track, where you can give it a specific user. In the past, when we visualized the yum project — that was back at FUDCon in, oh gosh, 2013 or 2014 maybe — the lead maintainer of that project, Seth Vidal, had recently died in a tragic accident. So we developed a sort of memorial visualization that followed Seth's path through the history of the yum project: we broke each year down into a separate tree, and we had all the files that Seth touched be blue, so you could see how much of the project was touched by this one particular maintainer. And these are all just configuration options that are available because of Gource. Gource is a really powerful tool — if you look at my YouTube you'll see I've got quite a few of these visualizations. I really like using it for this kind of stuff.
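Pulling those options together, here's a hedged sketch of what an INI-style Gource config along those lines might look like — the option names match Gource's command-line flags, but the specific values, title, and file names here are illustrative placeholders rather than the exact CMS config (a couple of settings, like the captions, I'll get to in a second):

[display]
fullscreen=true
multi-sampling=true

[gource]
title=CMS Open Source
date-format=%B %d, %Y
seconds-per-day=0.15
auto-skip-seconds=0.05
highlight-users=true
file-idle-time=0
max-files=0
max-file-lag=0
output-framerate=30
user-scale=2.0
hide=filenames,root,bloom,mouse,progress
stop-at-end=true
bloom-intensity=0.5
background-image=background.png
font-size=22
key=true
camera-mode=overview
padding=1.1
caption-file=captions.txt
caption-size=24
caption-duration=5.0

And the companion captions file is just pipe-delimited Unix timestamps and messages — these timestamps are made-up examples (with GNU date, something like date -d '2014-01-01' +%s handles the conversion):

1388534400|Blue Button is released
1420070400|Another milestone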
So: the camera mode overview lets you look at everything, and track lets you follow specific users like that. Padding is just how close the tree gets to the outside edge — you don't want it to overlap if you can help it. The key on the side is just set to true, so you can see the key of all the file types. And then there are captions you can add — this is a feature that was added somewhat recently. They might have been a little bit small and low on the screen for everyone in the back to read, but I tried to give some information about what you were looking at as the timestamps came up on the screen. You can set the size and the duration, and you can set the offset of how far it comes in horizontally. And then this is commented out at the bottom, but you can actually have avatars instead of the little human-bubble-looking things — you can make those a picture of whatever you want. So, lots of configuration options and tweaks you can make to Gource. This is the config I used for this visualization, but there are lots of different config options you can take. And then, really quick, the captions file. This is what it looks like: these are Unix-style timestamps — it was a little bit of a fun dance doing the conversion between dates and timestamps, but there are websites and tools for that — and then whatever message you want, pipe-delimited. So I was able to say, okay, this is when Blue Button hits, and pop up a little caption that says this is when Blue Button was released. That's how you put captions on the videos, and it's very straightforward — it's all plain text. So again, lots of extensible tools. This is all of the tech that went into making the presentation, and it's all available on GitHub right now. It's public, it's Creative Commons public license, so take it and run with it. The Gource website itself is an excellent resource where you can find all of the configuration options and settings I've talked about today. You can almost follow this word for word, the same way that I did, to create a visualization for your project or your community just like the one you saw today. So that's Gource. It is an amazing tool — special shout-out to Andrew Caudwell and the Gource project contributors. You can check out their GitHub repo here, and gource.io is where you can find it. So let me put the shiny thing back on again, finish off, and take her home. One second while I bring my speaker notes back up to get the password right. Okay, so throughout this process there were a few lessons to be learned, for sure. I've been fortunate enough to serve in community management, community leadership, and community advocate roles in a variety of open source organizations, and typically there are a lot of contributors and a lot of repos. CMS has hundreds of repos, thousands of contributors, tens of thousands of contractors, and millions of lines of code. That's a couple of extra zeros on the amount of data and metrics I would typically track as someone who works within an open source program office. And the complexity is astonishing. The federal government is the largest employer in the United States, right? So think of the organizational complexity that goes into mapping the entire ecosystem — this is what five out of 200 looks like. So one of the big lessons learned immediately was about scale.
This is at scale — just a staggering amount of development. And we're gonna need a lot of tools and more people to figure out and keep track of all the things that are happening. Each one of these trees you're looking at is, in and of itself, its own microcosm of complexity that's solving some hard problem in tech. So it's microscopes and telescopes: you can zoom in with a microscope as deep as you want into each repository, and you can zoom out with a telescope as broad as you want across the galaxy of open source that exists in the federal government space and beyond. There are exciting things happening at the international level. The World Health Organization has an OSPO now; they have a lead. The United Nations, at their General Assembly, is proposing the creation of an open source program office. There are parts of government at the state level in the United States that are creating digital service programs like the one I'm a part of. So, you know, open source is becoming more and more just how we do things. And the idea that we need metrics, and we need experts, and we need maintainers is becoming more and more clear as we see more threats, more leaks, more cybersecurity incidents, more state actors, and more of the world being eaten by software, as they say around these parts. Folks like you at this conference are some of the best people to help address those challenges, and if we're gonna tackle some of the biggest and most complex problems facing healthcare, we're gonna need all the help we can get. So, part of my job isn't just making shiny visualizations like this — we're actually building teams. And there are three programs I wanna mention specifically today. One of them is a program called the Digital Corps — that's C-O-R-P-S — at digitalcorps.gsa.gov. It is very similar to usds.gov, the four-year tour-of-duty model where we bring technologists in to help modernize the government, but the Digital Corps is meant for early-career technologists who are just leaving their undergraduate and graduate programs. It's a two-year tour of duty, so it's a little shorter. They give a 50% signing bonus to help incentivize people to make the switch from some of the other opportunities out there and be a part of the public sector. And at the end of it, there's a full-time job waiting for you. I thought this was a good idea when I started last year — but then in 2022 we saw, I think, over 100,000 layoffs in the tech sector, according to layoffs.fyi. And the people affected most by that aren't necessarily the senior folks who've been around for 10 years; it's the people who are just getting out of school, the people looking for jobs and internships. So the Digital Corps, I believe, is a really great opportunity for people to get early-career experience doing real-world problem-solving. There are over 150 million people who depend on the code that CMS hosts, and that is a great way to gain some experience at scale. And at the end of your two years, if you don't want to stick around, that's fine: you did your public service, and you can go on into the private sector with a perspective on how things work at that scale and bring those skills to your next role. So, the Digital Corps — there's a pretty short application window, like two or three weeks, when it opens up.
This year, I think it was open in late December, early January, but if that sounds like an opportunity that you or someone you know would be interested in, keep an eye on that website and follow the Twitter account: digitalcorps.gsa.gov. So, that's the two-year version. The four-year version is usds.gov. I'm at a permanent digital service inside of CMS, where we are CMS employees, but half of my team is detailed in from the Office of Management and Budget at the White House — they get detailed into different agencies across the government to help solve gnarly problems, do novel research, and bring best practices. So USDS is another place where, if you're later in your career and established, you can do a four-year tour of duty, like I'm doing now, to bring your skills to the public sector. And then, earlier in the pipeline, there's a program called Coding it Forward — codingitforward.com is the URL. Chris Kuang, the founder of that program, was actually tapped by the Biden administration to create the Digital Corps at GSA. The Coding it Forward program is still operating, and it places currently enrolled undergraduate or bootcamp students into 10-week paid internships at federal, state, and local governments. So you don't have to work for the federal government if you're looking for an early-career opportunity. I just went through that interviewing process, sent offers, and we had three students accept. And, you know, before they start — I'm not gonna blow up their spot or anything like that — but it's a really amazing pipeline of talent that that group has built, and we're super excited for those folks to start. Upstream contributors, people who've worked in big tech companies, people doing major open source and open data work in projects like Hadoop, Rust crates, Python, Java — lots of different parts of the open source ecosystem, much of which is being talked about today. So the idea is that we're building a pipeline: an early-career pipeline that starts with undergrads at Coding it Forward; those students can apply to become Digital Corps fellows; and those Digital Corps fellows can join the US Digital Service if they wanna continue a tour of duty, or, if they wanna become permanent employees, there's a place for them to land at the end of it. And really, underneath all of it is the ambient open source community itself. Our GitHub repositories and the projects being worked on in the open can serve as a lobby where we gather the folks who are interested in this work and who can prove they know how to solve these problems, so that when we're doing these hiring actions there is a ready pool of people. The opportunities are limited, which is sad — I wish we could take everyone who applied. Thousands of people apply to these programs and only hundreds or fewer can get into them right now, because we're still growing and building them; this is the first year of the Digital Corps. So I wanna make sure that the people who weren't able to make it this year still have the opportunity to gain experience, work on contributing to real-world problems, and help solve some of these challenges we're dealing with.
So, this pipeline approach, I believe, has a lot of potential, and part of why I'm here is to find out how best we can engage with the open source community: what projects are available, what needs need to be met, and who are the people that want to help and answer the call? So, thank you, SCaLE, for having me at this talk today, and I'm happy to take questions. Any questions? I'm involved in a group that's providing IT skills for underserved teenagers, and we try to connect them to internships and things like that. Now, I know that being the federal government, your flexibility is not the world's greatest, but is there any way to connect with you? Yes — like I said, this is a pipeline, right? So if you're working with early-career high school students, I'm happy to connect you with the folks at Coding it Forward and the folks at the Digital Corps, so that when those opportunities open up and those students are looking for their next step, we can make sure they're informed when the window opens and get a chance to apply too. So let's definitely connect after the talk; I'd love to share information. Thank you. So, my question is around security. In a world where connected personal data is becoming a modern currency, do you have published guidelines to assist vendors that are consuming this data? Or, if not, do you have penalties for when that data is compromised? The short answer is yes. The long answer is that there are other people who specialize in those particular areas who can give you a better answer than I can. One thing you can look at, which just came out this week, is the White House's National Cybersecurity Strategy. There's a lot of guidance in there around software supply chain assurance, software bills of materials, and some of the challenges we see in securing open source. I personally believe that no matter how perfect the tools get — if we have a perfect list of every CVE, a perfect list of every open port, and a perfect list of every out-of-date dependency — there are still not enough maintainers to solve all of those problems. Even if Google and Apple and Microsoft and every other large tech company on the planet put together the biggest pot of money they could and threw it at this problem, it still wouldn't solve it, because a lot of the time the way we try to dig our way out of a software hole is to dig another software hole. We're seeing software sprawl: we're actually creating a larger surface of projects to maintain, which further fragments things and creates proliferation issues. So for me personally, when we have public infrastructure — and digital public goods is a concept being talked about a lot at the UN level and the international level now — we can start to unite around common infrastructure and stop fragmenting, maintain one stack or a smaller number of stacks, and start to unite these hackers and maintainers. We need programs to train them. We need to get the lottery factor up above one: thousands of single-maintainer projects are crucial to the structure of our government, our internet, and everything else. Let's get that number up to two, to start. And the programs I just talked about can hopefully help build that pipeline. Great question. Hi, sir. I'm local to the Los Angeles area, and I was wondering: are there any local groups that meet bi-monthly, monthly, or quarterly? Thank you.
Yeah — so I know LA and San Francisco are far away from each other. They are; as someone who lives in upstate New York near Canada, when people say, oh, you must go to New York City all the time, I'm like, no — it's about as far from where I am to New York City as it is from here to SF. But Code for America has their headquarters out here, and a lot of the work they focus on now is actually California state-level stuff — CalFresh food programs and other opportunities to work at the state level. I believe California has a digital service as well; I know the city of San Francisco has one. You might be surprised if you start searching for "digital service" plus the name of a locality. Talking to the Code for America folks, they might be able to help you get plugged in. And there's a nonprofit organization called US Digital Response that works with volunteers to get them placed where there may not be budgets at the local or county level but there are technologists who want to get involved — and those placements can lead to contracts and jobs, et cetera. A lot of civic hacking orgs have come and gone, but Code for America is still here. The Code for America Summit is coming up in May — that's in DC this year, I believe, but I believe they're also gonna stream some of the talks, and they have a ton of mailing lists and resources. There used to be things called brigades, which were local user groups where people could hang out and talk to each other, just like you're talking now. I'm not from this side of the country, but even I know of California being a hotbed for civic hacking. So Google around, and if you find those resources, share them, because we need more people like you. Thank you. Any other questions? I'm probably getting close to the end here — let's make this the last question, and if you have other questions we can talk after the talk. You mentioned a number of other governmental groups that are moving into this open source mentality. How do you share between them? What does that look like? Yeah, it starts with just publishing where we can. I'm tracking things along with the CHAOSS project — C-H-A-O-S-S, at chaoss.community. They have a project called Augur, which scrapes GitHub repos and is a tool for getting metrics; I use that to track the CMS repos, but I'm also tracking other .gov agencies' repos with it. Those tools are powerful at what they do, but the community around them is still nascent and emerging. GitHub used to have a government community that was led by Ben Balter and some other folks; that's still a place where you can go to see across a lot of these organizations. Software Heritage is more of an EU-based organization, but they are providing one place where all source code is treated as cultural heritage — the same way that landmarks, museums, and historical sites are treated as cultural heritage at the UN level through UNESCO. Source code is just starting to be recognized as part of humanity's cultural heritage. So check out Software Heritage, and check out the UN and UNICEF programs, the Digital Public Goods programs. I'm happy to try and brainstorm a few others — there are awesome-lists all over if we look for them — but we do need to bring as many people together as we can. Once again, Remy DeCausemaker — thank you very much.
We have an hour break in this room, and then we're gonna be back at one o'clock with Bronwyn talking about the LA arts scene and how they're using open source software there. Test, test, test. Check, check, check. That seems to be working. Is it actually working? Ah, a new trick — put it on right side up, got it. We're gonna turn the lights off, if y'all don't mind, because they're humming at us. Everybody good with that? Mood lighting for the arts. So I am gonna go ahead and get started. I find it a little bit strange to be standing up here at this great distance from all of y'all, but thank you for being here. I'm Bronwyn Mauldin. I am director of research and evaluation at the Los Angeles County Department of Arts and Culture. Does anyone know what the Los Angeles County Department of Arts and Culture is or does? Anyone in this room? Great. So we are the local arts agency serving the 10 million people who live in the 4,000-odd square miles that make up Los Angeles County. There are about 4,500 local arts agencies across the country — every county and most major cities have them. There are local arts agencies for the City of Los Angeles, West Hollywood, Santa Monica, Santa Clarita, Glendale, Torrance, all sorts of cities within LA County. We serve all of Los Angeles County. As I was prepping for the talk today, I realized the subtitle I submitted for this session — why an open data approach matters — was the wrong subtitle for this audience. I don't think I need to convince y'all why open data matters. So I have a second title for my presentation, which is: the search for the lowest common denominator. And that is the journey I'm gonna describe for y'all on this project. But we are gonna get back to why open data matters specifically for the arts. A little bit of context on working in arts administration, just so you know the universe I work in: a really big dataset for us might have like 700 rows. Huge. And on the columns, if we're in Excel, a really, really huge dataset might get us to the two-letter column names starting with B. So I feel like some of y'all may work in a different universe than the one I do. But I do know a little bit about the arts and arts policy. So the journey I'm gonna describe to you today begins in 2015 with the creation of what's called the Cultural Equity and Inclusion Initiative, which was a directive of the Los Angeles County Board of Supervisors. Does anybody know how many elected officials are on the LA County Board of Supervisors? Five, that is correct. Ten million people represented by five people, divided into districts; each district contains a couple million people. So in 2015 we launched the Cultural Equity and Inclusion Initiative, designed to ensure that everyone in LA County has equitable access to arts and culture. Fast forward from 2015 — a lot of process, a lot of meetings, a lot of public input — and the Board of Supervisors adopted a cultural policy for the County of Los Angeles. Now, cultural planning happens a lot in the US; cultural policies are a little less common — you might find them more in Europe or Latin America. But Los Angeles County has a cultural policy, and it was adopted in 2020.
And we think we might be the only county in the United States that has a cultural policy. This is just a little excerpt from it, because there are two really interesting pieces to our cultural policy for Los Angeles County. The first is that in the cultural policy, our Board of Supervisors directed all county departments to — do stuff, words, words, words — to ensure that all residents of the county can participate fully and equitably in cultural life through the arts. My department, the Department of Arts and Culture, is the second smallest department of county government. They did not direct just us to do this work — we're included in "all county departments," but it wasn't only us. It also wasn't only this family of institutions that we lovingly refer to as the county culturals. These are the cultural agencies that provide cultural services to LA County and have some kind of relationship with county government: they may get funding, they may have facilities provided to them at no cost. Some of these may be familiar to you, some you may never have heard of, but these are the county culturals. So the policy did not apply simply to our department, nor simply to the county culturals, but to all departments of county government — and this is not even a complete list of the departments, offices, and agencies of county government. The idea of this policy is that everybody — we're all gonna be engaged in doing stuff to ensure that all 10 million residents of LA County have equitable access and can participate in cultural life through the arts. And I think a lot of what they were thinking about in particular was ensuring that these departments are using arts to serve their mission. It's not to say that Animal Care and Control isn't going to keep doing what they do — they're going to continue to work at their mission — but what is the role of the arts in supporting Animal Care and Control in achieving their mission? So this gets passed in 2020. And the next obvious question is: where are all of those departments investing in arts and culture? How are they using arts and culture to meet their mission? So we were given the task of doing a needs assessment. I think that's kind of an unfortunate title for what we did — I don't think it really meets the definition of what I would call a needs assessment — but it did ask the question, and this is a pretty straightforward question that we set out to answer: how are all county departments investing in the arts? Our Board of Supervisors wanted to know: how much are they spending? Where are they spending that money? And they wanted us to do an equity analysis: who's getting more, who's getting less, where are the gaps? This is super straightforward, right? This makes perfect sense. If we're gonna make sure all departments are doing it, as a baseline, let's start by finding out what they are already doing. Then, once we have all this data, we can use it to make future decisions about how investments are made by our department and by the Board of Supervisors, and which departments, or which parts of the county, or which communities within the county we might look at to address gaps. Super straightforward question, right? So I start my thinking on this project with: what's the data that I have, that I manage, that I have access to, that I have some kind of power over? And that is the data my department collects, manages, and maintains.
Some of this is available on the county's Open Data Portal; some of it is not. But let me run through these really quickly so you get a sense of the kinds of services we provide. We start with our civic art division. Civic art — that is public art. Every time the county builds a new building or does any kind of major renovation, we have a policy that 1% of the cost of that building or renovation is set aside for art. This is where you see those beautiful murals on the side of a building, statues being built. Sometimes it's not what we call hard art on a building — that money might be invested in programming instead. So civic art data is fairly easy: we have a thing, it's being built, it has an address, and we have a dollar amount associated with what it costs to make that thing happen at that place. Okay, that's easy. We also have a number of grant programs where we make grants to close to 500 nonprofits across the county; that money comes from the county general fund. And there's something kind of tricky about these grants to nonprofits. Many of these are actually very small organizations. They might have an annual budget of $100,000 or less. They might have one or two part-time paid staff; they might have zero paid staff. Their official office in the IRS records might be someone's home, and they might be providing services all over, so the address of the organization is not necessarily a service location. We've experimented, and it's taken us a few iterations to figure out the best way to collect data about where services are being provided. Turns out what we can ask for — and we think we're getting pretty accurate data — is, at the end of a grant period: can you just give us a list of all the zip codes where you provided services? So we have zip codes for that; that's our geography there. We have an internship program — our department runs the nation's largest paid internship program in the arts. It's no longer just summer, it's now year-round; thanks to the pandemic we were able to extend it, and that has improved the program significantly. So we have college undergraduates getting internships with arts nonprofits. Again, these are structured as grants, but at least with these folks we know where the young people are working, so we have an address for them. Awesome. School-based arts education: we provide grants to school districts, and we also provide services at no cost to them — we help school districts write or rewrite their strategic plans for the arts, making sure arts education is happening in public schools. So our geography for that is school districts, and if you know school districts, they do not tidily overlap with zip codes. Community-based arts education: these are primarily programs taking place in public parks, so we know the addresses of the parks — we've got addresses there. Then we've got a whole bunch of other programs and services where there is no actual geography, though we do know the cost of the program. For example, we have a creative strategist program where we place artists in residence in other county departments to help them achieve their mission — in 2020, we placed an artist in residence in the Registrar-Recorder's office working on getting out the vote.
So we have programs placed across county government. We've got funding source — one of the things our authorizers wanted us to make sure we did was to separate out what's county money and what's not county money, because they wanted to find out where the county is spending money. So we've got funding source, we've got funding type, and then we have this tricky bit of geography: they really want to know geographically where funding is being invested. So you can already see that in our search for a common denominator, on geography at least, we are stuck with zip codes. And I don't think I have to say much to all of y'all about the pros and the cons of zip codes. Especially in LA County: we have some zip codes that are quite small, and up in the north, where things are very sparsely populated, we have some ginormous zip codes. We also conveniently have zip codes that extend beyond the county border — but we don't provide services beyond the county border. Those are the downsides. Some of the upsides are that the census and other organizations provide lots of data we can pull in about those zip codes for our equity analysis, which I'll show you a little about later. So zip codes: there's our first lowest common denominator. Now, this is my very sad attempt to depict my expectations — this is what I kind of hoped we'd be able to get by going out to all 50-odd agencies and departments of county government. Let me step back. A couple of years ago, we did an exercise where we tried to get some data from our sibling departments. These are our siblings — we are all government, we are holding hands, singing kumbaya with our government siblings here. And what we did — the obvious thing — was create a survey, send it out to all the department heads, and sit back and wait for the data to roll in. And guess what? It did not. In the end we ended up hiring consultants to follow up with emails and more emails and phone calls, and eventually we collected most of the data we were able to get through phone calls, actually administering the survey verbally. So I knew we were gonna have some challenges collecting all this data for our needs assessment project, and we hired a consulting firm on the front end, because we knew a form wasn't gonna do it. We hired AECOM — don't know if y'all know them, they're kind of big — to help us design the instrument we'd use to collect the data, to actually go out and do all the interviews, to clean all the data once they got it, and to create a beautiful visualization website that you'll see a quick preview of later. So we set them off on their task, and based on my experience and the structure of my data, I thought: well, surely my fellow government officials might be able to tell us how many dollars they have invested in the arts — or at least they'd be able to identify, we run this program, it incorporates the arts, and here's the dollar amount for that program. I also knew it might be a little difficult for them to separate out what comes from the County General Fund, or from other county sources, versus what's external.
Like, you can imagine the Department of Public Health might get a federal grant to use arts to promote vaccination or something like that. So we wanted to separate externally funded work from internal county funding — our Board of Supervisors wanted to know where their money was going. I knew that geography was gonna be a challenge: there are a number of county programs that may not have a geography, because they might just be available to everyone in the county, so I knew we would have some limitations there. And I also knew, based on our previous experience, that there were gonna be times when all they could do was describe it to us in words: here's what we're doing — can't tell you how much, can't tell you where, can't tell you who's getting it, but we've got this beautiful description. So that was my expectation. And this is more like what we actually got. Sometimes we had geography. Usually they were able to tell us the source of their funding. Sometimes they were able to tell us how much they were spending on the program. There were a number of times when they were like, well, we have this program, it's really great, it's kind of a mix — there's some arts, there's some crafts, there's a little bit of sports, maybe there's some games. And we would ask them: can you tell us, for this program, is it half arts and half not arts? Is it a quarter arts? We couldn't possibly tell you — it's just a mix. And then we come to year, the time dimension. We knew this was gonna be an issue, because we were collecting all of this data during the pandemic. Pretty much all 500 of our grantees had stopped doing in-person programming, right? Everything moved online. Does anybody remember this whole pandemic thing? It's so hard to remember these days. So we knew, even from our own data, that to get the geography of where all our grantees were serving we would have to go back to pre-pandemic data — so we used the most recent pre-pandemic grant cycle for each of those grant programs. We ran into the same thing with our colleagues. Some of them could tell us, well, this happened in 2020, or maybe this was the 2021 fiscal year; we've got fiscal years, we've got calendar years. So in the end, we basically constructed a sample year, an example year: it's made up of a whole lot of different years, but for each program that's reported, it's the most recent year of data available. So there's a range of stuff going on. We got a lot of descriptions of programs — that was exciting. Some of our folks were able to report data to us in spreadsheets, some told our consultant stuff over the phone, some sent emails, and then there were the random PDFs that somebody had to pull data off of. The descriptive text piece was kind of interesting, because there were a number of times when, in the interviews, our consultants came back to us and said: oh, we just had this great conversation with this department, and they have this artist they're working with doing this, this, this, and this. And we would say: yeah, that's our creative strategist program. We run that. Thanks. I think the importance of the textual stuff is that we did get a lot of really interesting information — there is stuff going on — but it's stuff that couldn't be quantified or measured; we couldn't assign dollars or geography to it.
But there also was a lot of enthusiasm across the county for using arts to achieve other goals: get-out-the-vote efforts, education on health issues, on mental health issues, and things like that. So I'm happy to say that our consultants had to deal with that, and not me personally. But let me get to the really critical thing: we couldn't get dollars for everything. So all of a sudden we found ourselves having to find a new common denominator. We've got zip codes for geography, but instead of dollars per geography, we ended up having to go a few steps lower, it felt like, to a count of programs per geography. For those folks who could tell us, yes, we have a program, it takes place here, but I don't know how much it costs — we could at least count the number of programs taking place in each zip code. Not perfect, not what we hoped for, but it does give us a sense of where the county is investing in the arts, even if we can't actually quantify the dollars being spent. So this is the slide where I pause for a moment and acknowledge something I don't know how much we all acknowledge: the role of politics in data collection. I work for government, so there was government politics — there are entities of government, and there are policies driving the decisions that people make. But there's also that other layer of politics, where people are trying to protect power, protect access to resources, protect their programs. All I can say on something that's being recorded is that there was politics involved. Sometimes I understood people's motivations and sometimes I just didn't; all I know is how it affected my ability to collect the data I needed in order to respond to the elected officials who directed this. But I think in all of the work that we do, we have to acknowledge there is always politics at play. And in some ways it's not a good or bad thing — it's just a thing, and we have to recognize that it affects the type of data we're able to collect, it affects whether we're able to achieve those idealistic goals we have about what we're gonna measure, and we also have to recognize the impact it has on the other end, on what we're able to report out and the accuracy of it. So that's my politics slide, and I'm done with that. So, a couple of screenshots here of our final product, the needs assessment website. As you can see, it's built on Esri — LA County is heavily, heavily invested in Esri; I have been told recently that we are customer number three. We use Esri for a lot of things. In fact, our county's open data portal, which launched, I wanna say, around 2015 or so, was originally on Socrata, which became Tyler, and we are right now in the midst of transitioning to Esri as the host for our open data portal. That's an interesting process. So this is built on Esri, in a story maps format — if you're familiar with story maps, you can scroll through them. A couple of things: we know our audience really well, and they are not data people, they are arts people. We want our arts community to be able to use this. They are also not the people who are gonna start clicking — oh, there are layers, let me turn layers on and off. These things are not going to come naturally to them; it's just not the stuff they're used to.
So as we've written the text, we've tried to make the key findings really clear, but we've also interspersed it with instructions on how to use the tool. Rather than dumping all the instructions into a "click here to figure out how to use this tool," we've woven them in through the text. What you're seeing right now — and since it's my data and I do know the dollars, we did report dollars for our department, so we do have dollars by geography — is our organizational grants program. It's kind of our flagship grant program: more than 400 grantees per year, arts nonprofits doing programming all across the county. And let me tell you, this was a surprise to some folks. They had no idea how many zip codes we were reaching in this one grant program; there were folks who simply did not believe we reached this many zip codes. Not a surprise to the staff who actually run the grant program, though. So now we've got it, and as you can see, this is divided by quintiles. Those blue areas are the lowest quintiles, and that hotspot — that's the downtown core of the city of Los Angeles. You've got another hotspot, the city of Long Beach. So that's where more dollars are being spent, and you can already start to see equity issues in terms of where the money's going. So at least for our own data, we were able to report dollars. Now, when it came to reporting other county departments, and also aggregating everything together — because we turned our dollars into a count of programs as well — now we're at count of programs. Same thing: these are quintiles, with blue being the least and the red, orange, and yellow colors being the largest number of programs. What you're seeing right now is county libraries — their investments in the arts, their programming right there. You can turn on layers for all of the other county departments. We threw in a few other bits of information that we thought both the public — our stakeholders in the arts community — and our authorizers ought to know. Our department, as a local arts agency — and this is from 2019; we had to use pre-COVID numbers in order to have comparable data against other cities and other local arts agencies across the country, it's complicated — gets close to $18 million a year for the arts. That sounds like a lot of money, right? $18 million for the arts divided by 10 million people — that's less than two dollars per resident. Look at us. LA County is clearly under-invested in the arts. And then there was this kind of fun task of looking at those county culturals. Remember the county culturals — these agencies like LACMA, like the Music Center. They get the lion's share of county investments in the arts: $33 million a year to LACMA alone. That's more than double what my department gets. But what was also important was to show that the money they're getting from county government is, for most of these, just a small percentage of what it actually costs to run those institutions. So while LACMA is getting $33 million a year, they are raising four times that amount from other sources. Folks wanted and needed to know how much is going to them, but they also needed to put that in context: it's a small percentage of their total budget. We went to public sources for this data.
This data is actually publicly available — you could look it up if you want. The county investment number comes from the county budget book, which is published every year; you can look it up yourself. The right-hand side is their IRS Form 990 for the same year. Now, is their fiscal year the same as the county's budget year? It's not a perfect one-for-one match, and they made sure we understood that. But in order to have an authoritative source that allows us to compare the county culturals to each other, those were good, solid sources. So that was just: here's the data, here's where the investments are, here's the count of dollars per geography and the count of programs per geography. Now we get to our equity analysis. Our consultants built 21 separate dashboards for us that look at 12 different dimensions of equity, and you can see these are pretty standard things that we would want to look at and that we think our county authorizers would want to know. And just to show you an example of what it looks like on the website — there are 21 of these, looking at each of these factors separately. What you've got here is the map of LA County, and what you're seeing in green and blue is whatever dimension of equity you're looking at for that particular dashboard. This one is the percentage of people speaking a language other than English at home. The figure on the left is the average of all zip codes visible — about half; the average across LA County is 50% of people speaking a language other than English at home. What I really like is the column chart on the bottom: that is every zip code in LA County and the figure for that zip code, so you start to see the dramatic differences between zip codes. That's really interesting. The data is also done in quintiles, where darker blue is a larger percentage and paler green is a lower percentage. And then when you zoom in, that's when things really start to get interesting, because the number on the left updates to reflect only the zip codes visible, and the column chart also updates to reflect only the zip codes visible. What's outlined in red is our top 10 zip codes by count of programs — the most-invested zip codes in LA County by count of programs. And so you can look at this by language, by race and ethnicity, by internet access, and on through all 21 of those dashboards, side by side. A lot of the data for these dashboards came from, obviously, the census, right? The census has a lot of this data available at the zip code level. Another source of data — and I flag this for you just in case it's something you're interested in, it's a resource that's out there — is the county's Anti-Racism, Diversity, and Inclusion initiative, ARDI, which has developed a COVID impact equity explorer. They constructed an index of vulnerability and — what is the other word there? Recovery, thank you — and did it by quintiles, from highest need to lowest need. All of the data behind this map is available, but it's at the census tract level; we were able to pull some of this material into our equity dashboards as well. So that's a resource out there — it's kind of fun to play with. Also built on Esri, as you can see. A couple of critical findings worth noting: the county reaches every zip code with some kind of arts and culture service, somewhere, through some department.
I can't tell you how exciting this was for those of us working in the arts. We have an infrastructure that already reaches every county — sorry, every zip code. It doesn't reach them all equally, though; that's something else. The largest share of dollars — this was to nobody's surprise — goes to those big county culturals. But the big surprise when we did count of programs: the largest number of arts and culture programs are actually at our county libraries. Not even our department — county libraries. And there's something interesting about this too. What do the county cultural institutions and county libraries have in common? They are both fixed assets: this is programming that takes place at fixed locations. Our grantmaking to nonprofits goes to organizations that are spread throughout the county, that travel throughout the county — a very different model. But if you take it all combined, this is really a picture of the existing infrastructure the county has for investing in the arts. We can look at the equity gaps — they're very clear — but we can also look at this as the infrastructure on which we have to build for the future as we start to address equity. As the Board of Supervisors looks at the arts as part of the solution to bigger social and civic problems, we have a robust infrastructure across county government — not just us, not just the county culturals. And then there was the fun fact that most of the arts and culture programming we found in other county departments has some existing relationship with either our department or one of the county culturals; there's not a ton going on without some participation by an arts agency. We started to do a little bit of analysis on those top 10 most-served zip codes, comparing them to the county as a whole, and what we're seeing is not what folks expected to find. They're like: oh, all of the investments in the arts are going to wealthier, whiter communities, right? Well, wrong — we've got some evidence here that those investments are actually going to some of the highest-need communities in the county. But hot tip: that's because of how much programming goes through libraries, and where libraries are located. It really speaks to the importance and value of our county libraries, as a service provider but also as facilities that can host services. This is just where I tell you that 45% of the departments said they had no relevant data, that we got a bunch of PDF flyers, and that most departments were unable to report service geography or dollars — but I've already told you all that. So let's go back to the original question I asked: why does open data matter in the arts? What I'll tell you is that there have been conversations going on since before the pandemic — and they've really ratcheted up during the pandemic — about the role of arts in addressing social and civic problems. No one believes the arts are going to solve a pandemic. The arts aren't going to solve the housing problem. But there is a belief, and I would say a growing belief, that the arts do have a role to play in solving those problems. And our authorizers are looking to us to play a role in that — to support the missions of departments that are working on other things.
There's this term, "second responders," that gets thrown around in my world a lot. You've got the first responders who are out there providing medical care, fire response, all of those sorts of things — but second responders go in and work on rebuilding community. I won't say we're in a post-pandemic period — we can argue about that — but I do think we are in a rebuilding period. We're in a period where we need to figure out how we come together, and how we help other people come together across difference, and that is something the arts really can do. It really can bring people together. I mean, the arts — what do we do? We bring joy, we bring pleasure, we bring happiness. Sometimes we try to startle you; we try to help you question your received wisdom. But when a song comes on the radio and you start tapping your fingers on the steering wheel, or you start singing along, or maybe you're singing at the top of your lungs in the shower — those things actually matter. And I think they matter more today than perhaps they have in my lifetime, because we have been torn apart from each other by what we've been through in the last couple of years. As much as it may be hard to remember, or maybe we don't want to remember, that really did a number on us. I think what's happening to us, to our democracy, is part and parcel of that. There's research showing that this kind of misbehavior and tearing apart of societies is a normal aftereffect in the wake of a pandemic. So here we are, standing here, and we've got something to offer in the arts. We believe we can help bring people together. We can help people have hard conversations. We can help people find common ground in an artwork, in a performance, in a poem. Those things really matter. So open data matters because this tells us where we're doing this — we've got some good indicators. The data's not what I imagined it would be, and it's definitely not what our authorizers were hoping for, but we have some really good indicators of where we're investing well and where we still need to invest for the future, and we can show the infrastructure — we have the mechanisms for making those investments. And that's really exciting. And now I get to the really super exciting part. If you are here in Southern California, I'm gonna invite you to an event. A couple of years ago, at this very conference, I talked about our datathon — it was an annual event before the pandemic, and it's coming back. The arts datathon, where we bring together our arts community to explore how we can use data to improve access to the arts for everyone. And this year's datathon — we're back in person for the first time since the pandemic — is going to focus specifically on this dataset. So we've got the website, the story map that everybody can look at; we've published the data behind it; you can download it and do whatever you want with it. We wanna bring the community together to interrogate that dataset, to see what we can learn from it, to ask questions, to raise concerns. What's missing from our dataset? What didn't we get? How could we improve on it? This will be our opportunity to do that. So you are all invited. It's free, it's open to the public, and we're gonna be at Plaza de la Raza here in Lincoln Park, a beautiful arts facility.
And so we're gonna explore this dataset and see how having it can help us move forward, and then how we can improve the dataset itself. All right, there's all the contact info for me; lacountyartsdata.org is the website. The datathon website hasn't been updated yet — hopefully that's gonna happen next week. There's how to find me. And at this point, I'll answer any questions anybody has, or entertain any comments anybody wants to make. All right, yes sir. We had some fun challenges with Esri on this that I won't go into detail on, but we keep our eyes on the prize, right? We focus on where we're trying to get, and where Esri doesn't give us what we want, we figure out the workarounds — we have our contractors do it, and the county's chief GIS officer has stepped in a little bit to help us. I don't actually have great solutions, but come to our datathon and help us beat up on this. So at the controller's office, we mainly use Mapbox, and that's actually been quite good for us, so we've been migrating. The problem with Mapbox is that it's not as good on the data analysis side, where you have charts; the trade-off is that you're thrown in the deep end — you have to program every single box yourself — but it leads to greater mobile usability. And another trick we have: most departments, when you open up their ArcGIS page, give you like three pages of instructions. We have zero instructions. We throw people into the deep end, and them figuring it out themselves is actually a better process, if you design it correctly — that's our thought process. That's also why we do the datathon every year: we know that the community we're serving, our primary stakeholders, are not data people, they're not ArcGIS people, and we actually need to get together and sit down with them and walk them through how it works — to bring them along to use the tools we've given them, but also to invite them to give us advice on how to make the tools better. It's really about that exchange. Also, the two charts that I showed you, the county culturals — those were handmade and designed; they're not built into Esri. Yes — let me get you the mic so we can record your question. Okay, let's say, for example, a faith-based organization, nonprofit, or educational institution is looking for resources they can tap into. Your findings — are they easily accessible? It depends on how you define accessible. I mean, the whole dataset is downloadable. Okay. One of the things — this is gonna be a little bit beside what you said, but it's adjacent and it's important — is that what we have mapped is where the county is invested in the arts. What we have not mapped is all the arts organizations out there in LA County. I don't know how we could even do that; I don't think it would be possible for us to find them all. But if you are looking for resources, like grants for arts and culture organizations, come to our website, lacountyarts.org, and go to the Grants Opportunities page. We are the most friendly grantmakers you have ever met.
Our staff, if you send them an email and ask, hey, I'm interested in applying for this grant, can you tell me more? An actual human will respond. Okay. That's how accessible we are. And let me follow up, maybe ask a slightly better question. Since you've done the heavy lifting of looking at the landscape of all available grant sources in the county and taking a snapshot of that, is that information available to, say, faith-based organizations, non-profits, and educational institutions? Because obviously what's available via the arts is great, but if you're looking for STEM-style grants, or I know the state of California has what they call A through G grants available for COVID recovery efforts, as it were. So if I were looking for other types of grant sources that maybe you've come across, is that information easily accessible? We've got a couple of things I would point you to. We have a COVID recovery page on our website, but it is gonna be arts-focused, because we do arts. The county has a page specifically devoted to ARPA, American Rescue Plan funding. There is tons of funding available, and a number of those grants are open right now. So I would look for the county's ARP page to find that, and that's gonna be well beyond arts and culture. Any other questions? Hi there, great talk, and I came in a few minutes late. How did your department start? I was just looking it up; I'm a resident of Orange County and it doesn't look like we have anything in the government that spearheads or directs such efforts. I'm curious if there's a way to maybe push for it. Was it citizens, was it the government? I'm just curious about your history. So yeah, we're a department of county government; we're the local arts agency for Los Angeles County. I find it hard to believe there's no local arts agency in Orange County. Okay, so local arts agencies, now y'all are gonna really get me into the weeds. Local arts agencies are sometimes government, sometimes structured as non-profits outside the government but funded by government, and sometimes they're a quasi-entity in between the two. And for good or for ill, in general, places that are more conservative tend to have them as non-profits outside the government. I won't say anything more than that. But yeah, there is a local arts agency there, so if you found the non-profit, you probably found the right one. Other questions? And one other question: you mentioned this 1% of a building's construction cost that has to go to arts or culture. Is that across LA County, all the cities and jurisdictions? Again, in the weeds: for LA County, we have historically had a percent-for-art program for all government buildings, 1%. This is a fairly standard policy that many cities and counties have. So we have a 1% for art for all county buildings, but that covers only county-built buildings. The City of LA has their own percent for art, but theirs covers both public and private developers. In the last year or two, LA County passed a private-developer percent-for-art policy. So now private developers, with some exceptions, building a building are required to set aside 1% of their construction costs for art, and we have any number of mechanisms through which that can be done. Individual cities within LA County, we have no jurisdiction over, so individual cities will have their own policies on the private side.
For government buildings, wherever the government building is built, whether it's inside a city or in an unincorporated part of the county, we do have jurisdiction over that, if that makes sense. So I'm gonna put a pause on this here and ask us to thank Bronwyn for a great presentation. If you have any other questions, I'm sure she'll be around. And coming up in about 10 minutes, we have the epic battle between Microsoft and the privacy laws of Europe, in this same room. So thank you all for coming. Do you want me to introduce? You can do whatever you want. I'll flash up a signal when it's time. All right guys, we're gonna get started with the two o'clock talk. I'm glad you guys are all excited about the last talk; let's see if we can get some similar excitement here. Jos is gonna talk us through the epic fight between Microsoft and European privacy law, which I am personally very excited about, because I came of age in the 90s and I remember that people were not really keen on Microsoft then, and somehow they've cleaned up their reputation, but I guess not all the way. Thank you for the wonderful introduction, because it's very similar for me. I'm still rather suspicious of Microsoft, but they've put in a fair bit of effort to present themselves in a cleaner light these days. And honestly, I don't mean to bash; obviously this talk is about Microsoft, so don't worry, there'll be some bashing, but I don't mean per se to bash them as much as to say that I think it's a problem when companies get so big they get bigger than some governments. At that point the distinction starts to fade, except of course you don't get to vote them out that easily. All right, so there's this name struck through, because as some of you who know Frank might have noticed, I'm not Frank. He could not make it, because he managed to submit talks to two different conferences at the same time and get accepted at both, and, well, that turned out to be a physical impossibility. So instead of him, I'll be talking, largely going through his slides. So whenever it goes wrong, blame him, and if it goes well... or the other way around, whatever floats your boat. All right, so my name is Jos Poortvliet. I'm Dutch, live in Berlin, have a wonderful dog and a wonderful wife. I've been around open source and Linux for a long time: Mandriva, Mandrake back in the day, these days rather obscure Linux distributions, from Europe I guess. I also used Debian and Ubuntu, the whole range back then, and I was even using Arch Linux while already working as an open source community manager. That tells you something about how open the open source community is, by the way. Go check out the openSUSE booth; they're a really wonderful bunch here, really wonderful people, supporting a lot of other open source communities. I know many booths that over time had trouble staffing their team, then merged for a while with the openSUSE crew, who graciously gave them a place at their booth or group of booths, and some of those communities then grew out of it again and now have their own booth. The GNOME team, for example, for a while were here just because openSUSE kept giving them a space, and now they have their own booth again, which is really cool. Also, I worked at ownCloud, and I'm a co-founder of and now work at Nextcloud, just like Frank, who's the CEO there. And in our role at Nextcloud, we are effectively building a competitor to Microsoft 365.
You know, this thing that does everything: office, document sharing, Teams is rolled into it as well these days. It's a complete collaboration platform, and that's what we have built, except ours is open source and on-premise. And as a competitor, we of course occasionally bump into Microsoft's behavior when it comes to certain anti-competitive practices. That's what got me, or got us, to this subject and to this talk. If I have time left at the end, which is likely, because I don't think I need all that much time, I can talk a bit about Nextcloud. There will also be two community Nextcloud talks tomorrow, and a talk from me tomorrow more about privacy. I didn't even know about the other two Nextcloud talks until about a week ago, so there's actually a fair bit of Nextcloud here. Plus we have a booth. So if you're curious about how you can move away from Big Tech and run your own server, or find one at a local hosting company and keep your data under your control, come by the booth, or I'll maybe talk a little about it at the end of this talk if there are no other questions. So let's talk about our digital world today. Data is the new oil. I think that's extremely obvious, especially now with AI and all the data that is needed to train it. Having that data means you can train the AI; if you don't have the data, even if you have the software, even if the software is open source, you can't create the model you need. And if you can't create the model, in the end you've got nothing. So it is very important to have control over this data, and the companies currently providing us with all these beautiful free services are very smart: they are trying to gobble up as much of the data as possible, even if it costs them money to do it. I would like to remind you that something like YouTube is, technically, still losing money, but the data will be worth it someday, I'm sure. And this also goes for all the photos hosted on a variety of services, like Flickr and obviously Facebook. And more and more companies are slowly realizing they are software companies, that software is becoming more important. I think new technology like ChatGPT makes it clear that if you're making a product, in whatever sector you're in, and you're not looking at AI, you're doing it wrong and you need to start paying attention. And again, for AI you need data, otherwise it's not gonna work. So this brings us to the subject that, at least in European political circles, is called digital sovereignty. And what is it? It's the degree of control you have over the data you generate and work with. So it's about control, it's about independence. And the term sovereignty, of course, is more directly relevant for a government than for an individual in this sense. So what's the problem with digital sovereignty? Why should you care? There are four kinds of strategic risks. The first one is social: if you don't control your data, it's a problem for people, for society, for businesses in general. People change their behavior when they're being surveilled. There have been some interesting psychological experiments: if you put a camera on people and tell them the camera's on, they'll behave differently than if you tell them the camera's off, even though, of course, whether the camera's actually on or off is not something they can verify. People fundamentally act differently when they're being observed.
This is why having social media everywhere and having cameras around us all the time is changing the behavior of our kids. They don't behave the way many of us did growing up, because there weren't cameras around all the time then. The whole TikTok generation, I suppose. I'm sure you're all familiar with this, especially if you're parents; you've been thinking about it, at least I would guess. And there is a problem there. People need to control how they present themselves, and privacy is really important for that. You're a different person when you're with your family than you are at a conference like this, or than you are at your job. And to be able to control those different sides of who you are, how your colleagues see you versus how your partner sees you, you need some degree of control over your data, over your privacy. It's a very fundamental thing for people; it's very important. I mean, why do you wear makeup? Why do you wear certain clothes, a band t-shirt or a t-shirt of your favorite open source project here, or even the other ones? These things matter to us in a whole variety of ways. Part of it is of course politeness, but there's more to it as well, right? You want to present a certain image of yourself. But it goes beyond that. And I think the political risk is rather obvious. There's been plenty of talk about political interference using big data platforms, using Facebook and other platforms, and what this can do. There's currently a debate going on about TikTok and what it does. And I do want to point out that in China, since 2018, there have been very strict regulations on TikTok: who can use it, how long they can use it, and what they can use it for. Here in the US, and basically the rest of the world, there have only been the first traces of some kind of control over this since March 1st this year. So at least the country the technology came from realized there are problems, that these technologies have an impact on society. And politically, of course, we don't know exactly how these technologies impact our political discourse, but that they have an impact is obvious, right? Our communication happens online; it happens through platforms like Facebook, Twitter, et cetera. And these technologies are not public domain. They are owned by companies who can, within reasonable rules, do what they want, which is fair enough; businesses can do what they want, and I think they should. But it is a problem that we're all on the same three or four platforms, because that gives them extraordinary power, and we don't know how they're using it. None of you can explain, and not even the people at Twitter can explain, why a tweet shows up in your timeline when it's the recommended "For You" timeline. Facebook can't tell you why a certain thing shows up in that timeline. Those algorithms are so complicated that they themselves don't really understand them, and I'm not even talking about something like TikTok. And if they want to make a certain brand more popular, they can just tune something somewhere in those algorithms that makes positive things about the brand show up more often and negative things a little less often, and they start changing people's perceptions. Because after all, when you're on Facebook, you think what you're looking at is the opinions of the people you follow, your friends and family. But that's not true.
There's a filter between what you're looking at and what their opinions really are, right? There are multiple filters, even. There's the filter of what they choose to share and put online, and that might have been triggered by things they saw; maybe they were really pissed off by one message or a reel or whatever. And the second filter is, of course, what Facebook's algorithms, or Twitter's, or you name it, have decided you should be looking at. So this has an impact, and we just don't really know what exactly. Of course, there's a business side to this as well, because if everything goes through a few companies, those companies have extraordinary power. Ask any hotel owner if they have a choice about whether they are on booking.com, handing around 20 to 30% over to that platform. They'll tell you they don't have a choice. We can say, well, free businesses are free to do what they want, but it's not really a free choice. And how's your cable internet doing, right? What do monopolies do to business in general? There's a risk here when it comes to data. Again, if data is the new digital oil and it's all owned by a small number of companies, this is gonna impact commerce, this is gonna impact business as well. If we let a couple of monopolies control the market, we all know what happens. And the last thing, very much related to that, is of course innovation. I've been giving this example since the early 2000s, and I think it's no less relevant today. Back then I was talking about OpenOffice, LibreOffice, trying to explain to people why it's important to have an open source office suite. I said, look, if you invent a really amazing feature that makes writing letters incredibly much easier, think of something like ChatGPT but in the early 2000s, that would have been a revolution. But you couldn't have built a whole office suite around your amazing feature. You were competing with Microsoft Office. You would not have been able to build a business around your feature unless you managed to somehow integrate it into Microsoft Office, which means you're entirely playing on their turf. In other words, the only shot you would have had at making money from your innovation would have been, I don't know, getting bought by Microsoft, probably at a nice discount, because what else could you do? It limits innovation. You couldn't really have built something. Sure, it's a free market, but only free as long as you're Microsoft, because if your product isn't 100% compatible with Microsoft Office, and also seen as such by other people, then good luck. And it is, of course, the same today with lots of products. Right, so I'm not saying all of this needs to be solved by government, but this talk is about what Europe is trying to do and how they're fighting with Microsoft about it, so that's where we'll end up either way. Let's have a look at what Europe is up to. Now, I'm first gonna talk a little bit about privacy in general, because there have been a couple of regulations about this in Europe since 2000, and what they tried to regulate was the control of individuals over their data. If you hand over your personal data, maybe just a name and password, but maybe other data too, to a company, and this company is a US company, or another company somewhere else, what happens to that data?
Did you have any recourse if the data leaked, et cetera? Well, this regulation put in some rules, but basically what it said was that US privacy rules and European privacy rules are equivalent. That's what it boiled down to. And the thing is, they're not, certainly not when it comes to the rights of foreigners. So when this finally ended up at the European Court of Justice, they gave it a big nope. They said, look, in Europe privacy is a fundamental human right, and in the US it is far less protected, as many of you might be aware. And so this rule was declared invalid. Of course, it had already been in effect for quite a while, and they started working on a replacement, which they called the EU-US Privacy Shield, which had some different wording but in the end was pretty much the same, and, as we'll see soon enough, would also be struck down by the courts. In the meantime, Europe was actually strengthening its privacy regulations with the GDPR, the General Data Protection Regulation. How many of you are familiar with the GDPR? Oh, wow, that's like two-thirds. Are you all law nerds? What is the audience here? Fair enough. Right, California has the CCPA and CPRA, yes, and they're modeled after the GDPR. Sorry, I'm repeating the comment here, but it was pointed out to me that this is relevant if you do legal work in the US as well. I knew the Californian law was modeled after the GDPR; I don't know, of course, how close the modeling is. But if so many of you are familiar with it, I won't spend too many words on it. It basically puts a whole bunch of limitations on what companies can do. They have to be transparent; they have to remove data if you ask for it. In other words, it really puts the user in charge of what happens to their data, at least to quite some degree. In Europe, it mostly applies to bigger companies and commercial entities, at least that's the intent. In practice, you see a lot of organizations being more worried about it than is really necessary. On the other hand, I think a little data and privacy hygiene is a good thing. So overall, it causes more paperwork, but I think it is a good thing to be more aware of privacy, security, et cetera. Anyhow, this obviously raised the bar, and then this happened: President Trump signed the CLOUD Act, which in simple terms says that if data is under the jurisdiction of a US company and a US judge says, I want it, they get it. It doesn't really matter where the data is stored. And that's rather incompatible with the GDPR, which says the user whose data it is controls the data and gets to decide who has access to it, period. Of course, a European judge could overrule that, but not a foreign judge, and in the end the US is a foreign country. So when this ended up in front of the European Court of Justice, guess what? There was a big nope again. So the GDPR and the CLOUD Act are incompatible, which technically should mean that as a US company, you can't hold data of European citizens, period. Of course, if that were enforced, there would be a huge number of companies in violation of European law, and honestly, I have not seen a real answer to that. This seems to be the current situation, although there are discussions about it. And I believe some of the newer laws, like the Digital Markets Act, already got through in Europe.
So these basically raise the bar in a couple of related areas as well; Europe keeps moving toward more protection for privacy, security, et cetera. The Digital Markets Act has a ton of provisions, because they like to make their laws complicated, but one of the more relevant ones is that, in a way, you could call it a GDPR for business: it forces companies to also respect the data ownership of businesses. And it puts especially strengthened limitations on really big companies around a couple of the things I mentioned earlier about competition. So forcing companies, for example, to give them the lowest price in order to be allowed on their platform, like booking.com does, can be challenged under the DMA. You can say, well, if people come directly to my door, I don't need to give booking.com their 30%, so I can offer a lower price. And if booking.com then says in their rules, if you wanna be on our platform you're not allowed to do that, that's a rather anti-competitive move, and the Digital Markets Act might forbid it. It's all fresh legal territory, so it's certainly hard to say for sure. Meanwhile, the US is moving in the other direction, now considering whether even encrypted data should be accessible to US law enforcement, which again goes against what the GDPR and European law would consider okay or legal. Now, a new solution: we had Safe Harbor and then Privacy Shield, and now we've got the third one, the Trans-Atlantic Data Privacy Framework, which was agreed on in principle, and I think has by now been fully agreed on and signed. They've added a dispute mechanism on top of basically the previous system, but it still, I believe, puts you in front of a US judge and not a European one. So we're kind of waiting for a Schrems III decision from the European Court of Justice. If you ask me, and if you ask Schrems, the Austrian lawyer after whom these cases are named, it's probably gonna be the same thing: it's gonna be struck down, after which we will probably get another ruling, or another new deal. Given the direction both sides are moving, the EU creating stricter privacy regulations and the US giving more control to law enforcement, I'm quite curious to see how this will develop. We can debate this later, or even now if you want, because I have a different subject I wanna get to going forward. Are there any questions about this, or debate points? Please ask; I will repeat the question. Yeah, so the question is, if I may rephrase it this way: with a traditional operating system like Windows, it's running in your home, on your premises, under your control, but with a mobile phone, let's be honest, the degree of control we have over the data on our phone is minimal at best, right? Android syncs almost everything by default, and so does iOS. Apple does put a fair bit of effort into encryption and end-to-end encryption and then saying, we don't have access to it. So I think once the data is in plain text on the servers of these companies, AKA Google and Apple, it falls under these regulations, and therefore everything we're recording and capturing with our mobile phones is a problem under these regulations, I think.
When it's end-to-end encrypted and they don't have access to it, at least directly on the server, it gets a little more interesting, because in theory Apple could push a specific update to a specific person's iPhone that bypasses the whole end-to-end encryption and just uploads the files, with that person none the wiser, and nobody else either. And I think it's an interesting legal question whether a judge could force Apple to do that. I'd guess that would be quite a hard call; I would guess not. So end-to-end encryption, I think, protects you. I think this might also be why other players are introducing more end-to-end encryption: it creates a layer of trust. Of course, they could still do it, right? If WhatsApp has end-to-end encryption, I think this is mostly for legal reasons, because if you're plotting a terrorist attack in unencrypted messages and they don't catch it, they get the blame. If it's end-to-end encrypted, what can they do? So I think what they're looking for is a way to avoid having to moderate conversations, almost as much as they're actually trying to provide security. Think of what happened in Myanmar. How many of you are familiar with what happened in Myanmar and Facebook? All right, that's not as many as I had hoped. It's a nasty story, but in Myanmar, where democracy is not as well established as over here, let me put it that way, Facebook did not employ many, if any, moderators. And so the conversations on Facebook there spiraled out of control, resulting in what has been recognized, I think by the United Nations, as a genocide of the Rohingya Muslims living there. I think you will find most experts agree that if Facebook had had proper moderation in place, this would not have happened. The implications of that are quite shocking, quite bizarre when you think about it. And then what happens if you deploy end-to-end encryption and you can't moderate anymore? It'll save you money on moderators; it might also result in more genocides. It's a weird world we're living in when you're wondering whether end-to-end encryption is actually a good idea or not. I don't have any answers here either, but it's a bizarre conversation. Either way, I don't think I have any hard answers to your question. I think what you brought up, mobile phones and the way they send data, usually immediately, to cloud servers, makes the whole problem a lot more visible. All right, any more on this subject before I move on to antitrust? All right, because this gets us a little closer to what we have been up to at Nextcloud. Many of you, because this was brought up in the beginning, might be familiar with what happened with these two. For those who are not, the short version: in the 90s, a company developed pretty much the first browser and built a business around it. Then Microsoft decided they wanted a piece of the pie, built a browser too, and bundled it with Windows, which in the end killed the other company. And halfway through the killing of the other company, that company said to the antitrust authorities, this isn't really fair. We're not competing on level ground here. It's not about who has the best product; it's just that they have an operating system and we don't, so they win. That's not really a fair game. The antitrust authorities agreed, a couple of years after Netscape went bankrupt, so that wasn't really helpful.
But yeah, the bundling thing is still happening, or happening again, which is where we come in. As I mentioned earlier, Nextcloud is building a self-hosted, fully open source collaboration platform. Think of file sharing and syncing; think of Dropbox, Google Drive, Microsoft OneDrive. Now, what do you find bundled with Windows these days? OneDrive. So we went to the European antitrust authorities, together with about 30 other small tech companies in the open source space, and we said, hey, that's not really a level playing field; why can't we be in there as well? And this garnered a fair bit of attention and a whole bunch of conversations with the antitrust authorities. Let me be honest with you: we're not expecting them to fix this quickly. I don't think we'll go bankrupt, we're doing just fine, but this is of course not gonna get solved by them in a timeframe that is relevant business-wise. We do think it's important that government pays attention to this stuff, though. And we've been told in so many words that the Digital Markets Act, for example, might help in this regard and might give the European authorities the legal tools to deal with it. I'm not sure the Digital Markets Act is enough, and we would prefer the cartel authorities continue their investigation a little more aggressively than they are, but at least there's talk happening, there's discussion happening. And other companies have also started conversations about this; for example, Slack has filed a complaint about the Teams integration, which is, of course, the exact same story again. And Microsoft has responded by, of course, integrating their products even deeper. It's funny, because those of you who remember the browser wars of the 90s, the Internet Explorer versus Netscape story, might remember what Microsoft did to deal with that. They integrated their file manager and the web browser into a single product, and then they said to the antitrust authorities, yeah, but it's so deeply integrated, how can we ship our product without a file manager? It's quite amazing, and it seems like they're trying to do the very same thing right now with OneDrive: making it such an integral part of the product that if government decides they need to disentangle the mess, they will just say, yeah, but we can't, because, you know, how could you? So yeah, they're up to their old tricks, I would say. It's quite interesting to watch. And of course you see it in other places as well, like what's happening with SharePoint; I don't know how many of you are familiar with that, but it won't exist as an on-prem product much longer. They're moving you to the cloud, whether you like it or not. If you talk about digital sovereignty, that's not fun. So a lot of organizations are actually looking for alternatives, and we're one of them, again, at Nextcloud. And I think many of you might also be familiar with the whole discussion about whether you need an online Microsoft account to install Microsoft Windows. I discovered that if you Google this, Google recommends searches on it; the other one is very hard to see, but let me see: if you search for how to stop Windows from saving to OneDrive, there were something like 5.2 million results. So people are definitely wondering how to get rid of that integration. So again, customer demand? I don't know. Not everybody is a fan of it. Anyway, the saga is not over.
I can't really predict what's gonna happen here, only that it's probably gonna take a long time. Well, I promised it wouldn't take that long, and it did not. Any questions? Yes. Oh, God. Failing to... So true. Yeah. Let me repeat the question. The gentleman here pointed to the OpenDocument format situation and said Europe has a proud tradition of creating very sane laws and then ignoring them, if I may be so bold as to summarize it like that, and you're asking how I see this developing. I'm not expecting a lot of change, let's be honest. I mentioned digital sovereignty in the beginning; let's make it extremely concrete. If a European court said, as of today you have to abide by the GDPR in the interpretation we used before, AKA you can't put your data under the jurisdiction of a US company, then all of Europe's economy comes to a grinding halt, because they're all completely dependent on either Google or Microsoft. That's just a plain and simple fact. And it would be similar, I think, even here in the US: if you took these privacy laws, the Californian privacy laws, 100% seriously, I wouldn't be surprised if a lot of stuff also stopped working, because these companies are built around violating privacy. Again, they need all the data and they need to be able to use it. I mean, ChatGPT, DALL-E, the new AI image generators, would they work without all the copyrighted images from artists all over the world? They can't create art in the style of Frida Kahlo if they don't have training data from the artist. And the same is true for control over this data. I don't see it changing. And, before I get to your next question, one more thought on that, because I've noticed that Microsoft knows this; they're very much aware of this pattern, and they're basically reinforcing it. Quite some years ago, when the term digital sovereignty started to become relevant, I think about four or five years ago, I went to a Microsoft event in Berlin about digital sovereignty, where they basically got on stage and told the gathered politicians that Germany should stick to cars, leave the software to Microsoft, you have no alternative, deal with it. I'm paraphrasing a little, but that's pretty much what it boiled down to. And as long as European politicians believe that, well, they're not gonna try to enforce the law, right? So you create a nice self-fulfilling prophecy, which is, of course, a great way of dealing with it. Yeah, then your question. I've been hearing recently that some of the hybrid cloud providers are being forced to develop private clouds for individual companies to deal with things like the GDPR; I know IBM has been doing quite a bit of that. Have Amazon and Microsoft been developing country-specific clouds to deal with those issues? Yeah, that's an interesting question. When we're talking about cloud, there are of course a whole bunch of different meanings to that, I'm fully aware. One of them is the AWS kind of cloud that you run various workloads on, but the workloads themselves, AKA an application or a set of applications like Microsoft 365, are also referred to as a cloud. So let me answer about both.
When it comes to cloud infrastructure like AWS, you have OpenStack these days, so you can run it internally, and there are plenty of companies doing that, just building their own internal cloud to run their own workloads on, sometimes doing hybrid setups. You're all aware of this, because there are tons of conversations about it happening. When it comes to Amazon offering a kind of internal version of AWS to companies, I'm not aware of that, and I would imagine those companies would simply use OpenStack, because there's a great degree of compatibility there. It maybe doesn't offer all the bells and whistles, but I think if you as a company say, look, we can't use AWS but we want the feature set, then with OpenStack and Kubernetes and all these things you can get fairly close, with perhaps some extra costs, or maybe not; there are ways to save costs with that as well. When it comes to Microsoft, it's interesting. I had a conversation with a gentleman from a very big European car manufacturer, and I asked, how do you deal with the GDPR issues? And they were rather paranoid about that: our data cannot leave our premises. I was like, okay, so are you using Nextcloud? And he told me they had basically gotten a kind of private cloud from Microsoft, something that was completely firewalled off from the rest of Microsoft's cloud service. From what I can tell, because I haven't seen this offered anywhere, I don't think they offer it to anyone who just asks, but if the customer is big enough, they'd rather not lose the customer to an alternative solution like ours, I suppose, and they will offer it. I'm sure there are technical difficulties that come with it, and significant costs. They did try to run a European-hosted, European-owned and -controlled version of Microsoft 365 together with Deutsche Telekom, the German telecom company. That basically ran into technical issues. When they shut it down they announced there was no demand, but I find that hard to believe; the rumors were that they just couldn't get it to work properly without breaking things and causing various issues. But they are trying similar things again. The trick they're doing now is that Microsoft and Deutsche Telekom create a new company, a European company, partially owned by Microsoft and partially by Deutsche Telekom. That company pays license money to Microsoft and runs the software on German servers, managed and controlled by Deutsche Telekom; only Deutsche Telekom employees have access to it. It's a kind of legal solution to the problem, and it works, because if Microsoft were ordered to hand over data, they can't; Deutsche Telekom owns and controls the data. Of course, Microsoft could do the same trick as before, change their software and hand over the data that way, but then you come back to the legal quagmire of: can you force a company to effectively hack one of its customers by providing a special payload that uploads data? I don't know, it might be happening, of course, because this might all happen under seal, but I think that's a really hard one. So this is happening, and I don't know to what degree, because it's not that public. I hope that answers the question, yes? No, right, yeah, you're opening a whole new can of worms: email.
Once upon a time it was completely federated and open, and I think it's still a good model, but given that everybody's now hosted at Gmail, calling it open is not really the right term anymore, because if you get blocked by Gmail and Hotmail and maybe one or two more, then effectively you don't matter. Of course, you have companies like SendGrid that pride themselves on deliverability. We at Nextcloud are not currently offering an email server as part of the product; we have a mail client, but as I say, it's a client, because running a mail server... technically it's not hard, you can get tons of solutions for that. It's very easy to run a mail server and get some of your email delivered some of the time. It's nearly impossible to run a mail server that gets all of your email delivered all of the time, because the big companies are constantly changing the rules and playing with them. And as long as Microsoft can get its email delivered to Google and vice versa, they don't really care about the small players. One of the reasons for that is spam; sometimes I think it's also an excuse. But yeah, running a mail server is really hard. I mean, you still get an SMTP server with most providers, and it's usually fairly reliable, and lots of big companies still run their own email, so I think it's more the trend that worries me personally than the current state. It is a bit nasty, but if you have a part-time sysadmin who knows what they're doing, they'll be able to keep your email arriving. We also run our own mail server at Nextcloud, not for the bulk email, the marketing emails; for that we use SendGrid or something similar, which is, I think, okay. But given the trends, five or ten years from now it might be nearly impossible. Honestly, I leave this problem to other people to fix, because, yeah, it's a tough one. Exactly. Yeah, there is a small company, I'm trying to think of the name, that built an open source tool that monitors deliverability. What they were trying to do is automatically configure a mail server to guarantee deliverability. So you'd have a bunch of mail servers with their open source app installed, running on the mail server, and if mail doesn't get delivered, it tries to fix it, and then rolls that fix out to the whole network. So it kind of solves, collaboratively and in a decentralized way, this problem of Big Tech constantly coming up with new problems for mail servers to solve. I can't think of the name; come to me later and I'll try to find it. But it was a pretty clever solution, and open source. So, any more questions? A few more minutes left. Yes, cyberbullying. Yeah, I don't think that's part of the GDPR; I think that's separate legislation. I've not followed it closely, honestly, so I can't say anything smart about it, sorry. Any more questions? Well, we're close to the end. Ah, one more? A comment, okay. Well, then I'll give you the microphone, and that'll also be the end of it. So, thank you all. Yeah, I had sort of a comment on what he said. Normally what happens between the EU and the US, or other countries and the EU in particular, is that they work with each other to align their standards, and normally the EU is a little more strict, so the US tends to adopt those standards over time. Interesting point, yeah. I think the California privacy laws are a nice example of that already, right? All right, that's it.
Thank you all; I appreciate the time you took. Thank you all for coming. In about 10 minutes we'll have another talk, this time about Transition: planning your own public transportation using open source data. So, see you in a little bit. Next thing. Hello? All right, coming up next we have "bringing transit planning into the hands of everyone," and we're gonna learn how to make your own bus line, which I'm very excited by. Thank you. So, I'm Yannick Brosseau, and I'm gonna talk to you about some transit stuff. This is my neighborhood: I live in Montreal, in a neighborhood called Verdun. I'm a little far from the subway, so I often rely on buses to get anywhere else in the city. And, like most of Montreal, all the streets are aligned in well-designed grids, so it's really easy to run a lot of bus lines. But from my place, even though I have access to five or six different bus lines to connect me to the subway, I feel there are times during the day when I don't have access to any buses. There's a big black hole of no buses from where I start. And I started wondering: is there a way to prove it, so I can do something, like go to the city council and say, there's a problem in this part of the city? Luckily for me, a couple of years ago I started helping a team working on a tool called Transition, built to help design bus networks. So I have a tool that might be able to help me; we'll see. Our goal with Transition is to make a transit planning tool that is the best you can get for professional transit planners, but easy enough that anybody, any citizen, could pick it up and use it to analyze the transit network in their area. If you look up my profile at SCALE, you might have seen that my previous talk here was about the kernel, kernel upstreaming, and deployment processes. So why am I here talking to you about transit? The story is that a couple of friends were working on this tool, and after I left my previous job at Facebook, I said, maybe I can help you; what do you need? And they started telling me, oh, we need to deploy the software to a bunch of partners. Okay, sure, how do you plan to do that? And they told me: we can just start a bunch of VMs, SCP the code there, SSH in to edit the config files, and that's gonna be it. I got really, really scared by that. So: okay, you need help, I'll come and start working with you, despite having to write some JavaScript code in the future. Now, luckily for me, I'm not a transit engineer, and I don't know much of the theory, but at the Chaire Mobilité I'm surrounded by professionals who study transit, PhD-level stuff. The team is really a mix of computer engineers and transit professionals, and we also have an economist who studies the impact of transit on various cities. We're part of Polytechnique Montréal, an engineering school, and we do applied research. Over the years we've developed various tools. The first, called Evolution, is a travel survey platform; I'll talk a bit more about that later. We have a tool called Oscars that provides dashboards to monitor congestion in the city, similar to taxi dashboards tracking taxi usage. And then there's the tool I'm gonna talk about today: Transition, which is all about planning mobility in a city.
So I'm gonna talk about what transit planning is, then go into more detail about the tool, what it does and how it's made. I'm gonna talk about the data sources you need to be able to do transit planning. I might attempt a live demo, we'll see how time goes and whether it works. And I'll talk about some of the challenges of writing code in a research environment. So what is transit planning? The role of a transit planner is to start with a territory, a city, a region, and analyze what's going on there: what's the population, what are people's mobility needs. They'll study what's there, the road network, the current bus network, cycling paths, whatever people can use to get around, and then formulate propositions: what can we do to improve things, what will reduce the environmental impact, reduce energy use, or just reduce the time people spend moving around the area. They'll do cost estimates for the construction of the network and also for the operation; for a bus network, the main cost is the operation, not the construction. They'll do consultations, talk with the citizens, get their feedback; they'll do counts to see how many people are using it; and then they'll give suggestions to elected officials and follow up on the implementation, to see whether the new network does what they hoped for. And then they start again and again, as an iterative process, to make sure you get the best network and move people the best way possible. There are more and more people living in cities, and cars are getting bigger; moving people around a city in cars is obviously the least efficient way you can do it. Even if you electrify them as a way to address climate change, you still have the space problem in the city, so it's better to move people in a shared way, in buses, in transit, in trains, than to have everybody ride in their own private vehicle. But I won't try to convince you to love transit; I'll talk about how we plan the network. For our tool, instead of targeting only the professional transit planner, we want to target every kind of user of the network. First, the professional transit planner: if you're a big city, you probably have a bunch of people thinking about mobility in your city. But we also want to make the tool available to smaller organizations. You might have a small town, a small regional organization that wants to figure out how to improve mobility in the area but isn't big enough to have dedicated staff working on transit problems. So we want the tool to be just as good for people who have some understanding, who work for a city for example, but aren't on this full time. The third target is elected officials themselves: we want them to be able to go and use the tool, not just get the report from the transit planner and make decisions based on that. We often see a clash: some city will come ask us, how can we improve transit? We tell them, you should do this, and they say, oh, that's complicated, or no, we cannot do that, why, and they kind of second-guess the transit planner's suggestions.
So if we can get the tool into their hands, and they can draw the bus lines themselves and see what the impact of adding or removing a bus would be on the general mobility of their citizens, we hope the message that there are ways to improve the network will land better. And the second part: we want to make the tool good enough that citizens can go and do it themselves, propose new lines, and all together improve the governance of transit in the city, with a more shared view and more shared propositions for the network. Ultimately, we would love the tool to be as easy to play with as your favorite city simulation game, where you just draw a bus line. We're not there yet, but that would be the ultimate end goal: as easy as a computer game. So let's have a quick look at what the tool does. This is the main view of the tool. We're talking about transit, about moving things around, so the main view is a map of the area you're working in. I gave a similar talk earlier this year at FOSDEM, so this is the city of Brussels. I really like their bus network; it's all colorful, really shiny, really fun to look at, so I kept that slide. The main thing you're gonna do with the tool is, obviously, edit your network. You have a few options at the top to add stops and add lines; you can draw a line along your road network for buses, and for trains you can just draw directly where the train goes. You can, obviously, edit schedules; I'll show that screen in a moment. You can import and export your network: instead of having to redraw your city's network, you can just import the GTFS file that the agency provides, or you can export one. If you're a small city and don't have a fancy tool to write GTFS, you can use Transition, draw a couple of lines, export as GTFS, and that can be imported into any other transit planning tool. We also have the concept of variants, and that's probably the feature that matters most to transit planners: being able to compare different scenarios, different networks. With variants, you might draw your actual network and then make a second variant where you add new lines, and then do all the analysis comparing those two scenarios to see which is better. As I said, you can add schedules; for every line, we can generate a schedule for you. Either you specify an interval, I want a bus every 15 minutes, every 10 minutes, every half hour, and it will generate a schedule; or you specify how many buses you have. Say you have 10 buses in your garage, or the budget for 10 buses: you say, okay, I have 10 buses, generate my schedule with that, and from the length of your lines it knows how many buses you can operate and with what timing. It's not a tool made to really manage your bus operations; it won't consider how many breaks you need to give your drivers or how much maintenance you need to do. There are other, really expensive, tools that do that. But if you're a really small operation it will give you a good estimate: if you set a frequency for your buses, it gives you a quick overview of how many buses you need.
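To make that headway-versus-fleet relationship concrete, here is a minimal sketch in TypeScript of the arithmetic involved. This is my illustration of the general transit formula, not Transition's actual code, and it assumes the round-trip time already includes layover:

```typescript
// Buses needed on one line to sustain a target headway: a bus re-enters
// service every roundTripMin, so you need one bus per headway slot in the cycle.
function busesForHeadway(roundTripMin: number, headwayMin: number): number {
  return Math.ceil(roundTripMin / headwayMin);
}

// Conversely, the best headway a fixed fleet can sustain on that line.
function headwayForFleet(roundTripMin: number, buses: number): number {
  return roundTripMin / buses;
}

// A 60-minute round trip at a 15-minute headway needs 4 buses...
console.log(busesForHeadway(60, 15)); // 4
// ...and the same line with 10 buses could sustain a 6-minute headway.
console.log(headwayForFleet(60, 10)); // 6
```

In practice a scheduler adds recovery time and driver breaks on top, which is exactly the part the speaker says Transition leaves to the expensive operational tools.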
It will give you maybe a quick cost estimate, but if you want to go deep into operations and manage your maintenance, we're not aiming to do that, at least not for the moment. So, the first kind of analysis you're gonna do is simple routing. That's pretty similar to any routing tool, Google Maps or whatever mapping application is on your phone. You can specify the start point, the start time or arrival time, the maximum travel time, how much walking you'll accept to get to the network, and obviously the coordinates. You can ask it to give you the best result, or all the different alternatives, and use that data. The interesting thing is that it's not just routing one route; many tools can do that. We have batch operation capability: if you have multiple start and end points, you can compute all of them in batch and then do comparisons. Is your network better or not? So in this example, starting at LAX at 6 p.m. when I arrived, to get here, this is what we get for LA. I didn't import the whole LA network, because I don't know how many different transit agencies there are; I just did three of them so far, to simplify my example. And there are a few artifacts; there are still some bugs, it's still a work in progress, so you see some extra lines there, but basically you get the information, you get a bunch of statistics about your trip, like how much time it took: for this example it takes 136 minutes, so sadly I decided to take a cab rather than transit. You get the length of it, and how much time you spend on each transit leg; for this example you'd take six different vehicles, and at least you only wait 25 minutes between them in total. Then I took the example of changing the variant: I did a scenario where I excluded the entire L.A. Metro rail network. Usually I do the opposite example and remove all the buses, but for this trip that wasn't possible. People tend to stick to rail and ignore the buses, but you might, for example, have maintenance to do on a line and want to see the impact: where will people have to go, what's gonna be the time impact of changing this network? So I did that: now it's buses only, and you see it's 168 minutes with all rail excluded, so you can tell people, if there's no more train, it's gonna take you at least half an hour more. The interesting part is that you actually wait about five minutes less between vehicles if you just take the bus. But it's always a bunch of statistics you get, especially if you run this as a batch operation, and they give you insight into what's good, better, or worse in your network, especially if you compute it for points all over the city. The other main tool the transit planner is gonna use is what we call the accessibility map. The idea is that you take a specific starting point, or you can do a reverse accessibility map where you pick a specific destination, and you compute the area you can reach in a given amount of time. In this example, the different shades of blue are the areas you can reach in 20, 40, and 60 minutes from the specific starting point, basically this place.
And that gives you an area, and you always have statistics: in 60 minutes you can reach about 350 square kilometers. I did the same example with the other variant, removing the trains, and you see the impact of those lines on the network: people's reach shrinks to only about 210 square kilometers. That's a really useful tool. If you compute it for all the major population areas in your city for a given network, and then add a new line or change the frequency of a line, you'll see the space available to people grow or shrink accordingly; that's a really good indicator of the health of your network for your citizens. I just wanted to keep that example from Brussels: a really dense network, as you've seen, and that gives you a dense accessibility map. The third tool we have is a simulation and optimization process. The idea is to use either real information about where people start and where they go, or, if you have a population model and a model of the place, we can generate those trips. And using a genetic algorithm, we can improve your network by five to ten percent in overall travel time. In the studies we've done, optimizing the network saved around five to ten minutes per trip in general. You see in the graph that for most people it's an improvement; some people might be a little worse off with the algorithm's network, but in general it's an improvement. It's hard to convince people when you add or change a bus line, but having actual data showing it's better for most people is a really useful thing to have. And it's not just theory; we did a few studies for various cities. This is Drummondville, a smallish city between Montreal and Quebec City, about 80,000 people. They came to us with their old bus network, a bunch of different lines, and said: the city grew, there are more neighborhoods, it doesn't meet the needs of the population anymore, can you do something? So we came up with this by running the genetic algorithm. With the algorithm, you draw a bunch of lines, either randomly or with some educated guesses, and it finds the best combination. We're working on having it draw the lines automatically, probably adding some machine learning, we'll see; right now you need a human to go and draw a lot of lines, and the tool will find the best way to combine them. They're actually implementing this network right now, and there are a few issues with congestion. It's a small city, so we didn't expect too much traffic, but apparently they're having a lot of problems with traffic, and that affects the buses: they get stuck in traffic, and the connections don't work as well as we expected. But that gives us feedback to improve the tool later on.
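Going back to the accessibility map for a moment: assuming the routing engine has already produced door-to-door travel times to many sample points, banding them into the 20/40/60-minute isochrones is straightforward. The shapes and names below are my own sketch, not Transition's API:

```typescript
// A point the routing engine found reachable from the chosen origin.
interface ReachablePoint {
  lon: number;
  lat: number;
  travelTimeMin: number; // door-to-door travel time
}

// Group reachable points into cumulative time bands (20/40/60 min by default).
function accessibilityBands(
  points: ReachablePoint[],
  cutoffsMin: number[] = [20, 40, 60]
): Map<number, ReachablePoint[]> {
  const bands = new Map<number, ReachablePoint[]>();
  for (const cutoff of cutoffsMin) {
    bands.set(cutoff, points.filter((p) => p.travelTimeMin <= cutoff));
  }
  return bands;
}
```

The mapping layer would then draw each band as a polygon and measure its area, which is where figures like the 350 versus 210 square kilometers above come from.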
Another example that we made with the tool: this is a map of Montreal with the main subway lines, and in pink, a couple of years back, there was a proposal to add this diagonal new subway line to the network, to cover the middle part here where you have the biggest population area in the city. They came to the lab asking us: that looks like a good idea, but what will be the impact of that line on general mobility? So we used the actual transit survey, so we know where people move from and where they go on a typical day, and we were able to show that, in general, having that diagonal line will save about 5% of travel time for many users, and the biggest improvement will be for people who are currently driving; they are the ones who get the most benefit. That's obvious, because there's a big part on the top left where there's not much transit, so a lot of people drive, and if they get a diagonal line to go downtown, it will be good for them. This is a map that was generated, not directly by the tool; the data is generated and then it was mapped in QGIS, but it's a view that we might add to the tool directly later on. Basically, by mapping all the current projects in Montreal (there's a bunch of new train lines being added), what will be the improvement for everybody? We see on the map that every blue area is where there's a significant improvement in transit time (I forgot the key), while in red, where there's no project, there's not much change. So we see with the tool that, yes, if you add more transit lines, it will reduce transit time for people. So that's the main view, the main thing the tool does at this point. Let's have a look at what it runs under the hood. Right now it's mostly a JavaScript application, a web-based client-server application; most of the code is in TypeScript. We have a few components in the back end in C++ and Rust, and I'm really hoping to migrate as much as I can to Rust at this point. A lot of the code was written by students at the master's and PhD level, and it's not the cleanest we could have, so by rewriting some of it we'll hopefully get some stronger components. As I said, it's split into some back end, some front end, and some common components at the JavaScript level. We developed a library with components common to all our software stacks, stuff that is really common to all kinds of transit analysis algorithms; that's a separate library, and then we have the application stack on top. We use geographical data extensively, so we built the thing on PostGIS and PostgreSQL to manage all this geographical information. The basis of the routing is done with OSRM, which is a pretty common tool for routing on OpenStreetMap data. It's written in C++. We use that for the road network: paths, cycling paths. We are currently evaluating switching to a different tool called Valhalla, which might be more efficient in some of our specific use cases. So with OSRM we do the road network planning; for the transit part, for the transit paths, we developed our own tool called trRouting (simple name), where we implement an algorithm called Connection Scan.
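For a feel of what Connection Scan does, here is a toy earliest-arrival version in Python. It is not the project's trRouting code (which also has to handle transfers, footpaths and minimum change times); it just shows the core idea of a single pass over connections sorted by departure time.

```python
import math

# One connection is a single vehicle hop: (dep_stop, arr_stop, dep_time,
# arr_time), times in minutes since midnight. CSA scans these once, in
# departure order.
connections = sorted([
    ("A", "B", 600, 610),
    ("A", "C", 605, 700),
    ("B", "C", 615, 640),
], key=lambda c: c[2])

def earliest_arrival(source, start_time):
    best = {source: start_time}  # earliest known arrival time per stop
    for dep_stop, arr_stop, dep, arr in connections:
        # Board only if we can reach the departure stop before the vehicle
        # leaves, and only keep the connection if it improves the arrival.
        if best.get(dep_stop, math.inf) <= dep and arr < best.get(arr_stop, math.inf):
            best[arr_stop] = arr
    return best

print(earliest_arrival("A", 600))  # {'A': 600, 'B': 610, 'C': 640}
```

The appeal is that one linear scan over the sorted connections answers the query, which is why it scales to whole networks.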
I won't go over it in detail (I'm not sure I understand it fully yet), but it's a really, really efficient way to scan your whole network and all the connections and figure out the best way to go from point A to point B. Recently, at FOSDEM, we met a team that makes a tool called MOTIS which is really similar; they also implement some similar algorithms, so we might consider just switching to their tool. If it works for us, we might collaborate, and instead of having to support all that code, we could drop that part and use theirs. Otherwise, we'll continue to support ours, and I'll finish rewriting it at some point. So, having a tool is the first step, and it's really interesting, but if you want to do anything useful, you need data to do the analysis, to observe the impact of the transit network in your city. So, what do you need? You obviously need some kind of road network, all the paths in your city. You need the transit network, the existing bus lines. You need to know where you have population and where you have destinations, and you need to connect those two, so basically generate trips from sources to destinations. The big question you're gonna have is: is that data available, and under which open license? The ODbL license used by OpenStreetMap, Creative Commons, or even public domain? These days, a lot of cities and a lot of governments provide their data under some kind of open license. And the good thing is that we don't reshare the information; you use it to compute, so you don't have to worry too much. As long as the data is available, you can use it. The interesting part about the US is that most federal government data is already in the public domain, so you can use it without any issue. And just a quick note: Creative Commons Zero was developed to have a general definition that applies everywhere in the world, because not every individual jurisdiction has a definition of public domain in its copyright law. So now there's CC0, so when you release data you can make sure it's effectively in the public domain and anyone can do whatever they want with it. So the good news is that just about everything you need, you can find in OpenStreetMap. All the information is there; obviously all the road and path network is there, and for most cities it's gonna be good enough. The quality varies from place to place, but these days, at least the road and path network is pretty good, and you can use it as a good base layer. You're also gonna have a lot of POIs, points of interest, like shops and offices, marked in OpenStreetMap, so you can use those as your activity centers. And you can even extrapolate population, especially in areas where you have all the buildings: if the buildings are tagged correctly, whether they are residential or other, you can get an idea of the population density and then use that to simulate your population. But as I said, it's not the same quality everywhere, so you have to do some validation if you want really good accuracy in your results. We did a quick estimate in the regions we work with: it takes about 25 hours per square kilometer to validate urban areas like Montreal or LA. If you go out to the suburbs, it takes us about 10 hours per square kilometer, and maybe just one or two for rural areas.
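As a crude sketch of the population extrapolation just mentioned: if residential buildings carry a footprint and a floor count, one assumed occupancy constant turns them into an estimate. The constant and the sample rows below are invented for illustration; in practice the buildings would come from OpenStreetMap tags.

```python
# Assumed residential floor space per person; a made-up calibration constant.
SQ_M_PER_PERSON = 40

# (footprint m2, floors, building tag): stand-in rows, normally from OSM.
buildings = [
    (300, 3, "residential"),
    (500, 1, "retail"),
    (200, 10, "apartments"),
]

# Count only residential-type buildings toward the population estimate.
population = sum(
    area * floors / SQ_M_PER_PERSON
    for area, floors, tag in buildings
    if tag in ("residential", "apartments")
)
print(f"~{population:.0f} people estimated in this block")  # ~72 people
```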
The important thing we need to correct most often is making sure the pedestrian and cycling paths are there and linked together. I put the example here of the conference center, and this is an area where, if you look at it, you see some missing connections. This is the street, and there's no separate sidewalk. That's one thing we do: we try to map the sidewalk separately from the road, and I know that's a bit controversial sometimes in the OpenStreetMap community, why map the sidewalk separately? For us, it helps a lot, because you see an actual distance difference between where the road is and where the sidewalk is, and if you have to cross a big road, that can add quite a few meters, maybe a minute, to your overall transit time, so it's there. And the connections here are really important. As you can see, if you want to go from this street to here, you cannot cross; there's no connection mapped, so we don't know you can cross there, so the only connection is here. So to do it properly, we would probably add the sidewalk from the conference center with its connection, or at least add a connection from this corner to the road. The other big thing we do is add the doors of any big building. Same example here: if you just have the center of the building, it's hard to know where the entrance is, and that is really useful. If we drop you at the bottom of the conference center and the entrance is here, you have to go all the way around, and maybe you have taken a bus line that goes on this street instead of a bus line that goes on the bottom street. So if you want better accuracy, you need to add all the entrances; there's a tag for that in OpenStreetMap. So that's another thing we do. We sometimes realign the streets, make sure the one-ways are there, set the speed limits, maybe the traffic lights that will have an impact on the bus lines, and especially if you have weird, strange street configurations, really steep angles: some buses are pretty big vehicles and cannot make all the turns, so we have to consider that in the routing algorithm. And while we are in the map, we try to make sure we add all the points of interest we come across. So if you want to get people involved in helping OpenStreetMap, one of the big things missing right now is the points of interest, all the shops and offices, something that's really easy to add. You don't need to add all the information about them; having at least the type and the name is enough, at least for us. And for big buildings we try to add the number of floors, which can help with density. Right now we do all of that by hand, but we'll probably evolve to use tools like the Tasking Manager and MapRoulette to divide the work. If you have a specific city, if you have a community that wants to help, you can target them with those tools. So the second part is the transit network itself. Luckily, these days, it's pretty available; most transit agencies will share it. GTFS, the General Transit Feed Specification, used to be the Google Transit Feed Specification when it was invented as a partnership between Portland and Google, but now it has been generalized, and it's an open standard that is maintained collaboratively by all the stakeholders. There are two variants: the schedule one, which is basically your static timetable.
And there's a real-time variant where, if the agency has real-time information about where the vehicles are, they will share that in that format. The format is pretty simple: it's a zip file with a bunch of text files, basically comma-separated values, for each kind of information. This gives you an idea of all the different information: you have agency information, routes, trips, stop times, schedules, and sometimes you're gonna have fare information, so when you do the routing you know what the cost will be for a specific user, and maybe you can try to reduce the cost. As I said, most transit agencies have a developer section these days where you can find this information (you can see the link in there), but not all of them make it that easy. I tried to find the one for the Pasadena transit agency and there was just a link: email us, we'll send you the link. I'm like, okay, that's a bit annoying. But with some luck, there's a website now called the Mobility Database that has meta-information about roughly 2,000 transit agencies: basically an entry for every transit agency, with links, including a direct link to their latest GTFS. So I could get the Pasadena transit feed: just get the link, get the file. I have a dream of having that directly integrated in the tool. It's not there yet (again, missing time), but basically the idea would be: okay, I'm here, please download all the GTFS files for the area, and we could easily do that automatically with all the meta-information in that database. They are supposed to add an API to make that easy. There was a previous project called TransitFeeds that kind of got abandoned, and now this one is the most up to date. For the population, as I said, you can use the buildings as a proxy if you have them in OpenStreetMap. That will give you good enough information, at least for transit planning: you don't need to count everybody, you just need to know roughly how many people go from here to there. So that gives you a good start for the simulation. We also use the land-use register. Basically, most governments or regions or states, depending on the area, have some kind of public register of who owns land or buildings, what their purpose is, what their value is. That's usually really well geolocated, because they need it for land administration, so you can use it as a proxy for population too: this is a residential area, what's its size, what's its value, are there apartments? That can give you a good idea; you have all that information in there, so we can derive the population. The other information that is widely available is the census, but we have a lot of problems with that, because the geographic areas you get are too big. With a census, the smallest unit might be a zip code in the US, but that will encompass several blocks in a city. And if you go over three or four streets, you might have two or three different bus lines, and if you're on one corner of the area versus the other, if you want to take transit, you will have a completely different path. We have students working on a kind of mapping algorithm: okay, take the census information, how many people are in a specific area, and try to spread them along the streets in some way that makes sense, depending on the street size or whether there are buildings, and figure out a way to do that.
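Going back to the feed format for a second: since the schedule feed really is just a zip of CSV text files, you can peek inside one with nothing but the Python standard library. A minimal sketch; the file name is a placeholder for whatever feed you downloaded.

```python
import csv
import io
import zipfile

# List the routes in a GTFS schedule feed. "utf-8-sig" tolerates the byte
# order mark that many agency exports include.
with zipfile.ZipFile("agency_gtfs.zip") as feed:
    with feed.open("routes.txt") as f:
        for row in csv.DictReader(io.TextIOWrapper(f, encoding="utf-8-sig")):
            print(row["route_id"], row.get("route_long_name", ""))
```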
A related, really interesting tool would be to anonymize some of this information: instead of using the actual records, you take the survey, you know you need that kind of population, and you generate a synthetic one that matches. For the destinations, again, OpenStreetMap is probably the best source if the information is in there, but destinations are probably the most missing information; there are not a lot of other sources. You could scrape some commercial map, probably not in compliance with their terms of service, but you could try. Or, if you still have old phone books, the yellow pages will have some of the shop information. But that's probably the part that is most lacking in open data. Some cities, some regions might have it available publicly, but it's really hard to keep up to date. Again, you don't need exact information to simulate your network, but the more accurate you are, the more precise your results are going to be. And connecting those two is the trips. This is the part that you will not find in the open, but most transit agencies will have that kind of information. Basically, every transit agency will do a survey once a year, or once every three, four, five years; they will pick a specific week or a specific day and ask everybody in the network: where did you come from, where are you going, why are you making this trip? Are you going shopping, going to work, going to school? And that informs their planning decisions for the future. They will do that either by sending a survey by mail like the census, calling people and asking them by phone, or even just stopping them at the exit of the subway and asking: okay, where did you come from, where are you going today? That's really time-consuming. We have the Evolution platform, a web-based tool to do that kind of survey, like whatever survey platform you know, but really tailored for transit information. That's the Evolution tool I talked about earlier. So, I have some time, so I can probably attempt to see if I can make it work live for you. Let's see. So, this is here, this is my mouse, okay. Basically, what I want to show you quickly is how we can use it to add a new bus line to your network. So, these are the lines I have imported; like I said, for LA: the train, the buses, and the Pasadena transit. If I want to add a new one, let's create an agency that will do an express line from the airport to here, and that's the only thing they do. So, basically you create a new agency, you give it a name (ah, that's a good one), and I need a short name; let's do a color, something that is flashy. So, that's my agency, I have a new one. Then I need to add a new line. I could add new stops; I'll use the ones that already exist on the map at this point, but basically I'll click here, add a new line; adding a new line, let's say this is line one, and then the next thing you have to do is pick the mode. I need to tell it: let's do just a bus line, keep it simple. The difference between selecting bus or train is that train will just ignore your road network; in bus mode, every time I click, it will attach the bus line to the road network. So, let's try to start somewhere near the airport. Yeah, here, here, okay, it's a bit tricky sometimes. It doesn't let me pick it.
That's the fun part of doing a demo: and now it lets me pick it. So, I just click on it and it works. That's the fun of live demos. Okay, so let's skip that part. Basically, the idea I wanted to show you: I would have added that, so I would have created this line here, and if I click on it, let me check the path, this is what I would get. I would have picked a stop here, let's say I put a stop downtown and a stop here, and then I click "generate the reverse path" and it creates the return path. And then, when you go to routing, if I take the same route as before, I pick my scenario with the variant with the new line added, and just ask for the route, and yes, it takes the more direct route that we just added, and it's slightly faster than it used to be. There are still some interconnections here to deal with, but it still uses the rest of the existing network if it can. So that was the quick demo; it still works. Just a quick word: I mentioned this is coming from the research world. Some people sometimes say, oh yeah, research code is bad. That's true. So how do we make sure it works for real? The main thing we do is we partner with actual transit agencies. Right now we are in partnership with most transit agencies around Montreal and in the province of Quebec, so we actually develop the tool with them. We basically write code updates in real time, they have a live platform they can use, and they inform us: oh yeah, we want to do that kind of process, that kind of study. We meet with them regularly, we ask them what kind of work they do, and sometimes they'll say, we do this analysis, we export the data and then do a bunch of steps by hand. And we're like, what if we just give you a plug-in for QGIS, so you can pull the information directly instead of exporting a CSV file and doing a bunch of stuff? And they're like, oh yeah, that would be great. It's not done yet, but it's high on the priority list. The other thing is code quality with students; that's something difficult. Students write a thesis, write a paper, they want to do a quick prototype, and it's sometimes hard for them to take the time to work with the open source project. So there's a lot of education to do with students: okay, if you want your tool to be actually useful, this is how the open source process works, this is how to do a PR, and we know it's gonna take you some more time, but in the end you're gonna have a better result. Some of them are really excited by that. And we try to push this in the research world too: yes, you share the paper, you share the data, but you also need to share the code. We have all the code there, so we try to push this idea in the research world that you should be doing that. We did a few things before we opened all the code: we did a lot of cleanup, converting most of the JavaScript to TypeScript. We found a lot of unused variables, or variables that didn't exist anywhere yet were used; some parts where it was not quite clear why they worked. There's a lot of dependency creep, especially when you prototype a small thing: you end up importing a dependency because it's easier to just npm install something. So we made sure to clean up some of the dependencies, and we have a few research professionals in the lab to do that.
It's obviously open source; this is the GitHub status. There are 450 issues still open. It's not all bugs, a lot of them are just new ideas, but it's a start, a work in progress, and we're on it. There's a lot of stuff we want to do, especially ease-of-use UI work, just making things easier to do, and a lot of performance improvements. Right now, having imported all the data for LA, I think I've pushed my tool close to a stall; Montreal is about the maximum size we handle right now, but we're gonna make improvements as we go. As for my initial problem: what I did was the accessibility map from my place, a batch calculation for every five-minute slot during the day, from 9 a.m. to 9 p.m. That's all the dots we have here. I plotted the size of the accessibility area I have during the day, and you can see there are a few dips in there that kind of support my theory that there's a problem. There's a big dip around 10:30 where my accessibility area drops from about 75 square kilometers to about 65. That's more than a 10% drop, and that's significant. I'm not a transit scientist, but I can see there's a problem there, something to investigate a bit more, so hopefully we'll prove there's a problem and I can get a better bus line in my area. Thank you. If you have any questions, I think we have a couple of minutes. Yeah, sorry, I came in late, I didn't even know this was going on, and sorry, I'm losing my voice. So a question I had, and this is perhaps in regard to my coming in late: is this all an attempt to improve what exists, only for the routes and infrastructure that exist? Or do you have a component that looks at what exists, sees the sub-optimality of it, and has the ability to input or track or project costs that could be improved? Because in a lot of cases, particularly in America, less so in Europe and probably less so in Canada, that's a very big problem, particularly things like eminent domain and the rather spectacular costs that can be incurred, and ways to mitigate those. So, I mentioned we have an optimization component. It doesn't do everything right now, especially not automatically for planning and building, but you can put in your existing network and it will find optimizations, and even remove routes that aren't useful, or change routes: like, okay, you can replace these two routes with one and it's gonna be as efficient as before. It's in the tool; right now it's only accessible from the CLI, and we are working on adding a UI on top of it, but it's something that is in the core. It needs improvement and maybe more intelligence, but it's in there. Other questions? A couple. On a similar theme, does the tool have functionality for looking at trade-offs, like taking away a lane of the road to put in train tracks? That's a really good question, and there's always the related question: can you do a dedicated bus lane, and what's gonna be the impact? We don't do that, but that's a really good idea, and patches are welcome. I think you'll be the last question.
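For what it's worth, the dip-hunting in that day-long experiment is simple once the batch runs have produced an area per time slot. A small sketch, using numbers shaped like the 75-to-65 square kilometer drop described above:

```python
def dips(series, threshold=0.10):
    """series: (minutes_since_midnight, reachable_km2) pairs for one day.
    Return the slots where accessibility falls more than `threshold`
    below the day's peak."""
    peak = max(area for _, area in series)
    return [(t, a) for t, a in series if a < (1 - threshold) * peak]

# Five-minute slots around 10:30, with a dip like the one in the talk.
sample = [(630, 75), (635, 74), (640, 65), (645, 66), (650, 74)]
print(dips(sample))  # [(640, 65), (645, 66)]
```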
So I just found out about a cool project that one of the bus agencies in the Bay Area is doing with the California Department of Transportation, and that is transit signal priority, where the bus generally gets a green light and cars that would be turning in front of the bus get a red light. Is that supported, and if so, how much difference does it make? That's something that, to be in the tool, would need an improvement to the routing algorithm. I don't know the details. I know it does improve bus latency, because the bunching of buses waiting behind at the light is a big problem. I would have to ask my colleagues, who are the transit professionals, what the actual impact is. It's not something we have in the tool; we don't really consider, when doing the bus routing, how much time it would improve, but it's definitely something to put on the to-do list. Yeah, so this will be a comment, and then maybe it'll turn into a question. I used to work for a transportation startup and, related to the earlier question about taking away a lane in the road and replacing it with a train, like, where are the pros and cons? For our startup, for the type of vehicle we would have had, you're looking at the headway between the vehicles that you need to maintain, which ultimately feeds into the throughput. So I guess I walked in a little late, but is that something you could potentially add to your tool, looking at the headway of either the vehicles or the trains, and then ultimately being able to use that to compare throughputs? And when I say throughputs, there's vehicle throughput and then there's also passengers on board, total throughput, just things to consider. I don't know if there's... I think we have some of that. Obviously you can put in different frequencies and different spacing of your vehicles. We don't do comparisons between, say, private cars versus buses, or their capacity, at this moment. We can, especially when you do the simulation, do load simulation, so you know roughly how many people per time slot are gonna be in each vehicle, but I don't think we go as deep as the analysis you are talking about. But yeah, we have a lot of ideas to improve, and hopefully at some point we'll get more people involved, especially from the many transit agencies. The idea here is that there are other commercial tools that do this, but obviously we prefer to do it in open source and be able to collaborate with research, bring new ideas, and give agencies a tool that really responds to their needs, so people can do the analyses they think of, or invent new ways to think about transit; we'd love to incorporate that and share it around. I know there are a few more questions, but we have 10 minutes to transition the room to the next speaker, who's gonna be talking about scraping legislative data, which is a topic near and dear to my heart, and she's coming from the Open States project. So we can take maybe one or two more questions, but we really do need to transition the room. So you're good, is that... do you want a question, or are you waving me off? Okay. Otherwise, I'm still here after the talks if you didn't get to ask yours. Yeah, so that is very interesting. Sometimes when I travel, right, I would use sites like Sigalert.com or maybe trafficpredict.com, right?
And so I'm wondering, in terms of the real-time update of traffic information, how often does this get updated? If I look right now, is that based off a snapshot of traffic from 15 minutes ago, an hour ago, or what? Yeah, the actual congestion information is really, really hard to handle. There are not a lot of open sources of information about congestion. We have access to some through partnerships with transportation agencies and such, but it's hard to take into account; it's one of the weaknesses of doing analysis based on schedules, you sometimes don't account for congestion. One thing we might want to do, especially where we have GTFS real-time information, is compare that to the schedule, so we know that, okay, for this city, line number six is always off, and do better analysis based on actual real-time information versus the scheduled one. But it's a difficult problem. Last question. Yeah, so I actually have a lot of friends who are software engineers at LA Metro. One of the main pieces of software that they use is a closed-source product called Remix, and I think they'd be quite interested in this. The primary things that Remix allows are, A, integration with very big batch data, census data, and B, cost estimation. If that feature doesn't exist, I might start working on your repo, actually. Is that on your timeline or roadmap, to integrate that kind of cost estimation for running a line? I think there are some bits of it already, but yeah, it definitely needs to be improved, and that's definitely something that could be interesting to have. We don't aim to be exact in cost estimates; as I said, there are other tools that do that better. But it is one thing that cities ask us: okay, how much is it gonna cost us? And we're like, okay, and then we do some basic back-of-the-envelope computation, so it's something we should be able to add in some way. As for the batch side, we do a lot of batch processing with the census information, so that part at least should be covered. Thank you very much. Thank you very much. And as I said, in this room at the top of the hour, we're gonna have a conversation about scraping legislative information as part of the Open States project. All right, well, let's get started. Welcome to the hellscape that is scraping legislative data as an open source project. For starters, just out of curiosity, how many people here have been on any government website? All right, awesome. Keep those hands up if you enjoyed the experience and found the data that you wanted. Ooh, that's impressive. Oh gosh. Yeah, I was gonna say, it definitely can be a struggle. So, while legislative data is nominally public, it does tend to be buried in opaque state government websites. Open States attempts to collect it all, normalize it, make it easily searchable, and enable legislative tracking across every state. That is one heck of a project to take on. My name is Riley Johnson. I am officially a software engineer with Plural Policy, but I identify as more of a code gremlin; I do tend to break things further before actually figuring out a fix for them. I'm on the data engineering team at Plural, which is formerly Civic Eagle, and this is my first time speaking at a public conference, so please bear with my anxiety. Thank you so much. Awesome.
Just a disclaimer: I do adore my job. I love trying to figure out ways to get all of this data. I am not complaining, just highlighting the strangeness that we've encountered, some of those weird things about state sites. Originally, Open States was founded at PyCon in 2009, with the goal to standardize, clean, and publish all of the legislative data we could get to the public, to improve civic engagement and help people understand the legislative process that creates public policy. Occasionally, we would also contract with some of the state legislatures to standardize data and improve access, but very few states were actually interested in improving that access, and very few of those projects actually made it to completion. Originally, it was part of Sunlight Labs until 2016, when it fell mainly on the shoulders of one of the founders, James Turk, who maintained it mostly solo from 2016 to 2021, when he brought the project under the Plural umbrella. I got trained in as number two, and now we actually have an entire engineering team, which is very nice, because there's a lot of weird stuff. So, Open States has scrapers for all the different data types, bills, legislators, committees, events, votes, across 50 states, federal, DC, Puerto Rico, and the Virgin Islands. Typically, we run over 100 scrapes a day, some at least once a day, others every couple of hours, and as you can imagine, we run into some weird things. You don't even need to be on these websites to know that state legislation, and federal legislation, is pretty slow to evolve, even with all of the marvelous technology options that we have nowadays. Standardizing that data across state borders is absolutely insane. Fortunately, I got into the project late, so I didn't have to do a whole lot of that, but there's still a lot that we have yet to figure out, especially when we work on new data types. Processes can vary, even in how bill versions get published. For those of you who aren't familiar, bills amend standing law, and bill versions can differ because state legislators have different text that they wanna add; they mark those versions up and edit them. There is one state in particular that didn't have that digitized. They actually have all the bill versions printed out at a specific printer, put on a truck, driven to the state capital two hours away, marked up by hand by the legislators, and then driven back to the printer so that they can print out the new version. Very archaic, still to this day, probably still running. But fortunately, they have actually started adding bill text to their website, so normal people can access it. Another weird bit: there is one state in particular where we don't have access to the vote data. They just haven't thought it was worth it to post how legislators vote on specific pieces of legislation, but there is one company in that state that does have that information, and they charge whatever they want for it. We're pretty sure they just pay somebody to sit in there and record the votes by hand, but we haven't gotten to that point, so we'll just acknowledge that we don't have it for now. But it is kind of a weird bit. Working on this kind of stuff, one fact that I've definitely come to terms with is that legislative websites suck. Even when they don't suck, they're bad.
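For context before the war stories: the scrapers are all Python and, as comes up in the Q&A later, they sit on the project's scrapelib library, which wraps requests with the rate limiting and retries you want against fragile sites like these. A minimal sketch, with a placeholder URL:

```python
import scrapelib

# Throttle requests to be polite, and retry transient failures automatically.
s = scrapelib.Scraper(requests_per_minute=30, retry_attempts=3)
resp = s.get("https://legislature.example.gov/bills")  # placeholder URL
print(resp.status_code, len(resp.text))
```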
It's fairly common for it to be really hard to find information as a user, let alone find scrapable pages. Some websites are just interesting, where you start with a fancy new landing page, and then the deeper you get into the website, it just becomes older and older versions of the internet, where you're like, how is this still running? But it is running, so they don't actually change it. You can actually see some state websites with pages that were clearly designed in the Web 1.0 era. But it is interesting to get into, because when you're scraping, you get all kinds of interesting console messages from different legislative developers. We do try our best to find the most accurate and up-to-date sources for our data, but even then, sometimes we'll find out after we've built the scraper that the source just isn't updated as regularly, so we'll have users complain to us about bills being late and things like that, and we just had no idea, because we scraped from the wrong spot. Also, sometimes legislative websites aren't the most secure. I don't know if anyone's that surprised by it, but there was a case where one state had a SQL injection of not-safe-for-work content in their legislator table, and we raised it as an issue with them, like, you guys might wanna take that down and upgrade your security, which they did. But then they also accused us of doing the injection, so we've learned from that experience. Sometimes even just getting access is a trial-and-error challenge. With data impacting the public, you'd think it'd be easy to get at. Alas, not the case. Some state websites have implemented Cloudflare, so normal scrapers can't actually get any of the data, and we've had to find ways around that. Sometimes, even once states have started adding this kind of data to their sites, there isn't a super clear path to get to it, even if it is on a webpage. In one case in particular, there was one state where the website wasn't super clear and there wasn't a great way to go about scraping it, and apparently the right person complained to the right other person at a conference, and we ended up with a napkin with an address and a username and password for a read-only account on their server. That napkin is framed in someone's office right now, but we've definitely had to come up with interesting ways to get to this stuff. Sometimes we can reach out directly to the legislative office tech team and get access or get whitelisted in order to get this data, but not all of them like us or are particularly responsive, so we've had to find some interesting ways. There is another state in particular that just has some garbage webpages, but we've figured out a way to forge calls to their not-so-public APIs, so it's fine, we get it somehow. Another thing that I've come to learn getting into this stuff is that anomalies just occur. There are some things where I have absolutely no idea what's happening or why they decided to go about it that way. We have a custom-built task runner that creates warnings every time a scrape fails five times in a row, because most of the time it's self-healing: weird anomalies happen, but usually it sorts itself out or somebody internally fixes it, so it'll run the next time around. But there was one case in particular where the Office of Legislative Services kind of fell off the face of the earth.
The website of the office itself was still live, but the application that we were using to scrape data, and that they used to publish any information about their bills, completely disappeared, fell off the face of the earth. I didn't know what to do, so I emailed, and kept getting delivery-incomplete errors, one after another; Gmail tried for, I think, two weeks straight to deliver this email, and it didn't work. So I had to get over my millennial fear of phones and call. No one answered, to my joy, but also confusion. And I searched the internet to see if this was still a funded office, if it was gonna close; there was absolutely nothing. But lo and behold, the app came back several weeks later, like everything was totally fine, and it's just something I had to accept. Another thing I've had to come to grips with is that new tech is not always better. There have definitely been a lot of new websites that have come out in the time I've been working on Open States, and clean websites are really nice to scrape, and APIs are the dream, really, to get all of the data that we need. But sometimes dreams are really fragile, and they can't actually make it through a successful scrape, even if there are only like five things on that API. I've raised a couple of issues with those state legislatures about the stability of their APIs. Sometimes they get back to me, sometimes they don't. Sometimes it improves, sometimes it doesn't. And occasionally they add an API, and it might be a little too good. For instance, one added an API while we were creating new legislator scrapers, and we got a lot of the information we required from it: easy access to their name, the district they represent, email, office information, addresses, voicemail, stuff like that. We also got some bonuses like occupation, but then also things like home address, spouse information, license plate, and occasionally a social security number. I wish I was joking, but that was definitely showing up in their API. I don't know why it would be in that part of it; it should be behind HR, but that's on them. We did raise that as an issue. Fortunately, this time we weren't accused of doing anything with the information, but it was an interesting case of potential legislator identity theft. So, these are some of the issues we've run into getting all this stuff, and we do make it work most of the time. We do have some bugs and broken things; currently we're a little bit behind on bill data from Iowa, we're getting throttled by them, and I don't know what's up. But we do produce a very comprehensive data set for the public. That being said, we have a lot of different touch points for people to get access to this data. We have openstates.org, a public-facing website where anybody can track and find information about their legislators, bill information, things like that. We have two APIs; one of them is about to be deprecated, so I'm very excited about that. And we also have all of this code on GitHub, so it is open source: anybody can contribute, anybody can work on our code. But since it is open source and forkable, we also know that some of our competitors use our code, and at least one contributes back regularly, so that's okay with us.
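If you'd rather consume the normalized data than scrape anything yourself, the current API is the route. A rough sketch against v3 (you need a free API key, and the jurisdiction and query below are just examples):

```python
import requests

resp = requests.get(
    "https://v3.openstates.org/bills",
    params={"jurisdiction": "Kansas", "q": "water"},
    headers={"X-API-KEY": "your-key-here"},  # placeholder key
)
resp.raise_for_status()
for bill in resp.json()["results"]:
    print(bill["identifier"], bill["title"])
```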
We also make this data available as bulk CSV or JSON downloads, and there are nearly complete Postgres data dumps too, so however you wanna get the data, there are a bunch of different ways to do that. And being an open source project, we have a lot of different ways to communicate. We've got the contact email for those who wanna go that route. We have a Slack and a Matrix instance for people who want more of that messenger kind of thing. We have Twitter for people who are still on Twitter and want to tweet at us. And for those who are technically literate, we have a GitHub issues repo, where we have a Q&A discussion section and templates that let you report issues you're seeing on the website. As expected, though, the templates do tend to get ignored a little bit. We've also got some pretty thorough documentation. We have an Airtable for new session information that is updated as regularly as I can manage, especially since people care about new legislators in their state and things like that, making sure people know when things are gonna be scraped, when things are gonna be accurate, stuff like that. We try to keep quarterly blog posts going so people know exactly what's going on with the project, where we're going, and who's working on it. And we also have our own little summit that we've been trying to hold more often. But since we have all of these different ways to communicate, we get all different types of communication issues. One of my favorites, and one of the most common, is something like this: a ticket that just says "district", question mark, and there's no content. This is very common. I get a lot of emails that are just one word, or "wrong legislator", and it's like, I can't do anything to help you, but thank you for reaching out. Sometimes they do get a little bit more specific. Here's an issue about specific circumstances, which gives us a little more detail, but I don't know if they're talking about an action or the bill text, and there's nothing about even what jurisdiction I should start looking in. So, incredibly helpful. Occasionally we get a whole bunch of text that is actually meant for members of the legislature, about gun rights and stuff like that. Just because we aggregate the data doesn't mean we actually have influence over it. But it is what it is. And then we also tend to get a lot of questions like this one, about how bills actually impact daily life in one way or another. These actually kind of break my heart, because I do wanna be able to answer them, but I have no idea. We just have to remind people that we only collect the information, we don't have the expertise to answer these, and they should be reaching out to others at the state level, or wherever, to get that kind of information. But it is an interesting bit. The last thing that I wanna share that I've definitely learned from doing all of this is that it is absolutely worth all of the chaos and the little road bumps. Truly, doing this makes a difference for a lot of people, and it definitely provides the opportunity to see more clearly what's happening with legislation. I couldn't imagine doing anything else. If any of these things are interesting to you, or seem like problems that you'd like to help solve, please help us.
We could definitely use more contributors, and we do have a running backlog of issues of different varieties to take on. We use fancy new Python stuff and try to keep everything updated as much as possible, but we could always use more help. And I went through this way faster than I thought I would, but I hope you enjoyed it. This is my contact information, and now we're ready for questions. So, a couple of questions. I assume you're using Python, and you're pretty much developing whatever framework you use yourselves? And it's not based off Beautiful Soup or something like that? It's based on scrapelib, I think. Okay. But we've got a couple of different types; all of our scrapers are written in Python, though they use different frameworks. So, again, maybe a naive question: did you ever look at the combination of Nutch and Solr, and then post-processing that data? Because you can quickly evolve what you're looking for, what you get back, and how you structure it. It may not work for legislative bills, I'm not really sure, but it's pretty general purpose; as long as you're looking for specific things, you can tune it with trial and error very quickly, and if you're familiar with Nutch and Solr, you can also enhance it if you see an easy opening. Obviously it's open source, an Apache project, or a set of Apache projects. Obviously people know Solr; with Elasticsearch, Lucene is the basis for both, and for that commercial product. So yeah, I'd be interested in maybe talking afterwards to find out how you're doing things and maybe throw some ideas in there. Yeah, I definitely was not around for those kinds of decisions, but I would be interested to hear about that, and also I will be doing one of the other talks later about the scraping library that we are currently trying to use, so, well, promo for that talk. This is really interesting, and I'm looking forward to when you replicate this at the county level. Ha ha ha. Perfect. How many counties are there in the United States? Could you give us just a little more information about what kinds of stuff you're scraping? I mean, is it just bill text? Is it bill reports? Is it testimony? Is it budget data? Is it all of the above? That's my first question. And then my second question: do you guys have any kind of relationship with the National Conference of State Legislatures? I'm gonna answer the second one first, just because I don't want to forget it: I think we do. And then the first question: yes, for the most part, most of our information is bills and resolutions. We do have some stuff around fiscal notes and the like. At the legislation level, most of the time we just try to get the basics: the title, the different actions on it, dates, versions. We don't actually process the bill text. That has definitely been something we're interested in including on Open States, since it is kind of a huge thing that people need access to; we only provide the links to it. But yeah, we've got most of that kind of stuff for bill data. We have information on the legislators. We also have a Find Your Legislator application, so you can put in your address and find who represents you at the state and federal level. It might not be entirely accurate right now, because redistricting has kind of screwed with us. But yeah, we've got stuff like that.
Committee data, events, hearing events and things like that, and how they relate back to bills. But good question. How many people are on the project, and how is it funded? So, currently I think we have about six people on the engineering team who can touch it at any one point; I do most of the daily operations now. We are backed by a VC-funded company, so we do have a bit of money behind us and don't have to worry about it too much anymore. That was part of why it got brought to Plural in the first place: it's not sustainable for one person to do without funding. There we go, I'll repeat that: Jesse, the CTO of Plural, is giving a talk on that tomorrow morning, so if you are interested in more about that kind of stuff, stay tuned. I have a quick question. If you're getting all these questions on your site from people who have legal questions or whatever, have you partnered with, like, the EFF or anything like that, so you can just pass them along to the right public-interest group, the Southern Poverty Law Center or whoever, who knows what to do? No, but we definitely should. That would be really helpful. And this, Plural, is for-profit, yeah. Do you have, or do you have any plans to add, functionality to send email notifications about my specific legislators at the state level whenever they vote on things, like GovTrack does at the federal level? Yeah, not on vote data. We do have tracking on bills, so you can know where bills are at and whether there's action on them, but not on specific votes or on legislators themselves. That would be a good one. Any other questions? Oh, oh yeah, I thought you were just going to go. We got time. You might get the sense I work in government, which is why I'm totally fascinated, and I am so sorry I didn't even know this existed, so I'm glad I'm here. I can think of a million reasons to explain each of the issues that you raised, from underfunding of technology at the government level, to legacy systems, to all of that stuff. But I am curious whether y'all ever try to do anything like rank states in terms of who's more transparent and who's less, in terms of making data available. And God, I just have to ask: do you see a difference between blue states and red states? Okay. I will not be answering question two first this time. Let's see. So, for the first question: yes, we actually did historically have a map of the states that gave a ranking grade to each state on how much data we had and what the coverage was. That has kind of fallen by the wayside; I'm actually not entirely sure where that code is anymore, but it is one thing I've been pushing for this year, now that there's time and interest, because people want to know exactly how much data we have in each of the different areas. So that will be coming, hopefully this year. As to the second question, I will say yes, I do notice a difference. I'm not gonna get into specifics. Yeah, let me think about how to phrase this question. Are there any resources for legislative bodies, if they want to make their data more accessible and transparent? Does Plural offer any white papers or documentation if states want to go that route? And do you guys think it's even worth articulating a standard for how to structure legislative data? Yeah, that is a good question.
I don't recall that there is. I think, from the historical contracting work, there are guidelines we have around what would make this kind of stuff more accessible, but it is specific, state by state. I don't think there is a governing body that would be able to make all of the states obey and get all of this stuff out, but it would be an interesting option. Hi, have you noticed a grouping of, are there certain software vendors that you see over and over again providing state legislative data? I'm looking at the city and county data, and Granicus was the 800-pound gorilla, but then they started buying their smaller competitors during the pandemic, and so now they're probably 95% of the market in that case. Do you see that at the state level, or is it just kind of a dog's breakfast of different vendors? I mean, we do have quite a few competitors that have that kind of information at the state and federal level. There don't really seem to be that many, and they do tend to buy up the smaller organizations that are more state-specific. I know there are a bunch of state-specific trackers available, where it's just very state-specific, but there's not that much beyond that. What's your business model? Tomorrow's talk; I don't know that, I just do the code monkey bits. Other questions? Cool, I guess I have good reflexes. Hi, I actually do something similar to this, so I'm gonna talk to you after this, but with different sorts of data. My question, because this is something that drives me crazy, is how do you keep track of all hundred-and-something of your scrapers? What do you do to orchestrate that? As I said, we have a custom-built task runner. It's definitely something I don't blame anybody for glossing over on hearing that. It doesn't work super well, but it does do the thing. It keeps track of the most recent runs, I think the last 20 times each scraper has run, and it's clearly color coded: green if it actually ran, red if it failed. So, nifty, yeah. Apache Airflow? Oh, well, I mean, yeah, we're trying. We'll get there. Again, I don't mean to get ahead of tomorrow's talk, and she won't be coming tomorrow, but: so you're sort of owned by a private company, but your data is public, and you're gonna continue to operate publicly? There's no plan to put a wall around it or charge for it, although maybe your funding source probably gets priority access or a say in development decisions. Is that sort of correct? Yeah, actually, and I can get a little bit more into the background of that, because I actually started off as an engineer for Plural. We ingested a lot of Open States data originally; I like to call it kind of Google Docs, but for legislation, so we use all of the standardized data and then make it nice and friendly so people can collaborate on it. And that's how I kind of got trained in, because I was working on adapters to get the state code into that project, and then got into Open States and contributing back to make sure the data was accurate. And so currently, yes, we are planning on keeping Open States public access, and everybody can keep doing that kind of stuff. We do have further functionality, like bill text, behind a sort of paywall; we do have a premium version, still in progress. But yes, we will keep this free forever.
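A toy version of the bookkeeping that custom task runner does, per the talk (a last-20-runs history, and a warning after five straight failures); all the names here are invented for illustration:

```python
from collections import deque

class RunTracker:
    """Track recent scrape outcomes and warn on repeated failures."""

    def __init__(self, keep=20, warn_after=5):
        self.history = deque(maxlen=keep)  # last N runs, True = success
        self.warn_after = warn_after

    def record(self, success):
        self.history.append(success)
        recent = list(self.history)[-self.warn_after:]
        if len(recent) == self.warn_after and not any(recent):
            print(f"WARNING: scraper failed {self.warn_after} times in a row")

tracker = RunTracker()
for ok in [True, False, False, False, False, False]:
    tracker.record(ok)  # the warning fires on the fifth straight failure
```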
Yeah, yeah, of course. Any other questions? One in the back, one sec. Getting your steps in. 10,000, right? Earlier you mentioned that sometimes the web scrapers just go down and you don't necessarily know why. In the instance of an outage, I guess, in a traditional setting, you have an outage, you let your stakeholders know. Who is your stakeholder, and how do you let them know? Good question. Most of the time we don't. If they notice, they notice, but there's so much going on that we can't really tell who cares about which specific thing; it's kind of hard to know, with Open States, exactly who to warn about these kinds of things. We do try to get ahead of the curve as much as possible. Like I mentioned with the Find Your Legislator application, I tried to be as upfront as possible with people about expectations, about when that data was going to be updated and how complete it was going to be. I definitely didn't anticipate redistricting being as much of a problem as it was, but we do the best we can on that kind of stuff. And that application is the majority of the traffic to our website, so that's the one where I try to warn people. All right, now, any other questions? Ah, one more. That's fine, I'm at like 9,800 steps; we'll be close. Are there similar projects or competitors? Yes, but not ones that have the open data side of it. Yeah, and also, California's not as good as you think. You just mentioned redistricting a few times. I imagine this adds a whole other layer of complexity to what you're developing and what you have running in production, but is there a plan to maybe have pre- versus post-redistricting data and see how they compare? Or, for now, do you just say, hey, we're gonna do only current legislation with the current districts and assembly? Yeah, since we only have the bandwidth to maintain one source of shapefiles, we do try to be explicit that we are doing current representatives in current districts, which has been a problem, but we definitely have interest in both sets, so it would be good to keep. Hi, thanks for your work. I did have a question. I'm looking into Plural as well as Open States, and I see the focus is more on legislative information. In terms of other facets of government, other departments, let's say the DMV for example, or other areas of government, right, there are a lot of different sections: do you guys have plans to move out of policy into different sectors, or are there any other similar projects that you're aware of that are tackling different areas of government? There are definitely other projects that cover other aspects of it. For example, we've gotten a lot of questions about whether we're gonna have stuff around campaign funds and things like that, and there are other projects that already do that kind of stuff. We are probably going to be exploring different stuff outside of legislation, probably not under Open States itself, but we are interested in branching out beyond just legislative. Good question. And it's in this room. No one can ever leave. Hi, so if your code is getting data and normalizing it, how do you, I guess if you make a change in your code, how do you know that it results in cleaner data coming back from wherever you're scraping it? Like, what metrics are you using? How are you getting a sense of that?
Yeah, we do have some insights into exactly what kind of data is coming back. It is kind of hard to figure out, since there's so much happening and any scraper can return thousands of bills; it's hard to get into checking all of that. Most of the time we have to rely on people flagging stuff, both on the Open States side of things and in the other application, Plural. We do have a dashboard around things like how many bills have versions and all that kind of stuff, and we can get a little bit more into the minute details and feed that kind of bug fix, or things that happen on that side, into Open States and fix it upstream. So it's kind of cool that way.

All right, well, I think we're done with questions. Thank you very much. Thank you. And for all of you who are asking how you can contribute and how you can do stuff like this, our next talk in this room, in 19 minutes, is how developers can get involved in public policy and open government.

Yeah, so, it's almost 5 p.m. Well, hi, my name is Margaret Tucker. I am a policy manager at GitHub. And I guess I should start by thanking you all for being here at 5 p.m. on a Friday. It's really rainy, so, yeah, thanks all for sitting in. I'm here to talk about how developers can get involved in public policy. The talk that went before me, Riley's, was fantastic, and I could tell from the audience questions that a lot of you are pretty versed in how developers can get involved in public policy; you might be developers who are actively engaged in this. So I hope this isn't too basic of a talk, but I'm hoping that in the Q&A, maybe people can talk about their own projects and we can get into some specific regulatory developments that are relevant to developers.

So first off, I'm gonna explain how policy advocacy works at GitHub. Obviously we started off as a startup; we're a lot bigger now, 100 million developers, and we have a policy team, so I can explain our own policy advocacy strategy. Then I'll talk about some policy developments relevant to developers that you may have heard of and be engaged in, or maybe have questions about. And then we can get into how developers can get involved in policy. I'll feature some organizations that I'm familiar with, but again, this is not a comprehensive list; I encourage anyone who's engaged in something to share it, connect after, I'd love that.

So yeah, let's get started. Our policy strategy is centered around the idea that developers are important policy stakeholders. GitHub was acquired by Microsoft because we have a huge number of developers who have really important interests, and it's important for us to protect developer interests and maintain developer trust. So our policy advocacy is both internal, making sure that what we're doing at our company aligns with developer interests, listening to developers and, when we mess up, working from there; and also advocating specifically for certain policy developments that we want or don't want, and making sure that the developer voice is heard by policymakers. And I should say, honestly, I think that we are one of the largest developer voices in policy debates, just because we're a large and resourced company and we are able to meet with policymakers. But that said, we also have an open policy repo where people can get in touch with us, which I check, and our Twitter, which we check.
So I just want to put out there that we try to both work in the open and engage with the community whenever we have anything going on in the policy team. And this is our little mission statement.

One thing I should say is our director of policy, Mike Linksvayer, is a developer. He was really involved in Creative Commons and is an active Wikipedian. I'm not a developer, but I am a geographer by trade, or by education, and worked in OpenStreetMap and Wikipedia. I came over to GitHub from the Office of Science and Technology Policy, a really weird year because it was the pandemic and then the election transition, a really weird time to be in science and technology policy. And I've also worked for Slate. So we have a really big mix on our team. We also have a staff software engineer. That's kind of a funny story, and I will get into why we now have a software engineer. Part of the reason: you might have heard of a thing called youtube-dl, anybody? We'll get into that, we'll get into that. Yeah, thanks.

So yeah, let's get started. There is really notable, increasing policymaker interest in AI (ever heard of it?), open source security, and platform liability. These are just a sampling of some of the more relevant pieces that we're following right now. There's the EU AI Act, and then Canada also has an AI and Data Act. On the open source security side, it really seems like every country wants to take action on open source security, and we're trying to make sure that open source interests, especially the people who spend their volunteer time working on software, are represented. Also, just last week, I think, the White House announced a US national cybersecurity strategy. There are actually some pieces they really got right in that, so we can talk about it. And then there was the Securing Open Source Software Act of 2022, which was introduced last year but will probably be reintroduced, so look forward to that.

And then on the platform liability side, there was the US Supreme Court; I'm sorry, the weeks are kind of blurry with all of this, but at the end of February they heard oral arguments in the Gonzalez v. Google case. It's the first time that Section 230 of the Communications Decency Act has come before the Supreme Court. I don't know if you've all heard Jeff Kosseff's phrase, "the twenty-six words that created the internet." Is that familiar? Or maybe you've heard the rumor that Al Gore says he created the internet? Anyone heard that? Well, he took part in drafting the legislation that did enable the internet to come about. So platform liability is a big piece of that. I work a lot on platform liability, and while these things typically take aim at social media, GitHub is a platform, and platforms that are way smaller than us are also impacted by platform liability legislation. So typically with that sort of advocacy, we're explaining why things should be tailored to risk profile, why things should focus on the actual conduct at issue, say, amplifying speech, and why you should get really specific about what you mean when you want to narrow 230.

And yeah, I should also mention the EU Digital Markets Act and Digital Services Act; both of these have passed. One thing to note about these: I was listening in on some hearings last week at the Senate Judiciary Committee, and they were talking about, oh man, the EU passed the Digital Markets Act, why can't we pass more antitrust?
Oh man, the EU's passed the Digital Services Act, why can't we have anything going on on the US side? So I'm definitely noticing a current of US policymakers looking to the EU and saying, hey, they can do it, why can't we? I think that's a general trend, and we'll see how it develops. And again, I'm sure all of you are very familiar with this legislation, or not, but I would love comments on any of those developments.

So I'd say that one of the pieces of policy advocacy that I think GitHub has done the most for, and that I'm really proud of, is how we've advocated for code collaboration on the copyright side. Are any of you familiar with the EU Copyright Directive? No one? Okay, well, I can tell you that story then. This was in, I think, 2019, before my time at GitHub, but the EU introduced the Copyright Directive, which would have introduced mandatory filtering, and obviously that's really concerning when it comes to open source code. Code could be wrongly flagged, and it could create a lot of issues in the software supply chain. So we were able to get developers to provide comments on why mandatory filters don't make sense for code collaboration and software development, and we were able to gain an exemption for open source code collaboration. So that was just one example.

I obviously don't wanna sound overly optimistic, but I think oftentimes when it comes to tech and science policy, the partisan lines aren't drawn as starkly, and I would say that policymakers are really open to technical expertise. When it comes to developers explaining, hey, you don't know how this works, but this is how this development would impact me, that's really important. Also, again on the liability side: smaller companies, organizations, independent developers, you might not have the resources to advocate for yourself because you're busy doing your work, but policymakers do wanna hear from you. So I'll get into some channels where, if you have a comment, you can provide it and let the people whose job it is to amplify it say something.

A couple of other updates on the copyright side. Technical measures, filters, all of that has been a big push, so we also participated in copyright consultations with the US Copyright Office this past year; we'll see what happens from there. These were in relation to the SMART Copyright Act, which hasn't moved in a while, so we'll see. But there is a lot of push behind this idea that more of the internet should be scanned and filtered, and we're trying to explain why it makes no sense for that to be applied to code, for a host of reasons that I'm sure you're familiar with.

And then this last one is just pointing out that we filed an amicus brief recently in the Yout case; this relates to youtube-dl. We have a blog post about it: GitHub has a blog, and we have a policy section, and I encourage you to read it. Our staff software engineer, Kevin Chu, was very involved in that. And I guess I could explain youtube-dl a little. When youtube-dl was subject to a DMCA takedown, we didn't have the kind of expertise to review these claims, and so it did get taken down, and it caused a lot of downstream impacts. That's the thing about open source code: it's used in something like 97% of codebases, and so one takedown can have really massive unintended consequences. So after that happened, and after we changed our policies in response to it, we brought on a staff software engineer to
review our DMCA claims, so we have that understanding in-house. He's also a lawyer, he's really awesome, I wish he was here, but he's not. And yeah, so we can move on to this next one.

Have any of you heard of the EU Cyber Resilience Act? Does that come up? Okay, this is a big one, and it's new, but it is a bit concerning. The EU introduced this proposal for a Cyber Resilience Act, starting, I think, in May or so, with an initial consultation, and it's really focused on the single market for digital products and trying to assign responsibility for the cybersecurity of those products. Now, this one is complicated, and I have to be honest with you, I'm not personally as engaged with it; this one has gone to my boss, Mike Linksvayer. It's a very complicated one, and we're really trying to get the open source community more informed about it, so I encourage you to watch this one; we're trying to get more of an explainer and more information out. But yeah, I'll get into the details of the EU Cyber Resilience Act, and also some opportunities for you to provide comment if you're interested.

So this one is particularly relevant to the open source community, because there's a lot of growing policymaker interest in open source security, and I just wanna say, this isn't a bad thing. The open source community needs policymakers to champion open source as a driver of innovation, and also, frankly, the open source community needs policymakers to play a supportive role in securing open source. There's a lot of "oh, open source, we need to do this," but it's like, okay, let's provide resources, let's provide support, let's shift some of the responsibility onto the, not the end users, the people who supply products that are used by end users.

So in the case of the EU Cyber Resilience Act, we've been engaged in the consultation process, providing recommendations on how to improve the proposal so it reflects the realities of software development, especially the open source ecosystem. Some of the comments that we provided and have pushed: open source software is made available for anyone to benefit from, always with disclaimers of warranty and liability, and developers work hard to make secure code and should be supported in doing so, but it has to be the businesses that integrate open source into their products who are ultimately responsible for the security of those products. And this, again, like a lot of things in the tech world, boils down to liability. If there is a vulnerability, whose responsibility is it to patch it? We're trying to emphasize that the responsibility should lie with the largest, most resourced organizations. Public policy can either break the economic model of contributing to open source, or it can incentivize it; it can incentivize businesses building products to support developers and contribute upstream. So, carrots and sticks: what do we want? And I think one concern we've had with the CRA is that there are a lot of sticks and not enough carrots.

One development that we think has gotten a lot right is the US cybersecurity strategy. I put up a slide, you probably can see it better than I can, but this is just a screenshot of two points that I think they really get right: rebalancing responsibility and realigning incentives. So I encourage you to read that, and we'll see how it moves forward. There's been a lot of increasing White House interest in open source security.
And I have to say I'm optimistic. I think that there's been a decent amount of engagement, trying to hear from the community, and we'll see how all of that plays out. That said, I will say I am optimistic about the Cyber Resilience Act too, because of the Copyright Directive experience. I think in general the EU is supportive of open source, and maybe doesn't understand it as well, but we can provide them the tools and the information to explain it.

Oh, and then this one. This is more of a happy story, for the most part, on the global availability side. Something that we've really focused on on the policy team is calling for global availability and immigration reform. One example: there was this announcement recently, I think it was in September, that the US extended general licensing for online services in Iran. But GitHub was already available in Iran, because we went through this whole process to get a licensing expansion. We're also working on ones in, I believe it's Syria and Crimea. And this one is more of a within-industry push, trying to get companies to pursue licensing expansions, especially when they provide internet infrastructure and digital services that are really important for movements; it's important for people just functioning. So yeah, that's a big one as well.

And let's see, okay. So I guess we're moving into the third part of this talk, which is: how can developers get involved in public policy? There are a lot of issues, a lot of things that are specific to developers and developer interests, and clearly policymakers want to do something, but they don't necessarily have the technical expertise to do it. So there are a couple of things. You could work with the public sector. You could run for office, maybe start by getting involved at the local level, maybe become Audrey Tang, who's a really cool open source developer and Taiwan's minister of digital affairs. But you could also support public sector open source initiatives. I have a couple on here, and I'm sure there are more in this room, of people who are working on really cool stuff, but yeah, there are fellowships, hackathons, public service, public-private partnerships. If you care about the direction of tech policy, if you want to see something move, then be the change: get involved, use your skills to make a difference.

A really cool example, and I apologize, I realize this text is way too small to see, but in the right corner: the Sovereign Tech Fund. That was a really cool development that we've seen. It's the German Sovereign Tech Fund, and it is funding, not just for security, put in place by the German government to secure open source, and it's really exciting. We want to see more of that. Even for smaller package registries, there are a lot of community-based efforts that should be funded. A couple of other examples: Code for America, the U.S. Digital Corps, and TechCongress is really cool; that one is a fellowship on the Hill specifically for technologists. So there are a lot of examples. Obviously, I think that we get focused on our own careers, but you can think: what are my skills, and how can they contribute to what I want to see in the world? So yeah, lots of different ways to get involved.
One that I'm particularly passionate about, and I'm sure a lot of you are involved in, oh gosh, the organization on the slide, ignore that, but I encourage people to participate in ecosystem stewardship. These are just a couple of examples of how many open source organizations are creating their own governance, trying to figure things out. And with that, it's not just stewardship of your own ecosystem, but also advocating: organizing and advocating for what you want to see. Just a couple of examples. I did a lot with OpenStreetMap when I was starting out, and they have a really interesting organizing body, so that's really cool. OpenForum Europe is very organized, and frankly I would love to see something like that in the US and more across the world; they're pretty cool. And I have the Linux Foundation on there, but OpenSSF is a huge one. These are just a small sample of organizations. And even if it's, say, contributing more on the maintainer side: I think that sometimes we just consume things, and, maybe not you guys, you're at this conference, so you probably are much more involved in the governance of open source, but encouraging people to see themselves less as consumers of open source and more as contributors to and maintainers of the broader ecosystem is really important.

And so, I'm almost through. Another one is being a policy champion at your organization. Even if you're not in a policy role, frankly, I think that sometimes people who aren't in policy roles have much better feedback; they can point things out. I have really interesting talks with the staff engineers at GitHub. So share how policies impact you, and encourage organizations to take a stand. It could be on things about tech, and it could also be on other political issues. Sometimes, when an organization or a corporation takes a stand, there's a little bit of cynicism, but it does matter, especially to the people within the organization. So don't think that people aren't listening. And then also encourage open source within your organization; contribute to the advancement and stewardship of the open source community. Say you work at a company: you can ask, hey, can I spend Fridays working on open source projects? Something like that. Think about how you can bring open source into your organization, and also lead it.

And then finally, offer your skills for the public good. This is a big one. You can advance causes that matter to you through volunteering, skills-based volunteering. I'm sure there are a lot of really interesting projects you can get involved in; GitHub has a social impact site that highlights some, but there are so many others. And even for something that's not related to tech, I'd say that most things have a need for engineering and development capabilities.

And then finally, get in touch with us. I say this honestly. I run the GitHub policy repo, and you can get in touch there. There's spam sometimes, but for the most part we do read things, and I do encourage people to get in touch. If you have thoughts, we also have an email, but the repo is really the best way. And if you're interested specifically in changes at GitHub, we also have a 30-day notice and comment period when we're making changes to our site policy. I honestly do think it's a cool thing; I wish that more companies did that.
And even with those changes, we do try to internalize them and provide people with an explanation for why we're keeping something, or why we're changing something in response to feedback. You can follow our Twitter, as long as Twitter exists; it's @GitHubPolicy. You can also read our blog; most of our Twitter, to be honest, is just articles from our blog. We do a lot of explainers, and then when we, say, participate in an amicus brief or attend a consultation or something, we explain what we did, what we said, and how we're showing up for developers there. And then also just share how policy impacts you in any way, both within GitHub and not.

I also will say, this is not on the list, but there's a blog post that outlines this: GitHub has had a lot more focus on data and policy partnerships. So if you have a request for using GitHub data, you can also get in touch with us about that; there's a blog post. We mainly work with academic organizations, but we're trying to provide more. An exciting thing was that on the OECD innovation index, contributions on GitHub are now used as a kind of innovation signifier, so I think that's cool.

So I guess that's it; I think it's just Q&A now. I get that it's 5 p.m. on a Friday, so if you guys wanna go, no worries. But if you have any questions about some of the policy developments I got into, or anything related to policy that you would like to share or promote, I encourage that. I have this mic, so if you want to say something, say it, and then I will repeat the question and then answer it. So yeah, any questions or comments? The blue hat?

Yeah, okay, and I will repeat the question; hopefully I get most of it right. So there was an instance where someone was the maintainer of a project that was used by the US Geological Survey. Oh, okay, gotcha, gotcha. To be honest, I don't know that specific incident; I encourage you to share it in our policy repo so I can educate myself. But I know that malware is a big one. GitHub is participating, npm is within GitHub, and npm is the largest JavaScript package registry. So with that, we've been trying to participate in these ecosystem stewardship conversations, and generally, zero tolerance on malware is something that most people can get behind. I don't know if I have any more specifics I can provide. Okay, I'm interested in that.

Yeah, I think honestly, when it comes to open source and security, the big ones are SolarWinds and, oh gosh, what was that? Log4j, or is it Shell4j? Log4Shell, Log4Shell, sorry guys. Both of those are typically the first things that come up, along with just the awareness that so much of our digital infrastructure relies on open source. And so with those conversations we're saying, hey, let's focus on the people with the resources to address this, and not the people who are independently contributing to open source projects. Other questions or comments? Yes.

And so the question was, what do I wish that software engineers were more aware of or more involved in? I wouldn't say it's necessarily a specific topic, but just the ways to provide comments. There's a host of consultations, calls for proposals, all these different opportunities to get in touch with policymakers.
Because you can't just tweet at them; they're not going to read it, they're not on Twitter or whatever. There's typically a portal where you have to provide comments, and then all of that is synthesized by people. When I was a research assistant at OSTP, you'd get a lot of comments, they'd synthesize them, and then they'd provide a document that summarizes the comments. And I think those general opportunities to get in touch, and actually have some degree of influence, are not really shared or explained to developers. And I would say in general we try not to sound the alarms too much; when GitHub shares something on our Twitter calling for people to get involved, it's really only the big things. The Copyright Directive was a big one, and for that one we did mobilize developers and got them involved. So yeah, I guess it's not so much a specific issue as the ways to advocate where I think there's a bit of a divide. But I don't want to add too much noise, and I don't want it to become an email list. I do think, honestly, repositories, maybe something on Plural or Open States, could solve the problem. But you don't want to necessarily add more noise or more alarmism; it's about being clear about which actual developments are really relevant to developers. Any other questions?

Yeah, it's coming up more on the privacy side. There was the California data privacy act, and then there's another one coming out in Utah. So I would say in general they're not getting into open source software security for the most part, but there's definitely kind of a mishmash of privacy things. I didn't get into it there, but just to be clear, the US does not have a federal privacy standard, and they've been trying; there was one that seemed like it might get passed in the 2022 session and just didn't get there. But maybe this year. It's kind of interesting, too, because I watch a lot of these hearings, and sometimes they'll be trying to pass some act and they'll say, again, we don't have a federal privacy standard. So I think that one is really a big baseline that is missing, and different states are trying to create their own policies to deal with that. So yeah, that is an interesting one.

Also, I don't know if you guys have heard of the Florida and Texas social media laws. That's a big one, not so much for us; GitHub sometimes is considered a social media site because we have comments, it's sort of a social code collaboration platform, but we are a platform, and so things that come up with 230 and intermediary liability do apply to us. And so that one is interesting, because if it does get taken up by the court, or depending on how the court decides Gonzalez v. Google, I think we could see some changes to 230, possibly in the next year maybe. We'll see. Yeah, any others? Yes, please. You want to use the mic? Thank you, yeah.

I guess we all know that GitHub is probably the biggest repository of open source code, and the go-to when people think about where to search or contribute.
I just wanted to know, how do you look at open source from your side as a platform, also owned by Microsoft, a big cloud provider, with regards to contributions of big companies, specifically platforms and cloud providers, back to open source projects? There are some famous cases, like Elasticsearch and AWS, for example, with cannibalization of an open source project, in this specific case Elasticsearch, by cloud providers. I just want to hear your opinion.

That's a really good question. I don't know if I have, I'm not as well versed on the side of cloud providers. I'm familiar with the big push for developers moving to the cloud, and I know in general, when it comes to cloud adoption, developers are kind of, I wouldn't say lagging behind, and I'm sure there are really valid reasons why they don't want to move to the cloud, but that is kind of an informational thing. So I don't have a specific answer. I know that the cloud push in general is saying, look, using cloud tools like Azure DevOps, for example, can allow people to, what's the phrase, scale, build from anywhere? I think there is that kind of capacity argument, and that's really where the push is for open source: this can open up your capacity. But again, that's a really good question, and I just don't have the best answer for it. I apologize, but thank you. Any other questions? Yes.

Yeah, no, definitely, I would say things can turn really slowly, but the Iran example is a really interesting one, because we engaged with this process for at least two years, it might have been longer, going through OFAC to get this licensing expansion. And then the government expanded it really quickly, when there was this kind of pivotal moment, these widespread protests, and so then it was a sudden expansion. So I don't know, it definitely seems like, if you can make the case for why it benefits a government's interest to turn that key and open something up, then sometimes things can move really quickly.

On the developer side, we've been trying to focus on the economic benefit piece, because it plays well: policymakers want their constituents to be prosperous, and so explaining how developers can contribute to the economic vibrancy of a country is a really effective talking point. And there's not too much on it, so the GitHub contributions data is one piece that we've been trying to get out, but we're trying to get a lot more data, more research on how developers contribute to US GDP and more broadly. So yeah, I guess the answer is it's slow, but sometimes it's really fast, and it kind of depends on how much they care, and there are some ways you can emphasize that. Awesome, any others?

You know, I think that's a really good thing to bring up. I would say in general, with the Iran example, it wasn't that GitHub was blocked by Iran; GitHub was blocked because of trade restrictions on the side of the US. And I believe the only country where we're not available in some form is North Korea.
So GitHub is pretty widely available. We're still available in China, which is really cool, and there are limited versions of availability; for example, in Russia we have a bit more limited availability at this time. So yeah, we've seen quite a few examples. One of those, it's interesting, I encourage you to look it up, is the 996 repository. That was an interesting one, and again, it shows GitHub being available in China. There was a popular repository; 996 refers to the punishing 9 a.m. to 9 p.m., six-days-a-week working hours of tech workers. So that was an interesting one. Unfortunately, it was taken down, and we don't really know why. So that one was definitely an example of people using GitHub to share information, but it's not still around. But there have also been, on the side of operational security, people attending protests and using GitHub to share specific tools for increasing their safety.

And honestly, I think in some ways there's such an outsized focus on social media companies, which, in kind of a good way for us, means there are a lot of other ways to share information and get things out. So I think in some ways that being under the radar can be a strength. And it again shows the importance of developers, because developers are oftentimes on the front lines of developing these tools and getting them out for people who are standing up. So yeah, any other questions? Yes.

Oh yeah, no, that's great, that's a really good question. I will say we have a really great legal team that handles most of those questions, and I do think the balance is a really delicate one. That said, I think that in general, outlining why having access to GitHub is beneficial to US interests has been effective in a lot of ways. And so what are those benefits? Well, open source is a global collaborative thing, and so everyone should be able to contribute. Also, there are maintainers all across the world who are keeping up really important projects, and so cutting off their access could break things that are important. And then also, thinking about this, it's like, okay, what's beneficial, say, in a country that is anti-democratic or where there's more authoritarianism? US interests are probably, again, on the national security side, and I think there is a lot of argument that open source is kind of a democratizing force in a lot of ways. It opens up access to knowledge; people can learn how to be a developer anywhere using open source and contribute from anywhere. So yeah, it can be tricky, but I do think that there is a solid argument for why code collaboration is really beneficial to security interests. And that's a weird thing to say, to be honest; as I'm saying it, I know it's a weird thing to say, but you gotta get the wins where you can, you know what I mean? And so I think that when these interests are aligned, that's a really fabulous thing. So yeah, any other questions? Yes.

You know, I have to say, I don't want to answer that question.
But I'll be honest with you. That said, oftentimes in the news you hear about, oh, a policymaker doesn't know how to operate a phone, or oh, they don't know how to do this, and it gets a lot of news. It is concerning, but I will say, most of the time when I interface on the policy side, I interface with staffers, and I do see a lot of really positive developments on the TechCongress side. I don't know if you guys know Jack Cable; he's a really well-known bug bounty hacker, and he was really involved in the drafting of the Securing Open Source Software Act of 2022. He's now an advisor at CISA. And I will say, I think there's a lot of technical expertise, and a lot of people who are developers and other technologists are on the Hill doing a lot of the real work. So it kind of comes down to whether you can turn the ear of, or find a way to explain to, people who might not be totally tech savvy why it's important. And, I mean, doesn't Bernie famously have, like, two apps on his phone? A lot of policymakers are older and they don't use these things. Personally, I'm not super technically versed myself; I'm kind of a Luddite in some ways. But you can definitely make the argument in other ways, even if people aren't coding themselves. All right, any more questions? Yes, in the back. You gotta be loud, though. So far away.

Well, I guess I can answer the second one, which is: I'm relatively new to the open source community, and I do think that people can be a little territorial, especially when it comes to policy, and it's not really beneficial, especially when you consider what the spirit of open source is. That is something that exists in a lot of different fields and industries, but I do think in general it's good to be welcoming, even if people haven't been in this space for 25 years. And also, the old heads have a lot to offer; it's really cool to interact with people who were around when the internet was coming up, because they have just been through so much. For me, at least, I'm not exactly a digital native, but sort of close to that, and so I don't necessarily have the perspective of just how much has changed. And sometimes when you think, oh, this is always how something has been, or has to be, then you might not have that kind of flexibility. So yeah, make new friends, keep the old, or whatever; open source should be more open, especially on the policy and stewardship side.

On the first one, it's really specific to the developer. I think that skills-based volunteering is a really great way to get into things. And even on the local politics side, you don't have to think, oh, I'm going to go national, international, I have to make the biggest difference; what you can do at a more community-based level is, I think, oftentimes more effective. So if there's an organization, I don't know, does the Boys and Girls Club need a new website, and are you a web developer?
Or is it a data scraping thing? There are a lot of opportunities, I think, for getting involved at that level, and those lead to other things. But I think that's a good level to come in at. On the stewardship side, it really depends, again, on what you're working on, so I guess I can't name anyone specifically, but depending on what you're working on, there's probably already a working group somewhere that's talking about the same things. I'm working on these kind of stewardship principles for package registries, and I participate in an OpenSSF package registry working group, and they're really cool people. And it's nice, too, because npm is pretty big, but a lot of package registries are a lot smaller, and they're run by really cool people who are really passionate and do a lot of cool work. So it's kind of nice to say, okay, we have resources, how can we spread them around a little bit more, and also advocate for the interests of people who might be doing things in more of a volunteer capacity. Yeah. All right, any other questions? Yeah, go for it.

I do not do educational seminars in the Southern California area. So yeah, I work at GitHub, and I do live in LA, whereas most of the people I work with live in San Francisco or DC. But yeah, I don't do seminars. I would say in general, we have events at the GitHub HQ, so oftentimes if there's a conference, and we have GitHub Universe, there are things like that where you can go in person, and we've been trying to do more things in person in DC. So that's been a big effort as well, just getting people excited about open source policy in DC and San Francisco. Yeah, so much.

Awesome, any other questions? Okay, well, it is almost 6 p.m. I really appreciate all of you for staying around and listening in. Again, follow us on Twitter, get in touch in the repo. I really appreciate it. And I don't know what all of you do personally, but I'm sure if you're interested in this, you're doing really cool work. So yeah, thank you for sitting in, and I appreciate the time.