[Front of Room 107.] Good morning, everyone, and thank you so much for coming. My name is Nina, and I'm one of the organizers of this year's open government track. I heard that the last time SCALE actually hosted an open government track was, what was it, SCALE 4x, so it was a while ago, and it's about time we had the open government track again. With us here are Vicky Anglert and Jason Hibbett, who are also co-organizers of this track, and thank you also to Mark over there for helping us get this up and running. Part of the goal of this open government track is really to connect the open source community with the government technology community, to bring more open source technology into government, and also to highlight a lot of the open source work and open data projects that are actually going on in government right now. With that, I will hand this off to our first speaker, Bronwyn, and she can introduce herself. Thank you very much.

Hey, everybody, good morning. You're going to have to be really loud to fill up this empty room. We're here for open stuff, open government. Thank you for being here. I am Bronwyn Mauldin, director of research and evaluation at the LA County Arts Commission. I'll tell you that very often, when I go to open data meetings in county government, or tech-related meetings (and you all know how data and tech get conflated by the lay persons out there), I get the "Huh? The Arts Commission is here? What are the arts doing here?" But at the same time, when I'm talking to the arts community that we serve, the arts administrators and artists and arts educators, if I show them I can do a pivot table, they think I am doing magic. So my job is somewhere in between those two communities, trying to bring them together, and that's really what the datathon is all about. That's what I'm going to talk about today.
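That pivot-table "magic" is easy to reproduce outside a spreadsheet, too. As a hedged illustration (the table below is invented for demonstration, not actual Arts Commission data), here is the same kind of one-line summary done in Python with pandas:

```python
# A made-up arts-grants table: each row is one grant.
import pandas as pd

grants = pd.DataFrame({
    "discipline": ["Theater", "Theater", "Music", "Music", "Dance"],
    "district":   [1, 2, 1, 2, 1],
    "amount":     [5000, 12000, 8000, 3000, 7000],
})

# One call turns the raw rows into a discipline-by-district summary --
# the same aha moment a spreadsheet pivot table delivers.
summary = grants.pivot_table(index="discipline", columns="district",
                             values="amount", aggfunc="sum", fill_value=0)
print(summary)
```

The point is the same one the spreadsheet makes: a pile of rows becomes a readable cross-tabulation in a single step.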
First up, think about what people imagine when they think of data and art, or tech and art. They often think of artists who use data in their artworks. Paula Scher is an artist who uses a lot of data in building these beautiful maps, giant wall-sized pieces that get exhibited in galleries and museums, using things like zip codes and air travel, and in this one, median home prices. Or you might think about artists who use tech in their art-making to critique and interrogate things in society; there's a lot of growth in 3D printing and AI and AR and all of those sorts of things, and I think we're seeing a lot of that growing in the arts field as well. Or you might think of people who use the aesthetics of technology and data to make beautiful artworks, like this lovely painting here.

But mostly what I do looks a lot more like this, and this is what surprises people: in the arts we have data that looks like this. This is from a giant data set about arts nonprofits in LA County that's collected by a national organization; we have access to it and can analyze it to understand the status of arts nonprofits in the county. We also do things that look like this. This is a chart from a giant report on an evaluation of a creative graffiti abatement project done here in the Second District, in South LA, where we had arts interventions take place at two parks and at two libraries, and we evaluated the effect of that project on the community. One of the really interesting things we learned from this evaluation is that when government makes an investment in a community, when a place is made to feel more attractive and safe, the community really feels that. People feel, "Wow, the government really cares about me." One of the lessons we've taken away is that perhaps, when we're making those investments in communities as government, we should make a little more noise about it, and make it a little more apparent that it's government doing those things.

We also do things that look like this. This is a ginormous data project that we did here in LA County. How many folks here are from Southern California, from LA County? Okay, all right. Here in LA County: 10 million people, 4,000 square miles; it's the population of Georgia in a space the size of Connecticut. We have 81 different school districts in the county, with about 2,200 public schools, and we administered a survey to all of them about arts education: what's happening? We wanted to measure the quantity, the quality, and the equity of arts education, and we scored it. We did some analysis around which demographic factors are associated with more or better arts education in the public schools. The other thing we did was hire a firm to build a beautiful online interactive, so you can now, as a parent, as an advocate, as a school board member, as someone who cares about education, look up any school or any district in the county and find out what disciplines are being taught to what grades, and which outside teaching artists are coming in to teach. You can compare your school to another school, a school to a district, a district to a district, and you can download all our data in a lovely little spreadsheet and do your own analysis. When we set out to do this, our goals for the project were that it should be very useful information, it should be easy to use, and, because we are the Arts Commission, it must be aesthetically pleasing, and I think we hit all of those. Our Arts Ed Profile is available online as well.

Now, a little bit of information about what the Arts Commission is and how we end up doing this kind of work. We are what's called a local arts agency. Can I get a show of hands of who here has ever heard of a local arts agency, for whom this is a familiar term? Great, that means my
next slide is useful, because I'm going to tell you what a local arts agency is. Every major city in the country has one, most mid-sized cities have one, and a lot of small cities have local arts agencies. These are usually divisions of local government, sometimes a nonprofit, sometimes a little quasi-governmental. If you think about what the Parks and Rec department does, or what the Transportation department does, we do the same things they do, but for the arts: we make the infrastructure of the arts work. In general we don't provide arts; we help people provide arts to the community.

In particular, we make grants to arts nonprofits. We do about four and a half million dollars in grantmaking every year, to everything from the LA Philharmonic to a small community-based arts organization, maybe a small community theater that is all volunteer-run and has a budget of a thousand dollars a year. We run the country's largest paid summer internship program in the arts. We provide support for arts education in the public schools. We are working more and more with our probation department and mental health department to build arts into restorative justice and rehabilitation for youth. We do civic art, public art; you probably think of public art as a dude on a horse, or in LA County, a mural, but public art can take a lot more forms, and we do a lot of that. We fund some free concerts. We have a strong focus and initiative on cultural equity and inclusion. We provide professional development services to help teaching artists, arts administrators, arts educators, and sometimes artists; mostly what we do for artists is help them figure out how to get a contract with a government arts agency. And we do research and evaluation and data analysis.

Everything up to the final bullet is pretty common in local arts agencies. That final bullet, my role in our local arts agency, is quite unusual, and there are three of us on my team. The City of LA has a Department of Cultural Affairs, and they have a person or two who do research, but other than that it's really difficult to find people who actually do research and data analysis in local arts agencies, and many people are jealous that we have this capacity. In this way, as I say, we are like any other government department in that most of the time the public never sees us. Maybe if you pick up a program when you're at an arts event, down at the bottom where the logos are, you might see our logo. We are behind the scenes, building and supporting the infrastructure. So when folks have gone to the opera and they're sitting there talking about "what's the government done for me lately," you can say: well, the roads that you used to get here, and the fact that when you flushed the toilet everything worked and the water went out and will be cleaned up, and the restaurant you went to has been inspected, and the fact that you have art to see, that you have something to do on a Friday night; that's all what government does, and we play a role in it.

How many folks are familiar with this diagram? It's something I think you may have seen, and I find it really useful when thinking about my role in a government arts agency. Data is that unstructured pile of bits and bytes, little factoids, that there's a lot of in the world, and it's only when you start to analyze those facts, sort them, put them into context, and do analysis that you can transform little bits and bytes of data into information. That information I can share with my colleagues; you start to put information to use, you test it out in the real world, you find out what works and what doesn't, and that information starts to become knowledge. Over time, as you've done lots of testing things
out and used your knowledge in the real world and found out what happens, if you are very lucky, one day you will become wise. As this diagram suggests, there's a whole lot of data in the world and not enough wisdom.

The way I think about things, my job is in that collecting and transforming of data, turning it into information, so that people who work in my field, the arts ecology, can use it. They generally function in the universe of transforming information into knowledge and putting information to work, and maybe some of them become wise. In reality I like to think I move all up and down this diagram and have lots of wisdom to share, but most of my job on a day-to-day basis, and where the datathon comes in, is in transforming data into information and making it actually useful and actionable.

In turning data into information, one of the things we do is publish reports. They are beautiful PDFs; we make sure they are aesthetically pleasing so that people will at least pick them up and read the first few pages. This is information intended to help the field. If you are running an arts nonprofit out in LA County, we want you to know what's happening around you and what your competitors are doing in terms of salaries, benefits, and how they're using volunteers. We want our arts educators to have information about who's doing good arts education and who they can look to for better examples of it. These are all some examples of reports we've issued.

However, here's the problem. Even if we publish the reports, and even if we publish the data behind them and post all our data on Socrata so it's open and everybody can have it, because these are values for us, open data and sharing information, throwing data at people is not enough. We know this especially for the arts community that we serve. As I said, I know a lot of people for whom getting into a spreadsheet is the most terrifying thing in the world, and I want them to use this information. So one of the things we do, usually when we issue a report, is hold a public event. We bring people together, and we have subject matter experts talk about the content, about turning that information into action and how they have used that information. It is those face-to-face interactions that we find absolutely critical. Posting stuff online, making a report available, printing it up and handing it out is just not enough; we actually have to engage human beings in conversation if we're going to turn information into action.

And that's how the datathon came along. When I first started at the Arts Commission doing research, in 2013, everybody I talked to apologized to me. The first thing they would do when I met them and told them what my job was, they apologized: the arts were so behind, we don't understand data, we don't use it, we really should get better at it, there's hardly any arts data. I quickly discovered that this was not true. The arts community can be extremely sophisticated in the way they think about data and the way they think about problem solving, but they needed help to figure out how to do it better. There were some skill sets and some analytical tools they were missing, but there's lots of data in the arts that hasn't been mined yet. So the work is helping them think differently about the data that does exist, and to apply some of the things we know in the arts about rehearsal and practice and persistence and learning new things; those are things we do really well in the arts, and we could do them with data as well. So back in 2014 I proposed to my boss that we needed to do something that brings our arts community together with that civic-hacking community, the data nerds who want
to help make the world a better place. Let's all talk, let's all have a conversation. In 2014 she kind of said, "That's interesting; go work on another report." I came back in 2015, and I came back in 2016, and 2016 is when she said, "Yes, you need to do this; we need to host something and bring people together." I had some conversations with my colleague at the city's Department of Cultural Affairs, and that's how together we dreamed up the concept of this datathon.

Now, the purpose of the datathon was to help our community get better, get skills, get knowledge, and get more confidence about dealing with data, but also to get the community of people who are engaging with open data and government and trying to solve problems to understand that there is a role for the arts in solving problems. We actually do a pretty good job of engaging with communities in creative ways, and we could help you do that. When you're thinking about a community that's underserved, are you thinking about arts in that community? What kind of arts and culture activities are taking place, and how does that interrelate with transit, or with sewers? We believe there's a role for us, we want to be part of it, and we want you to think about our data sets. And then we also just want to continue to build out the quality and the quantity of the data sets we have available.

All of this is with an eye to our fundamental mission, which is to ensure that every resident of LA County has access to all of the benefits that the arts bring. That's not just the joy and pleasure of experiencing a beautiful performance; that's also the jobs that are available, the leadership opportunities on boards of directors of nonprofits, the opportunity to perform yourself and not just sit in the audience, or to paint, or to make art and share it with others. We want to make sure everyone has access to the benefits of the arts, and the datathon was designed to bring people together to talk about the role of data in improving access to the arts.

In our first datathon we had about a hundred people. We threw them all in a room together, we threw a bunch of data sets at them, we told them about Socrata and ArcGIS and a little bit about Excel, we broke them into groups, and we gave them the task of coming up with a proposal, an idea for how to improve access to the arts. We had some great ideas, we had some not-so-great ideas, we gave out prizes, and the head of research at the National Endowment for the Arts came in and gave our keynote speech. At the end of the day people were so excited and engaged, I mean, people stayed all day long. It was so much better than I expected that of course we had to do it again the next year.

So, datathon number two, in 2018. Based on feedback we got, we narrowed our focus. We said we are only going to talk about collections data, although we used a very wide definition of collections. Collections data is basically the metadata associated with artworks: you've got a painting on the wall, and it's the size of the painting, the artist, the year it was made, the materials, all of that kind of metadata, and there are people who have amazing careers doing and managing metadata around arts collections. We expanded our partnerships, which is absolutely critical to making our datathon a success, bringing together all of these different communities, and we had eight different tracks to choose from. We did a Wikipedia edit-a-thon around bringing more information about public artists in LA County into Wikipedia. The Hollyhock House, if anybody knows it, the Frank Lloyd Wright house, is digitizing their collection, so there was a session on that. There was a session on coming up with a better taxonomy, folksonomy, tagging system for a collection of historic images at the East LA
library, in a historically Latino community. We had someone exploring collections data around music. And we even had the Department of Military and Veterans Affairs; they have an amazing collection of artifacts and memorabilia, and they are archiving it. A fabulous day. We had a great time, we had amazing and wonderful people sitting together all day long talking about arts data, and we had really fabulous swag; we gave away cool stuff, because we're the arts. We also had a team of people live-archiving the entire event. They ran around in matching little white coveralls, talking to people, collecting ephemera, taking pictures, making audio recordings, and constantly posting it all onto a Google site throughout the day, so anybody could follow along. It was great during the day, but it also became a living archive that still exists, so if you ever want to hear an interview with someone who participated in the collections datathon, it's all there for you to enjoy. The other thing we did was create zines for each of the datathons, and I have copies, so you are welcome to come grab one when we're done here. Again, we're trying to have something people can take away and think about; as you may get a sense, these are very intro-level: what is a spreadsheet, what is metadata, helping people get smarter about that.

And now I get to the most important part, which is this year's datathon, coming up soon. Our theme this year is around democratizing data, and also democratizing the arts. It will be happening on April 3rd at the downtown public library, and if you are in the area I would very much welcome you to come. We want to bring in more people from outside our arts community; that's one of our very clear intentions this year, to get beyond just the usual suspects. It's going to be fabulous. There will be a zine, but I can't give it out yet, because you have to come to the datathon to get it. We have more partners this year, including our good friends at Hack for LA and Maptime and others. And this year we've learned our lesson from trying to manage eight simultaneous tracks; this one, we're only doing four. But these are four amazing tracks that really get to the breadth, I think we've really successfully gotten to the breadth, of what we mean by arts data, in four simultaneous tracks.

We've got the hardcore data science track, and that's going to be all about Arts Ed data; we're bringing in a bunch of related data sets to explore relationships between Arts Ed data and other things. We're going to have a session not on Wikipedia but on Wikidata, which, in all honesty (I shouldn't confess this), I didn't even know existed six weeks ago, but thank goodness I work with smart partners who do; we're going to be doing data entry and building out Wikidata around public art in LA County. Then we have our border data session, which is our qualitative data session. We're working with a local artist named Tanya Aguiñiga, who has a project she's been doing for a couple of years now where she goes down to the US-Mexico border, hands out cards, and asks people to answer the question: what do you feel as you cross the border? She's collected nearly 10,000 cards, and she wants to build this out and publish it as an open data set. So one task at the datathon is to help with data entry, but also to use this to have facilitated conversations about a very polarizing and challenging topic, to help people talk about the border in a more constructive way, bringing the voices of people from the border into that conversation and using her art project to do it. And then the final session, which is in some ways our democratizing-the-arts session, is where folks are going to be literally embroidering geodata onto fabric, onto maps of LA County printed on fabric. Nina is
one of the brilliant minds behind that one. So we're going to have these four amazing tracks, and we're going to look at arts data from many different directions. There's quantitative data, there's qualitative data, there's the hardcore data science stuff, there's data entry for the public good, and there's crafting. Whatever you're into related to data, there is a session for you at the datathon. We have a website where you can learn more about it and link to register. It's all day long, it's free, and it's open to anyone. I really feel like this year we have hit our stride. I keep telling people this is going to be the best one yet, and I know everyone says that, but seriously, this is going to be the best one yet, so anybody who's local, I hope you can come. That's everything I wanted to say; actually, I want to say a lot more about the datathon, but in the interest of everyone's time, I'd love to answer any questions.

"You mentioned Wikidata briefly. What is your viewpoint on that, as far as having repositories of data? Do you select your own data mostly, or do you use their ontologies?"

Where we intersect with that, I think, is that LA County has a public art collection and we manage it. Some of it is artworks we built and managed the process for ourselves; some is stuff that's been donated. We have an internal database (it's in EmbARK, for the folks who know EmbARK), and we publish a subset of that data onto Socrata, the county's open data portal. That has all the pros and cons that go with being on Socrata: it's publicly available, you can map it, you can do the analysis that you can do, but it is not linked to anything else. What I'm really excited about with Wikidata is bringing some of that data about our county-owned art collection into a linked open data set that will connect with others. We have individuals in our civic art division who know a lot about the individual artists we work with and who made those artworks, but that's not necessarily publicly known. The ability to connect the artworks in our collection, some of which were made by some pretty well-known artists, and also musicians (Hans Zimmer made a painting that's actually in the county collection), to potentially other open data sources through Wikidata, I think, is a really exciting opportunity for us.

Out of curiosity, anybody here from government? One. Anybody else want to claim government? I think that even within government, local arts agencies are invisible. I still go to meetings of county government where they ask, "What's the Arts Commission?" So one thing I'm hoping the datathon can do is raise our visibility within our own government, but also raise our visibility with a community like yours, where you are doing data and tech stuff all the time that is far beyond anything my agency could imagine doing. One of the things I would really like to learn is what I don't know about what's possible if we brought in new tools. Like I said, in our world, convincing people to get into Excel is a leap for many, and I suspect there are folks in the audience thinking, "Wow, Excel, you're still using that?" So having folks from the open data and open tech community could really help us advance in our work.

"Are you looking for additional partners for the event, if not for this year then for the future, since this is obviously going to keep going? And could you
speak about any successes or examples you've had of people who may have been afraid of spreadsheets being able to discover things by attending?"

I'll tell you a funny story; I think I've only told it once this morning so far, because it's one of my favorite datathon stories. At the very first datathon, I had one of my staff people up at the front. She has a doctorate; her research method for her doctorate was ethnography, so she's great with qualitative data, and by the time she was a year into working for us she was awesome at spreadsheets as well. I had her up there presenting some basics of how to use Excel. This was really 101: how to use the function bar, how to calculate an average, really basic stuff. But I also said, you've got to show them how to do a pivot table, at least show them; I'm telling you, people who don't know pivot tables think it's magic. So she gets up there, I'm standing in the back watching, and at the aha moment of the pivot table, the person standing next to me, I kid you not, gasped out loud, turned to me, and said, "What program is she using?" It's Excel. So we have those people in our arts community, and we have some highly sophisticated people who are introducing me to things like Wikidata and linked open data, and we've got everyone in between. I had another person at that same datathon come up to me later and say, "I'm going to go home and open a spreadsheet, because I just didn't know this was possible." We've got all of those folks in our universe. We tend more toward the end of the spectrum of folks who only use a spreadsheet when they have to, and it usually involves a budget, but we have them all, and that's part of the fun of this event: trying to figure out how we can bridge and communicate with each other. For me, that's the fun part.

"When I worked in ad tech, we would talk about the trade-off between exploration and exploitation. The fact that we talked openly about exploitation in ad tech tells you a lot about ad tech, but it is one way to think about how to build a feedback loop around your data, and it sounds like you are at the exploration phase. If you were to hypothesize about exploitation: what is the goal? Is it to build a community that then has a feedback loop where you're constantly using data to do things? What does that look like for you, the datathon in 10 years, since we're all talking about 10-year goals? Where would you like it to be?"

I have a couple of things I could imagine happening. One is seeing more organizations making data available about their own work. Something simple: there are 88 municipalities in LA County, and many of us have some kind of public art collection that we manage, but our data sets, even for those of us who have all this information in a spreadsheet, don't talk to each other. So being able to communicate with our colleagues in the field through our data, and to interconnect our data sets, would be a big win in the long term. More organizations asking me questions about data would be a real win. I end up being kind of a catchall, somebody people out in the community come to, and very often the question is, "I need to do a survey; how do I do the survey?" My question back to them is, "What's the question you're trying to answer? Let's figure out if a survey is the right way to do that." If the questions I get become a little smarter, that would be success. And people using the data that we share: I would love to see someone take one of our data sets, analyze it, and write their own report, or bring their own data set to bear against it, and learn things
Those things would look like victory.

One live question: where and when is this again? April 3rd, at the downtown public library. Thank you so much. Feel free to come up and check out the zines that were distributed at the Arts Datathon, and come talk to Bronwyn. Bond, who's up here, will be leading our next talk at 11 o'clock. Thank you.

All right, we're going to get started with our next speaker. Thank you all for coming out. This is the first time in years, I think, that SCALE has hosted an open government track, so thank you for being here today. My name is Bond Harper, and I work for the City of Beverly Hills. I'm going to talk today about one of my favorite projects I've worked on since I've been there. I came from a stint in the private sector, which I enjoyed very much, but it's been exciting being back in government, because there's so much of it where I'm able to drive the direction of what we're doing, versus the private sector, where we were mostly working on projects and the consultant was completely driving the direction. So government is kind of an exciting place to be in tech, although I think it sometimes gets a bad reputation.

Just to define some of the terms here: a lot of this talk will assume no knowledge of these tools, but I hope even if you're very experienced there's stuff in here you'll find useful. What I mean by open source is just that the source code behind something is open to anyone to reuse for any purpose, and open data is just data that's machine-readable and can be used, reused, and redistributed by anyone for any purpose. So open data and open source are in many ways the same idea, just applied to different things. If you'd like to follow along with these slides, because the font's too small and you want to look at them on your own computer or phone, you can follow along at bondah.github.io/scale17x. And while you're bringing those up: I do not mind at all if you want to interrupt for clarification or a question; just raise your hand and Nina will bring you a microphone. If things get way off track we may push questions to the end, but it's a small enough group that I'm perfectly comfortable answering questions as we go.

So here's the site that we created; this is kind of beginning with the end. This is the site I created using the methods I'm going to show you today, and we can take a live look at it here. It's a fairly simple open data portal. It is our beta site, and I'm going to talk a little bit about why we did it this way. There are only 27 datasets on here, but it's searchable, you can click on things, and you can download. It's got the basic functions of an open data portal: metadata about the data, the ability to search for data, the ability to download data, and it's arranged in categories. Fairly simple, but it's meeting an immediate need, and it's free, which is even better.

Today's topics: getting started with an open data program or initiative within local government; building the actual open data portal itself and the steps we took to do that; and then handling data.

First off, getting started on this journey. The typical process, when a city decides they want an open data initiative, is that it starts with a directive, perhaps from your city manager, your city council, or an advisory group to the city that says, "Hey, you need to do open data." From that, the city either creates a strategy themselves or perhaps hires a consultant to create one. From that strategy a policy is derived, which is a very formal document, usually ratified by city council and posted on the website, and after that comes implementation. And so there are a lot of
There are a lot of great resources that walk you through this typical process and give you great pointers and information, so I'm briefly going to highlight some of them, if you want to look into them later and this kind of thing interests you. The Sunlight Foundation has the Open Data Policy Step by Step which, as the name implies, steps you through the process of getting to an open data policy from the very beginning, including basic definitions. It's nice if you've got a group of people who aren't familiar with open data policies and haven't seen them from other cities — this is a nice step-by-step walkthrough. This next one is perhaps one of my favorites: the European Data Portal has the Open Data Goldbook for data managers and data holders, a downloadable PDF. What I love about it is that it has lots of infographics and diagrams, so you can actually hand it to someone in your C-suite who doesn't read a lot of text, and even if they just scan through and look at the pictures, they get a lot of information about what your open data policy is and what you're driving at. There's a lot in there about the benefits, and again it has definitions for people who aren't familiar, so I think it's a great high-level one to hand out, and there's also some solid information about how you walk through this process. This one is the exact opposite — it is literally a wall of text: the U.S. federal government's Project Open Data. It's still very beneficial, but it will probably be more for your back-office folks; if you're the one creating this, it would be useful to you, and it's nice because it gives insight, even from the local government perspective, into what the federal government is doing and the process they use. And then, kind of in between, Open Knowledge International has the Open Data Handbook.
One thing I'll highlight there that's really beneficial if you're trying to get this started is the value stories. If you're getting pushback on doing an open data initiative at your city, it helps to show where it's been beneficial in other places, so I think the value stories on open data are worth sharing with folks. And lastly, the World Bank has an Open Government Data Toolkit that again walks you through this; it's intended for all levels and kinds of government, but it's beneficial for local government too. So between all these resources, you can look at them and figure out what process you'd like to follow.

We essentially didn't do any of the typical recommendations of starting with that directive, a policy, and a formalized document — we started with this beta site at the very beginning. The City of Beverly Hills doesn't have a formal open data policy or strategy document. We're going to go back and do that; it's not like we're just going to wing it forever. But there are some clear benefits, I think, to starting with a beta. It's fast: it takes time to do a policy and strategy, and if any of you have worked in or with local government, you know there are lots of people and lots of stakeholders. There's benefit in having all those stakeholders give feedback, but it means things take a very long time to get done. It's a proof of concept: particularly, again, if you're talking to people who aren't familiar with this, you can say, here's our site, you can click on things and see what it is and what it means to you, as opposed to trying to show them other cities' examples. And it meets an immediate need to have some datasets that people are requesting right now.

So this will be the bulk of the talk: walking through building the open data portal itself. What we used is JKAN, which is a Jekyll-based static-site generator for open data portals, and I'll talk a little bit about what Jekyll is and what this is.
So JKAN is in a family with some other -KANs. CKAN is used in Europe and also by some agencies within the U.S. for their open data portals, so I encourage you to check it out. The reason we didn't use that one is that it's a lot more complex to set up — setting it up on your server and getting started takes a lot more time. It's a lot more robust than JKAN, with a lot more opportunity to do things with APIs and other features, but those are things we currently don't have the need for. Also in that family of -KANs is DKAN, where the D stands for Drupal; similarly, it's an open data publishing platform, and it's used by a lot of folks who are already using Drupal in their environment, so they've already got people used to it. But I hardly know anything about Drupal and we're not using it widely, so it didn't really make sense for us — again, though, if it's something you're already using, it's worth looking into. It's more robust, but it's harder to set up, particularly if it's not an environment you're already familiar with.

So Jekyll — what's Jekyll? Jekyll is a Ruby-based framework for making static sites, typically blogs. A lot of people use it for their own personal blog, and a lot of people use it in conjunction with GitHub; there are a lot of resources that will help you use Jekyll to make a blog and publish it entirely on GitHub. So it's widely used in the blogging world, but in this case, each of our datasets is kind of like a blog post — that's how JKAN works. And what's Ruby? Ruby is a dynamic open source programming language. But the benefit here is that you don't have to know Ruby or have used Jekyll in order to use JKAN. I have limited familiarity with Ruby, and I've used Jekyll for a personal blog that is very, very simple, so I'm certainly not an expert in this, and I know other folks who have used JKAN — you don't have to know either of them.
But I think it's great that as you do this, you do learn about them. Taking that JKAN template, making a lot of variations, and googling how to make modifications lets you learn how to use things — hacking on something that already exists is perhaps the best way to learn.

So if this sounds like it will work for your organization or your city, how on earth do you get started? You're going to need three things: a development environment, a web server, and web-accessible space in which to host your data. The second two in particular can be a problem. Pretty much all city governments obviously have a website, and there's usually already data stored on it, but I see very often that folks aren't allowed access to it — perhaps they're wanting to get this initial site up to show people as a proof of concept, but they're told, oh, you can't put that on our website, or we don't have space for that. Or maybe you're here with a small organization rather than a city government, and you don't even have a web server or web storage space. In that case, you can also host this on GitHub for free — in fact, the steps on the main JKAN site walk you through using it on the assumption that you're going to host it on GitHub. That hosts the whole site on GitHub, kind of like how this presentation is hosted on GitHub, so you don't have to pay for those things. You'll be limited if you're planning to store your data itself on GitHub: GitHub has limitations on the files you can put there, on speeds, and on the number of people hitting your site. It's intended for maybe a small organization to use; if you're talking about using this for a big city, then ultimately, once you're done with the proof-of-concept phase, you're going to want to move it over onto your primary web server and web storage space.
And then there's the need for a development environment, which I think can sound complicated, but it doesn't need to be at all. This can be your laptop, your computer at work — essentially any computer where you can install Ruby and Jekyll. The files for it can be stored anywhere: same thing, stored on your computer; I store ours on one of our network drives at work. So nothing complicated.

The steps to get your open data portal running are as follows. Jekyll is distributed as a Ruby gem. A Ruby gem is just a program or library that's been written in the Ruby language — a cute little name for a self-contained program or library that you can install — and it's fairly simple to install. Once you've got Ruby installed (you have to have it installed first), from your command prompt you just type this, or copy and paste it from the Jekyll installation instructions, and it installs Jekyll and Bundler, which will let you do essentially the rest of the steps involved.

Then you need the code for the site itself, so you'll grab the JKAN code from GitHub into your development folder — unless you want to start with one of the other examples I'll talk about, because it's closer to what you're actually intending to do; but this one has all of your basics in there. The format's pretty basic; I can show you what it looks like when you grab it. This is what you get when you download it: you've got some tiles, very basic stuff, which should look very familiar from when I showed you our site — just different colors and different datasets. That's what the default JKAN template from GitHub is going to look like.

On the computer you're using for this, I highly recommend you set up whatever version control you like on that folder.
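The install and download steps being described here are, as far as I can tell, Jekyll's standard gem install plus a clone of the JKAN repository — the repo URL below is my understanding of where JKAN lives (timwis/jkan); verify it against the JKAN site before running anything:

```shell
# Install Jekyll and Bundler as Ruby gems (requires Ruby to be installed first)
gem install jekyll bundler

# Grab the JKAN code from GitHub into a development folder
git clone https://github.com/timwis/jkan.git
cd jkan

# Install the gems the site itself depends on
bundle install
```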
That way, as you start making modifications, if something breaks you can go back to the last known working version. Version control also helps here because it allows us to share code with the whole world: the code for the Beverly Hills open data site is open source, out there on GitHub, so feel free to fork it and use it as you wish.

You'll also need a code editor or text editor to let you work on the project. There are tons of them out there — I use Atom, but there are many, many others you can use. The main thing you'll want is one that lets you see your whole project all together, because a JKAN site has a whole lot of different folders, with items within those folders, and it's nice to be able to see everything together as you make changes.

And then, like I said, there are those examples: the main JKAN site has links to various other agencies or groups. San Diego, which I'll talk about a little, is really beneficial — they probably have the most robust of these different options — but there are also different styles, so you can see how people have styled theirs.
This one is a St. Louis group's — it's very different, but you know, those are categories, and when you click on one it's very similar (although that category has nothing in it; that's not very interesting — there we go). So you can take a look at these, and if one of them is closer to what you want, you can use it as your starting point.

All right, so here's the thing: San Diego deserves a huge shout-out. Their current open data portal was done in JKAN, it's all open source, all the code's out there, and it is very robust. Let's see if I can open their site real quickly so we can take a peek. Here's their current site: they've got some trend-analysis things, and their datasets — we can search for something like, I don't know, parcels — and they do have parcels. So they've done some things we haven't: they've included a data preview on those, but other than that it's basically the same. You have the ability to download, you can change which dataset you're previewing, and they have a lot of datasets on theirs. They also have a blog with some really useful information about how they've gone through this process, and they share some of the things that they do. This particular post talks about when they switched from their prior open data portal to the current one. I just found this very useful and helpful, and I'm greatly appreciative to them for being open and sharing all this. I think it highlights the benefit of local government being involved in the open source world and in sharing our code with each other, because it ultimately saves taxpayer dollars — even if it's taxpayer dollars in a different jurisdiction. It's hugely helpful when you can see what other people have done, and I think in the past local government has not been terribly great about sharing our code and our projects.
Everything has been somewhat insular, not just between cities but even within cities. I think that's hopefully changing — there are some really cool initiatives; there's one in the state of California looking to get together people in various realms of government who are interested in open source and in making our code open source.

So, like a lot of sites these days — people either love it or hate it — JKAN uses Bootstrap to create a responsive web page. Responsive in this case just means that as you resize your browser window, or open it on a phone or a tablet, everything stays in a nice layout, and you don't have to design things separately for different kinds of devices. But yet again, you don't need to know Bootstrap to use JKAN — you can dive right in, never know anything about it, and be perfectly happy. If you do want to customize your CSS, which is how it does the styling, Bootstrap has a customizer, and you can do all the customization from within their web page: it lets you change the colors, the fonts, the sizes. That's how we took that initial template I showed you, the one with lots of blue, and put in the lovely brown-gold color — gold and black are our city colors — and then the fonts and such that your city uses. So you can use that to do the styling, or if you know CSS, hop right in there and do your styling directly, to take their stuff and make it your own.

Within the files you download for JKAN, there's going to be this _config.yml file, and this is where you go and set some basic information about your site. There are a lot of different files that you use to configure it, and what JKAN does is just use Jekyll to take these template files and build the static site. In here you put things like the title of your site — so, ours is Beverly Hills Open Data —
and then the description, what your URL is, your logos if you've got them, stuff like that. I'm just showing you this to show that filling it out is much like filling out a form — a lot of these things are. So again, if you're not familiar with any of these particular tools, a lot of what you'll be doing is very basic text editing.

All right, so now we've got a site, maybe you've done a little styling — it's time to add your own datasets. Datasets are added as Markdown files. Markdown is a markup language; it's a lot like writing plain text, with some extra little things, and most of what we'll be doing with it is literally writing out text. Your dataset Markdown format is based on a default YAML file: this is a file where you go and fill out what you want to show up on your dataset page. The dataset page — if we pick bike routes, it shows up here — has this resources section, this open-text area where I can describe as much or as little about my data as I want, and then essentially your metadata: license, disclaimer, category, maintainer, things like that, which you'll define in this default YAML file. Then for each individual dataset, this is what your Markdown looks like, and you fill it out just like a form. The different fields you have are what's showing up in reddish pink here: you say, hey, this is the schema to use, this is the title of my dataset; you can have one organization or multiple (again, that's in the template files); and for each dataset you can have multiple resources. In this example we have a shapefile that folks can download, but there's also a map location where they can go and view it — or you could have as many as you like, essentially. You just add them here as plain text, and the URL here is that dataset, where you've put it in that web-accessible location.
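A dataset file of the kind being described is a Markdown file whose YAML front matter carries the metadata. The exact field names below are illustrative (check the defaults that ship with JKAN), and the URLs are hypothetical placeholders:

```yaml
---
schema: default
title: Bike Routes
organization: City of Beverly Hills
notes: >
  Bicycle routes and bike rack locations throughout the city,
  updated on an ad hoc basis as routes are added.
category:
  - Transportation
maintainer: GIS Division
license: Public Domain
resources:
  - name: Shapefile
    url: https://data.example.gov/gis/bike_routes.zip
    format: shp
  - name: Web map
    url: https://maps.example.gov/bike-routes
    format: html
---
```

Everything below the closing `---` is the free-form Markdown description that shows up on the dataset page.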
So again, like I said, if you don't have a web-accessible location, you can use space on GitHub where you're hosting this; if you do have one, you just put the link to wherever you're putting the data. So you fill it out like a form, and the URLs in resources are just that web-accessible location where you're storing your data — because as you share the site, when people click on something, they need to go somewhere where they can actually get the data.

If you want to change the format of your site — where things are laid out, as opposed to the style; so instead of changing the color from blue to brown, I want to change how my tiles are laid out, add sections, that kind of thing — you do that in the HTML files. These are stored in layouts and includes. Layouts are just templates: there's a layout for your datasets, there's a layout for your main page, and the other one is your default layout. The beauty of this, like a lot of frameworks that build things, is that you change stuff in just one place: you can go into layouts and change one HTML file, and all of the different datasets and pages that reference it will use it. That makes it much easier and happier than changing a bunch of individual pages, and it's the power of frameworks in general, from the very simple, such as this one, to the very complex. Includes are the same idea — things like a header and a footer, little bits of HTML layout that you want dropped onto lots of different pages. We do have a header and a footer on ours that's used on all the pages.

And then categories. Most open data sites use categories. I've heard some interesting things about the public sometimes being confused by categories, and there might in the future be a kickback to move away from that — I'll be interested to see, because literally every open data site I've seen has these little category tiles.
It's kind of like somebody thought it was a good idea and started it, and everyone's used it perpetually ever since — so this has the categories on the side and then the category tiles. Part of that might be driven by the various open data portals out there. The biggest one currently, at least in our region, is Socrata — not an open source one; it's a paid one — and most local governments around here are using it, so it's kind of big competition to doing anything different. A lot of the visualization and the way things look, I think, is driven by how Socrata has decided to show things, because they're kind of the beast in the room. Since they have the presence, and so many people have become familiar with how they look, the open source portals often mimic that, because then you don't have to retrain people on how things look or how to download the data — they're familiar with it from having used things like Socrata. But the beauty of open source is that if we decide that's a bad idea — categories don't make sense to people; is there a better way we can redo things? — we can. Currently it's all kind of the same, but it's an interesting discussion, I think, how this is going to look in the future and how it could be more logical for people to use.

So, the categories used to organize the data are modified in this categories YAML file — again, it's much like filling out a form. You have the names of your category tiles; the little image that shows up in the middle of each one is an image file you can specify here; and then featured, true or false — true means it's on there, featured, showing up, versus false, for example for our school district. Our school district is separate from the city — we're not the same agency at all. In the future it would be wonderful if we could get together with them and host some of their data, but they don't have their own open data portal.
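The categories file being described presumably looks something like this — the path and field names are my guess at JKAN's convention, so check the categories YAML that ships with the template:

```yaml
# _data/categories.yml (path is an assumption; adjust to your JKAN checkout)
- name: Transportation
  img: /img/categories/transportation.png
  featured: true
- name: Finance
  img: /img/categories/finance.png
  featured: true
- name: School District
  img: /img/categories/schools.png
  featured: false   # kept in the file, but hidden until their data can be hosted
```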
But currently we don't have that — we haven't even started talking about it — so featured is false, but I've left it there because I'm hopeful that in the future we can make that possible. And again, these categories you set show up both on the main page and on the dataset page, and I use them to group things together so people interested in a particular topic can narrow their search. On our site we only have 27 datasets right now — it's not that hard to scroll through the whole list — but for someone like San Diego, with a hundred or more datasets, it's useful for people looking for particular things to be able to do some filtering.

My recommendation for getting this started in local government — where there's often a lot of resistance to any kind of change, or to anything people perceive as adding to their workload — is to start with datasets that are already published, but perhaps not in machine-readable format. Most local governments publish lots of data already, and a lot of it is PDF documents. So look on your site for a PDF that folks might really rather have as a table they can review and analyze. Here, we looked for salaries: the city publishes them as PDF, but that's not very useful if you want to sum up, you know, the total amount of overtime hours in the city, or how much people are getting paid in vehicle allowances, things like that. People would probably really rather have it as a CSV. So we took something that was already out there — people can't object to us publishing it, because it already exists; people could already see it, they just had to run it through their own character recognition to get the information out of that PDF — and now we publish the dataset so they can grab the CSV and get going right away. Similarly, looking for things trapped in a PDF, you might also look for maps. A lot of cities publish maps, using their GIS systems and various other tools, in web-map
formats that let people zoom in, do some basic things, turn layers on and off. But what if the particular layer you're interested in isn't on that map, or you want to layer it with some other data, or do something completely different that you can't do with their web tool? It would be nice if people could have that data themselves. So again, we took something — the bike racks throughout the city, stuff like that — that was already out there in the open, so it's nothing new and nobody can object to us publishing it, and now we give it to people as a shapefile they can download. A shapefile is a mapping file format, and they can use it in whatever they like — there's a program called QGIS, an open source mapping product, that they can use to make their own maps and do whatever they like.

All right, so once you've got some datasets in there, it's time to build your site. As you're going through and working on your site, in your local development environment you can run a simple local web server in which to view it. Since there are a whole lot of files that make it up, if you open any one HTML file in your browser, it's probably going to look pretty funky, because it's supposed to be referencing all those other files. There are a number of ways to do this, but one way is using Python to run a simple local web server: go into your command prompt, change directory to your project directory — wherever you stored those JKAN files — and then use this line. As long as you have Python — a lot of computers and other programs already require Python — this will work with pretty much any new version, and if you don't have it, you can grab it for free. Then, once you've got that web server running — whether you're using Python or another web server — to serve your Jekyll site locally, just to test and look at it,
you'll use this line — you can just copy and paste it; these little lines are also on the JKAN site. Once you've run it in your command prompt, you just go to localhost and whatever port you used — in this case I used 4000. If you leave the port off, I think it defaults to 8000, and a lot of times a program is already using that one, so it's probably best to use a port that's unlikely to be in use. So just do localhost, and if you specified a port, specify the port, and you should see your site. Now you can change stuff in your code editor, save it, and see the changes in your locally hosted copy, and you can do that as you test, to make things look how you want them to.

Once it looks how you like and you're ready to push this out to the wider world, you build your static site — the _site folder is what you're going to copy to the web server to put out there for everyone. In your command prompt, you just use this line to publish the site. What it does is take all those different layouts and templates and datasets and wrap them up into one website: it makes the different HTML files for you, and the CSS, based on those templates, and all you have to do from that point is copy it somewhere. That's the beauty of the framework — it takes your templates, applies them to a lot of different datasets and pages, and makes it ready to go quickly.

Everything for your site will be encompassed in this underscore-site folder, so all you have to do is copy the contents of that folder to your web server, whatever that may be, or push it to GitHub. And to make things easier, I would recommend a script that does those essentially two steps. It's not that hard to do in the command prompt, but if you're making updates fairly regularly, maybe you want to schedule it to run nightly instead of doing it manually.
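The command lines being referenced aren't readable on the slides, but they are presumably the standard Jekyll and Python ones — roughly:

```shell
# From the project directory: build the static site into _site/
bundle exec jekyll build

# Or build and serve locally with automatic rebuilds
# (Jekyll's built-in server defaults to port 4000)
bundle exec jekyll serve --port 4000

# Alternatively, serve an already-built _site/ with Python's built-in server
# (this one defaults to port 8000 if you leave the port off)
cd _site && python -m http.server 4000
```

Either way, you'd then browse to http://localhost:4000 to see the site.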
Make a little Python script that will do it for you. Essentially all it does is say, hey, where's my dev folder, where's my website hosted — and then it goes into, in this case, the Windows command prompt, runs that build line, and copies the output over, writing over whatever used to be on the website. And that's it: if you've done that, you will have an open data site, easy as that, and you can enjoy it.

After you're done enjoying it, I think it's important to document what you've done, and then to teach someone else how to do it. Because this is simple, you can do it with a tiny team: within the city, I developed the site myself and then showed other people, hey, these are the steps you do to create the site. Similarly, I've talked to San Diego — they're also doing this with a very small team. So it's nice that this can be done without a big development team; honestly, you don't really need anyone who's a developer, because it's almost all text-file editing.

So, handling data. Now that you've got the site out there, I would highly recommend setting up automation to pull the datasets, so you don't have people manually uploading data. In fact, one of the complaints I hear about the paid, software-as-a-service-type open data sites is that they're very manual: you have to have someone upload data changes or make changes by hand, and nobody likes to do that. It's an extra burden, and it can also be something that keeps people in your city from wanting to do open data, because it adds to their workload. The idea is that it's possible to leave the data where it already is — maybe it's already in a network location within your city, or already sitting in a SQL database — and just pull the data from those locations. If it's not already somewhere, put it somewhere.
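A build-and-copy script of the kind described might look like the following — a minimal sketch, assuming Jekyll is on the PATH, and with placeholder paths (`DEV_DIR`, `WEB_ROOT`) that you would point at your own folders:

```python
"""Nightly deploy sketch: build the Jekyll/JKAN site, then copy _site to the web root."""
import shutil
import subprocess
from pathlib import Path


def build_site(dev_dir: Path) -> Path:
    """Run `jekyll build` in the dev folder and return the _site output path."""
    subprocess.run(["jekyll", "build"], cwd=dev_dir, check=True)
    return dev_dir / "_site"


def deploy(site_dir: Path, web_root: Path) -> None:
    """Copy the built site over the web root, overwriting whatever was there."""
    shutil.copytree(site_dir, web_root, dirs_exist_ok=True)


if __name__ == "__main__":
    # Placeholder paths -- substitute your own dev folder and web server root.
    DEV_DIR = Path(r"N:\opendata\jkan")
    WEB_ROOT = Path(r"\\webserver\wwwroot\data")
    deploy(build_site(DEV_DIR), WEB_ROOT)
```

Scheduled via Task Scheduler or cron, this gives the "run nightly" behavior without anyone touching it.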
It really should be in some sort of backed-up location anyway — if people have it hidden on a disk on their desktop, that's bad. So establish that location, and then use a script to make a copy of the data on some regular basis. For ours, for the permits, we do that daily: it pulls from a SQL database — you can use SSIS or whatever you like to do it — and that just pulls a CSV extract out of the SQL database. Similarly, for our GIS data, we use a Python script to pull it from where it's already living. Some of the datasets are only updated occasionally — bike routes, unfortunately; we're not adding a new bike route very regularly at all — so those are on an ad hoc basis. But things like permits: people are adding permits every single day, so our permits data is always changing. If somebody had to do that manually, it would be burdensome, so it's nice having it automated, and that way the site is very useful even if it's a small amount of data, because it's always changing.

I also recommend you keep a data inventory and assign every dataset to a single person who will be responsible for it. (These are completely made-up names — I used a random name generator, which was fun, clicking through to get a random name.) I think it's important to have a single person, because that way there's someone you can talk to. Permits would be a great example: there are lots of people involved in permits, and lots of different kinds of permits, so this person — fictional Marcia, in this case — would probably be going back and talking to a lot of people to get information about that. But you, or your team, the handful of people running this open data initiative, need someone to talk to about each dataset, so have that particular owner, even if you don't publish it publicly. In our case, currently, all we publish is the department.
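The extract step described above — query a table, write a CSV to the web-accessible location the portal links to — can be sketched in a few lines. This sketch uses Python's built-in sqlite3 purely as a stand-in for whatever database you actually have (theirs is SQL Server via SSIS), and the table, column, and path names are made up:

```python
"""Sketch of a scheduled job that extracts a table to CSV for the open data portal."""
import csv
import sqlite3


def export_table_to_csv(conn, query: str, csv_path: str) -> int:
    """Run the query and write the results, with a header row, to csv_path.

    Returns the number of data rows written."""
    cur = conn.execute(query)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header from column names
        rows = cur.fetchall()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical connection and output path -- swap in your real database driver
    # (e.g. pyodbc for SQL Server) and the folder your dataset URLs point at.
    conn = sqlite3.connect("permits.db")
    export_table_to_csv(conn, "SELECT * FROM permits",
                        r"\\webserver\wwwroot\data\permits.csv")
```

Run daily, this overwrites the CSV in place, so the URL in the dataset's resources never has to change.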
So if someone has a question about a dataset, they know what department to go to — there was a little bit of pushback about putting specific people's names out there.

And then, once you have your data site, you can use it to initiate conversations, which is kind of where the city is at right now. We're using the site as a talking point, essentially, to get people interested in talking about what they need on it — both externally, the people who are using the site, and internally within the city. Having that starting point with our own real data allows us to do a needs analysis, I think, a little more easily than starting from nothing and just looking at other cities or trying to conceptualize what it might be. We start looking at what features people need: do people actually need an API, or is having a CSV in a location that's overwritten every day fine, so they can just point directly to that CSV and scrape it daily? Outreach: again, having something to show people makes outreach easier than saying, hey, we're interested in open data, talk to us about it — you can point them to a site that they can play with. And I also think it's easier to talk about what data we could add once you've already added the easy wins: you're already showing, hey, we're behind this, there are some things we're able to share easily with you, and then you can have conversations about what else is easy to add, what's a little harder to add, and what's on the wish list for the future, instead of trying to jump straight to getting everything that's public out all at once.

So that's all for the meat of the talk. If you'd like to contact me, I'm on Twitter fairly regularly; I'm on LinkedIn never, but it goes to my email, so that's fine as well; and there's my work email address — you're welcome to get in touch. Does anybody have anything they want to ask or chat about? We have 20 minutes, I think.
you mentioned about you know handling of the content but i was wondering if you have any anyone to offer in terms of security obviously you know things like this you don't want to you don't want it to be hacked and you know have it in the wrong hands anything to offer here so the s i s i what's pulling from the secret server those scripts run on our own servers behind our firewall in our own environment and then all they do is push that csv out into this public location so kind of all the data scraping and that's the difference between this and kind of what also makes jcan easier than maybe a dynamic site is there is a completely static site of we've pulled out the data and so we are using scripts and stuff to make that easier but there's no connection back to our main stable databases from this site which i know some more robust interactive sites do that kind of fancy like on-demand stuff or somebody could write a query essentially from an open data portal and then query it we don't do that all right thanks for the talk i'm brian from the uh city of austin and so awesome texas yeah i've worked for the city of austin texas for 10 years really nice section department you're in watershed oh no way i was doing the uh new yeah no anyway so we're using cicrata for the open data portal um right now they're like 2,600 entries in the database like a thousand of them are marked as um as data sets and then like 500 as charts i just had a quick search and it was like 95 that are labeled 2018 so there are it's definitely not the cleanest open data portal and i just wanted to be an example of a large city or organization that was dealing with these large number of data sets that would be an example to kind of bring back and show that maybe cicrata is not the best way to go about this i would so cdgo is probably the best one i can think of because they did use a a prior product um not too proud i don't think but but they they used another product so they had an existing open 
data portal and then decided hey we're going to do our own um versus the city of everland hill before this we had nothing so we were starting from zero and now as you see we have 27 so very bite-sized um but yeah so they had i don't think that many data sets but they have a large number of data that's in an existing portal but then they moved over into their own and in talking to them i think a lot of the reasons behind it were that they didn't have as much opportunity to do this automation when it was within this software the service that they were paying for and that's where they were seeing the limitations of people were putting lots of stuff out there but not maintaining it and so they were like it's better for us even if it involves us setting up to set up that automation and and do that and i'm not sure how they ported over all that old stuff it might be worth seeing they're very nice and very friendly but very busy um okay so contact stuff all on their open data site do you want to see if they would be willing to talk about how they took the stuff from their software the service and ported it over into their own environment awesome thank you yeah this is kind of a general question but um i was wondering how you use your engagement was and if you were planning to improve it somehow right so we've been outreaching two folks who are already kind of interested or had approached the city in the past about this kind of thing um and talked to them in the next phase which we're just starting up it's where we do wider outreach both internally across city departments at the moment this is all being done by me within it but i want to reach out to other departments and start getting their feedback on what do you need in an open data site what data sets are you being requested a lot for and then likewise back out to members of the public so that could be developers that could be just specifically engaged people um and figure out kind of what they need and what they want 
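As an aside for readers of this transcript: the nightly-export pattern the speaker describes (a script inside the firewall runs a fixed query, drops anything sensitive, and overwrites a CSV at a stable public location) could be sketched roughly like the following. All table, field, and path names here are hypothetical illustrations, not the city's actual schema, and SQLite stands in for whatever database the city really uses.

```python
import csv
import sqlite3

# Only these columns are published; applicant contact details never
# leave the internal database. Field names are illustrative.
PUBLIC_FIELDS = ["permit_id", "permit_type", "issue_date"]

def export_permits(conn, out_path="permits.csv"):
    """Run the fixed extraction query and overwrite the public CSV.

    Overwriting the same file in place is the point: the public link
    (e.g. /data/permits.csv) never changes, so consumers can fetch the
    same URL every day and only the rows grow.
    """
    rows = conn.execute(
        "SELECT permit_id, permit_type, issue_date FROM permits "
        "ORDER BY issue_date"
    ).fetchall()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(PUBLIC_FIELDS)  # header row = the real field names
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    # Stand-in for the city's real database, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE permits "
        "(permit_id, permit_type, issue_date, applicant_email)"
    )
    conn.execute(
        "INSERT INTO permits VALUES "
        "('P-001', 'building', '2018-01-05', 'a@example.com')"
    )
    export_permits(conn)
```

Because the query text is fixed and the output simply overwrites the previous file, the export is reproducible and auditable, which matches the speaker's later point about wanting to publish the query itself (minus credentials) so people can see exactly what is extracted.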
So right now our outreach has been somewhat limited to folks who are already civically engaged. I think the next step is getting to the folks who might want to use it but haven't used open data. Those folks will be harder to get to, but we want to include them in figuring out whether we keep this for the next phase, move to a more robust platform like CKAN, or end up with a software-as-a-service. So yeah, I think those are coming. If you know anyone who's interested in open data, which is very specific, I want to talk to them.

I don't have a whole lot of experience with this, but I went to a Hack for LA meeting, and they're developing programs that integrate with cities. I noticed that in your case you're dumping the data into text files of some sort, markdown files, so you have to write scripts to dump from the database into markdown files in folders. On getting the data out of your open data source: I noticed there's an application somebody's developing for the City of Santa Monica, and they're reduced to screen scraping because they can't connect directly to the database. Usually I'm used to there being an API associated with the database; you query the database through the API directly and get back the data you want for your phone app or whatever. It's a city council meeting response application or something they're developing, but they were reduced to screen scraping the data because they can't connect to the database directly. So I'm wondering: I guess it's fine if you're in a consistent format, but then how do you screen scrape across, say, 600 files to find what you need? There's a limitation to just dumping to text files: basically people have to do these complicated screen-scrape maneuvers to get the data out.

So the data itself actually isn't in the text files. The text file is just the metadata behind it, the page that shows people what's there. For the data itself, most of our tabular data uses CSV for everything. If it's something like salaries, the CSV will be unique to the year (that was salaries 2017, and soon there will be salaries 2018), but for things like permits there's a permits.csv, and it's always the same link to permits.csv; every day new permits are added to it. You could take that link, grab the CSV, and port it over into your app. But it's not an API, and I think that's where we get back to talking to the users, what the lady over here asked about: do people need an API, or does this meet their needs? That's the biggest open question we've had internally, and we need to talk to actual stakeholders to see how big the demand for an API really is. What I've heard from other agencies is that a lot of developers don't rely on city APIs anyway, because cities are notoriously seen as unreliable: we could switch products after you develop your app, and then your app breaks. Whereas if you scrape the data and pull it into your own place, then it doesn't matter if we change our product or our website goes down or our servers go down. Actually, the city is pretty darn reliable, but there's still a perception that city governments aren't, so a lot of developers just pull the data from the API and keep it in their own place. Similarly, we've talked to some other cities, and they've said that putting it at a static location works: the link to that permits CSV stays the same. And like you said, I think it's important that the format stays the same too, so if you're relying on a particular field, it's not changing from time to time.

Yeah, that clarifies it. It's rather ironic that you're talking about Santa Monica. Isn't
that the proposal due tomorrow?

Before then, I have a question about data sets that are accumulated by the city just in the business of being a city. I'm supposing Santa Monica will be gathering more data sets that are not part of city business, so to speak, to make a smart city. Is there any kind of consistency across the various cities as far as the data they collect, beyond the normal business permits and, you know, dog-catching violations? And also, regarding the unified school districts' data: is there any connection with the school district, or the community college, or other educational things, maybe the library, maybe museums, things like that?

There's no consistency in how it's collected, and I think that's the area for improvement. I've seen things that I think are moving in the direction of improvement, particularly consistent schemas. Obviously different cities are going to have different needs (some cities have a subway going through them, some don't), so the data sets and interests are going to differ. But having consistent schemas matters, especially for developers. All cities have permits, but we're all sharing our permits data slightly differently. Ideally you could go to any open data site, download permits, and it wouldn't matter what framework we're using, Socrata or some other one, because there would be a consistent permit ID field that's always called the same thing. That, I think, is the dream world to get to, where we all agree on something like GTFS for transit data. A lot of transit agencies have said, hey, we can all agree to use the same schema, and that allows amazing things to happen because of it. It would be great if cities could get together and say, yes, this is the schema for our common data sets that we agree to. That's not there yet; I think it would be a cool thing. And with things like the California open data initiatives, I'm hoping things come out of that as you get tech folks from various agencies talking to each other. On the school district side, I don't know of any, unfortunately; maybe it exists.

I saw in one of your slides you referenced Estonia, I think, and I was wondering if you could comment on it. You've mentioned some US-based examples, and I'm wondering what your thoughts are on Estonia or other foreign-based sites.

That's an interesting question, because I think overseas has traditionally been much better than the US at using open source for their open data portals. Estonia is on there because they use JKAN. There's only a handful of folks that have used this particular tool; it's somewhat niche because it's very simple, but again, as that starting point, I think that's the beauty of it. If you want to take a look at their site, let's see if I can get to it quickly. Yeah, so they use the JKAN framework. If you know Estonian, or whatever they speak in Estonia, we click on this, and I think this is their data. That looks familiar, just in a foreign language. In general, a lot of the European countries are using CKAN or one of those other open source frameworks, and adoption there has been a lot better. Good question. I don't know why the US got so enamored with Socrata, at least particularly in this region of the US, where all of our surrounding cities are using it. And a lot of them, I hear complaints; I guess everybody complains about something if they're using it long enough, and if we use ours long enough we'll complain about its limitations and problems too. But it is intriguing to me why so many are wholesale spending lots and lots of money; it can become very expensive. Versus in Europe, they went with "let's try the open source and build our own," and have historically been collaborating a lot more amongst governments to do these. It's an interesting dichotomy, I think.

I just had a quick question. On the data sets there's a disclaimer, you know, no warranty, no guarantee regarding accuracy and so forth. I understand why the city might want that disclaimer there, especially while testing the waters, but do you think we can eventually get rid of it? Government should be a source of accurate information, and if certain permits issued to certain people were omitted, that could potentially be covering up corruption or something illegal.

I'm not a lawyer, and I know only a little about this; I just let our legal department tell us what to put on there. I see where you're coming from, that it should be an authoritative source, but there's also where the legal department is coming from: let's cover our rear in case the scripts we run nightly glitch and something doesn't get out there.

I guess ultimately I feel like this should be like walking into a courthouse and demanding documents as a lawyer: they can't hide documents from you, and they can't have the data be incomplete or inaccurate. So ultimately it would be nice if we didn't need those disclaimers on this; there should be transparency of the same sort, I would imagine.

Yeah, I agree. Maybe it's about finding a way to word it better, to say: hey, we pulled this using an automated script, and something could possibly go wrong with it. I think it would be amazing to share the SQL query we run, minus our database location and password or anything sensitive, so that people can transparently see what's happening, but then, you know, have whatever legal
language we need to cover our rear. But I agree, there shouldn't be manipulation of the data, and honestly, the automation helps with that: it truly is just a SQL script, and it's going to pull whatever those fields are. If you download our permits, you'll see. There's one thing in our permits that bothers me: we have a weird little header I'm going to get rid of, because I want the header row to be our actual fields; we'll get there. And usually up at the top there's a row called "test," because somebody put test data in there, and as much as I would love to remove that row, it's in our database, so everybody gets to see our mess-ups. I think that's the hesitation a lot of governments and folks in government have about transparency: "oh my god, people are going to see our mistakes." But to me it's: well, then they can help us fix our mistakes, or just know that we're human, or be heartened to know that we do test our stuff before we release it. So it's an interesting line to walk, but I think it's hugely important that it's not manipulated data; it's extracted data. It's okay, I think, to omit things like people's sensitive account information, their email and all that stuff, but the fields we're pulling haven't been manipulated.

I think we have one minute.

Tell me anything; that's the one thing about sitting in the front row, we always ask a lot of questions. To carry on from that last conversation regarding errors in the data: you pull it out of your database and display it for developers, but what about, for instance, business permits? You get a business permit, you're in business, and then most people who go out of business don't notify the city. Sometimes customers will say something closed and it's not closed (I don't know why they do that), but hey, you go check that spot and it's no longer in business, something of that nature. Does the information in your data set have some kind of feedback loop, so you can check it? Is there anything like that on your end, or does the data just stay there?

I think we went back as far as 2015, and the idea is not to remove anything; it'll always go back as far as 2015, and that CSV just keeps getting bigger and bigger as we add more and more. We're out of time, but yeah, there's nothing to go back and say, hey, this permit was revoked; it's just "here are all our issued permits." And that comes down to an interesting discussion: not manipulating the data, but at the same time getting people the relevant information they need. That comes back to talking to the users about what they mean.

All right, thank you so much; let's give a round of applause. And thank you all so much for coming. We will resume the rest of our track starting at two o'clock this afternoon. If you're interested in talking more about open source in government, open data, and all this stuff, feel free to come up and talk to us; we'd love to chat. Thanks.

All right, and welcome back to the open government track. Go ahead and find a seat; we're going to get started again. First off this afternoon we have Jason Hibbets, talking about open data and civic tech.

What that means is: I'm mostly doing community management and project management at work, amongst other tasks I take on. I work on our Opensource.com project, which is an online publication and community highlighting how open source is having a positive impact on our world. I'm a self-proclaimed civic geek. I've never worked for government, but I want to help make government more efficient, better, and more engaging, and so I'm one
of the captains for our local civic tech group, the Open Raleigh Brigade. I'm a co-chair of NC Open Pass, a civic tech event series that our brigade organizes, and I'm also on the National Advisory Council for Code for America. If you're not familiar with Code for America, it's a network of thousands of people working to improve government services.

So today I'm going to talk about open data. I'm curious: by a show of hands, how many people are brand new to open data and don't know much about it? Okay, 70-ish percent. How many people have actually taken an open data set and used it for a project or for research? Awesome, great. Well, for those of you not familiar with open data, the highlight of the definition is that it's machine-readable data that can be freely used and redistributed by anyone, as long as you give attribution and share under the same license. And once it's out there, it's fairly renewable; I tell my kids that every time someone drives a car, they're burning a dinosaur. And open data is plentiful: we've got a lot of open data out there. Sometimes open data is overwhelming, and we don't know what data we actually need to accomplish what we're trying to do.

So I'd like to start things off today by sharing a little bit of my open data journey. Back in June 2005 (I'm from Raleigh, North Carolina), I was at a Raleigh City Council budget hearing, because I geek out like that, and I went to them and said: hey, I found this website called chicagocrime.org, and they're listing crime data on a map. I would imagine that most of your experiences with open data may have started with points on a map. I got really excited, and what was really cool about that experience is that the public information officer for the police department came up to me after my little spiel and said, hey, we should meet and talk more about this. A few weeks later I sat in his office, and he said: you know, we've got this system on the back end called CompStat; it actually has all this stuff in it already, and we just need to sanitize the data to make sure nothing gets out there that's not supposed to, anything personally identifiable. And literally within a few weeks we had the first version of Raleigh crime mapping. As I was doing research for this talk, I noticed it looks a lot different than what they had before: it looks like they've partnered with, or are using, the service crimemapping.com, which collects data from all the other municipalities using the service, so you get a broader picture of different crimes happening. So that was, unofficially, maybe my first FOIA request, a Freedom of Information Act request, though at the time I didn't know what those words meant. It is what it is.

Then I wondered: how does open data affect my personal life? I wrote this article for Opensource.com called "Surfing the open data wave," and I wanted to see how NOAA weather data was being used for surf forecasting. So I researched three different websites. The first one is Surfline; a bunch of folks from California made this. They put their kind of proprietary spin on a bunch of open data and charge subscription fees for it. They also have a vast network of live cameras where you can check the surf. Any other surfers in the room, or is it just me? All right, I figured, I'm in California, there's got to be a few. Notice I strategically used Southern California surf locations. This is Swellinfo; you can see they're taking a little bit of a different approach. They give you a nice swell chart, color-coordinated with the winds based on how the winds would impact the surf conditions, and they give you a somewhat longer outlook compared to Surfline. And then there's one out of the UK called Magicseaweed, interesting name, but
they also have a different way to present it, and I also threw in the North Carolina spot here so you know that we actually do have some decent surf out there. The takeaway from doing this research and writing this article was that it's a great example of how the same data set can be interpreted, visualized, and displayed in a bunch of different ways. All three of these examples look completely different, but they're all based on the same information.

My personal passion for the last eight years or so has been to improve the citizen experience through civic tech and open data. I found my superpower, which happened to be community organizing, and that's why I really got involved with establishing our local civic tech group. Basically, I believe that the only ones who can change government are us, and I would imagine that the reason many of you are here in this room is that we see flaws in the system that we would like to correct somehow within our superpowers, or maybe we discover our superpowers along the way. So I joined Code for America; as I mentioned at the beginning, they have over 70 brigades throughout the United States, and if you're local here to the Los Angeles area, there's an amazing organization called Hack for LA that you can get involved with.

As I did research for this presentation, I found a bunch of different sources from over the last 10 or so years. I found an article by the company Socrata, back in 2015, called "The Economic Impact of Open Data," and the author, Tim Cashman, basically says that the total economic value of GPS-based products and services is estimated to be over 90 billion dollars annually. GPS is something we have in the palm of our hand as we look for restaurants and get driving directions. And the NOAA weather data I mentioned earlier is having, according to 2015 numbers, an annual impact of about 15 billion dollars.

Then I wanted to see what kinds of companies and startups are using open data, and I went over to the Open Data Institute, which has a rolodex of about 46 or so startups using open data. We've got folks like CropBase, helping farmers make smart choices around crop production. We've got FloodSense, sending out flood alerts from their IoT flood-sensing devices. The Open Bank Project is working on an open source banking platform. Piclo is an open utilities company working on building out a smarter energy grid. And Mastodon C is using data science to transform public services such as education, hospitals, and businesses.

Then there's a list called the Open Data 500, from GovLab, and I just picked a couple that you might know and a few that you might not. Granicus is a really cool organization, really involved in citizen engagement, legislative efficiency, and government transparency; they've got a bunch of different applications that can easily roll out to different municipalities. I'm sure most of you have heard of Mapbox, an open source platform for designing and publishing maps and geographical data. SAS is actually a company local to where I'm from in North Carolina, the next town over. They're doing advanced analytics and data management; most of the stuff they're doing is proprietary, but they are consuming open data. I wish they would produce more open data, so we'll see what we can do there. SeeClickFix: do they have SeeClickFix here, do you know? No? Well, you don't have SeeClickFix here in LA, but if you did, it's basically an open 311 service, and you can report non-emergency issues to your town, such as potholes and graffiti. Back in Raleigh, that basically opens up a service ticket on the back end, and then hopefully those problems get fixed.

I used Yelp to find lunch today, which is pretty cool, and my county government actually worked with Yelp to develop an API to bring health inspection scores into the app, because who in their right mind is going to go look up a health inspection score on the public open data portal when they're using Yelp to find the places they want to go? Is that live here too, do you know? Yeah? So if you go to Yelp, you can see public inspection scores from the health department, which is kind of cool.

This is from the Center for Open Data Enterprise. According to their data, there are over 90 countries using open data and over 1,600 organizations. This map that they've built out includes companies, nonprofits, academic institutions, and developer groups, and how they're using open data to conduct research, improve strategy, and so on.

The folks over at OpenDataSoft have a lot of great material on their website. I'm not advocating for one vendor over another, by the way, but they have found that there are over 2,600 open data portals out there; I think these are numbers from last year. That's a lot of open data portals.

I've also got some friends over at a place called AppCityLife. They have a platform that empowers cities to rapidly deploy, at prices affordable for government, mobile apps that engage citizens through a variety of tools; you can see a couple of the listings here. Not only are they making open data easier for governments and the citizens they serve to consume, but they're already using things like artificial intelligence, they're starting to integrate blockchain into their platform, and they're also working on voice technologies like Alexa, so you can say "hey Alexa, when is my trash day?" and get a response based on your location, or based on how they
know which trash day you're on. They're at appcitylife.com.

Open data has this huge ripple effect. When government departments open up their data, not only does it hold them accountable for their actions and their purpose, but it also allows other departments to consume and use that open data. This can have a really high impact on increased efficiency, and it can also encourage other departments that maybe don't have open data to get on the bandwagon and get more open data out there. A couple of examples: parking deck utilization data can help police departments know which parking decks to monitor more, based on capacity and how full they are, or school enrollment data could help libraries pick more relevant books.

McKinsey, one of our favorite consulting firms, estimated that the economic value of open data in the United States is worth more than three trillion dollars annually. In the same report, they also estimated that open data could increase the value of the G20 countries by 13 trillion dollars over the next five years. Those are some really big numbers. I went over to data.gov, and they've got some interesting statistics as well: the value of federal open data in the United States is estimated in the hundreds of billions of dollars, no surprise given the previous numbers. The U.S. Department of Commerce calculates that internet publishing, consulting, and market research firms use data to generate more than 200 billion dollars in revenue each year. These are really big numbers, and since we talked about GPS and weather data earlier, combine that with census data and health data and the numbers just keep growing. It's really amazing to see what an impact open data is having.

Europe is a really big fan of open data as well; they actually might be a bit more ahead of the curve than some of us over here in the U.S. This is the European open data portal, and they have some estimates: that open data has a 325 billion euro market size; that by 2020 there will be over 100,000 jobs related to the open data field; that the public administrations of the EU28 countries can save more than 1.7 billion euros; that 10,000 lives can be saved by having quicker response times thanks to open data; and that we can save approximately 2,500 hours looking for a parking spot. Who's excited about that?

One of the projects my brigade got involved with last year was hurricane response information. This is an example of crowdsourced information: what we did, with the help of Code for America, was set up a website that could help people find hurricane shelters. This impacts people directly, because they have to evacuate their homes and need a place to stay if they don't have friends and family nearby. What was really interesting about this: getting the technology set up was really easy; getting the real-time data was really hard. FEMA and the Red Cross were, I think, updating things every 12 or so hours, so we had situations where people were going to shelters that were being closed because they were in the path of the hurricane. So we had volunteers from Code for America, from I think over 14 brigades, who were literally calling shelters to get real-time data and entering that data on the back end. We were getting information faster than FEMA and the Red Cross, which is great in the moment but not great for the long-term sustainability of what we're trying to accomplish. This is florenceresponse.org. I think the better part, beyond helping in real time and volunteering our time, is that this effort actually led to conversations with both of those organizations. We found that there's really no standardized way to capture this information, so maybe we can create a standard for that, and maybe we can figure out how to leverage volunteers in times like that so they can use more official systems.

So I think the simple act of a government agency, nonprofit, corporation, or any other organization delivering open data is definitely adding to the total economic impact in our economy. But opening the data may not be valuable just on its own. The latest trend is to share relevant data that's important to people. We had a discussion in the previous session around "do you have an API through your open data portal?" Well, maybe the consumers of that portal don't need APIs; maybe just having the data available is enough. A couple of examples I was thinking through: how property values and traffic data are impacting the real estate sector, or how local governments and advertisers are using census data to make more effective decisions on the projects they're working on.

Wow, I went through that really fast, I'm sorry; well, plenty of time for questions. In conclusion: open data is contributing billions of dollars to our economy and having a huge impact. So we're not only seeing
startups use open data and consume it to build their businesses, but we're also seeing larger organizations and businesses use open data, academics use open data for research, and many other organizations depend on open data for their livelihood. Governments are not only producing open data for transparency; they're increasing citizen engagement, and they're increasing some of the efficiencies they're seeing in their work. And finally, open data is impacting our everyday life, whether it's saving someone's life, getting them to a shelter that's not closed, or saving them time finding a parking space. I think the possibilities are endless. What we're seeing is the power of open data, and I would encourage you to join me in being an advocate, maybe find your local brigade and get involved somehow, and basically use open data for good. So that went a lot faster than I thought, but plenty of time for questions. My slides are available on GitHub if you want to go through them; I've got all the speaker notes on there with all the references and sources. Since we're recording, Nina has the microphone, and I think we've got a question over here, so can you raise your hand?

Excuse me, I wasn't at your first presentation, but I am interested in this, and what I'm wondering about is, for the person that is very new to this source of information, how would you define the basic fundamentals of open data? What organizations are doing the collection? Is there any type of financial reward for some of these? You mentioned the surf site, and I deal with a couple of wind sites, and I know they're always trying to put a price tag on stuff like this, but I think what you're talking about actually comes before that, and that's what I'm interested in finding. To find the locations, should I go to the Code for America site and then start entering search words, or what?

So that's a lot of question, but I'll summarize it as: if I needed to start with open data, where do I go? The answer is, it depends on what you're ultimately looking for, because we have open data at various levels of government. We have local municipalities; the previous talk was from Beverly Hills, and they have a beta portal that they just launched with 18 or 20 datasets out there. So start local. Then there are sometimes county-level open data portals, sometimes state-level open data portals, and then data.gov is the federal government's, with over, I think, 280,000 different datasets; that's why I mentioned earlier that sometimes there's an overwhelming number of datasets and you don't know where to start. In my experience, a lot of the open data program managers, the people in charge of the open data portals, really want you to use their data, so I would encourage you to reach out to them if you have a specific question on where to get started. If you're looking for a particular dataset that doesn't exist, you can request that the information be posted. So start local and work your way up, or make some new friends in this room and maybe they can help you get started.

Any other questions? Over here. It was alluded to by the previous questioner, but something you said in your slides is that Yelp is using open data as part of their platform, so they are monetizing open data that government sources are producing. What are your feelings on perhaps Yelp paying for that data, since they're using it for commercial applications, as opposed to a school or some other government-run organization using that data to further the public good? They're privatizing public data and using it as their platform; that seems a little aggressive.

Yeah, I
mean, you're kind of asking about the monetization piece of it, right? Well, think of the economic benefits we're seeing: if we started charging for open data, it could have an impact down the line. I think there was some discussion a couple of years ago around whether NOAA should charge for its weather data, and I just think those costs would be passed on to consumers, or would make entry to markets like that harder. To go back to the surf example: yes, they're monetizing that data. They're taking data that's freely available, building a platform around it, running ads, selling swag, selling t-shirts, whatever. They're building their business off data that's freely and openly available and that, in a sense, as we talked about earlier, we paid for in the first place. So should all research and data that's publicly paid for then be publicly available? That's a great conversation for us to have; I think it's kind of a case-by-case basis. At a high level, I think if we're paying our taxes and that money is going to create software or open data, we should have access to it, obviously following the rules of open data, meaning no personally identifiable information gets out there. Does that help? Nina's getting some extra steps. I was going to start my talk by saying we went out to lunch and had some dim sum, so excuse me if I start falling asleep in a food coma, but Nina's getting her exercise after lunch.

Yeah, I just want to know what the Freedom of Information Act is and if we can use it to gain access to more data; how does it work? Sure. In my novice opinion, and I'm not an expert on this, so anyone can correct me if I misspeak: it's our ability to go to our government and request information. There are, I think, different levels of what you can do, but for example, if there was a dataset that you were interested in, you could go to whatever department has that information and request that it be put out to the public. Vicky, I'm looking at you; did I get that right? Yeah, so, I'm going to try to repeat some of that: FOIA is the federal-level one, California has a specific one for public information, and MuckRock is tracking a lot of the information requests. But hold on, we have a question.

Yeah, so what do you think can be done to increase financial incentives for organizations of various kinds to produce really high-quality open datasets, which can be really expensive? That's a great question; I don't know if I have a good response for that. As far as motivation is concerned, money's always a great motivator, but we're also seeing a great trend, kind of like the B Corp type of thing, of just building companies for good. Yeah, I'd have to think about that one, so sorry, I don't have a good answer off the cuff.

And you had a question about formatting: what kind of format can we expect to find this data in? Open data can come in a variety of formats. It can come as a CSV file; it can be integrated with map data. I'm actually not an expert on the different formats, so I'm just giving a very high-level view: I like to think about it as spreadsheets, so think of open data as kind of looking like a spreadsheet, though it can be consumed in different ways, if that's helpful.

With your experience with various datasets, can you comment on the currency and accuracy of the data you're getting? Is it maintained frequently, or do things get out of date and stale? Great question. It's really going to depend on the data you're looking for. It's from the government, so of course it's accurate, right? I'm just kidding. I
think most organizations that produce open data want it to be as accurate as possible. You'll have some data that's updated daily or hourly, depending on what type of data it is, and some data that's kind of a one-time "here, this is available for you to use." It's really the full spectrum; you'll see a variety. It depends on how their program is organized, how often they want to provide updated data, and, a lot of times, how many resources they have available to keep that data updated. I think that's our biggest challenge: some open data programs have five, six, seven people on staff, while others maybe have that one champion who says, "I really have to get this out there, because I believe it's the right thing to do and I want to help people get the open data we're providing." So it's really the full spectrum, and that's the answer.

All right, well, if I had a half-hour slot we'd be right on time, but we'll give you guys a little bit of a coffee break. Thanks for coming; I appreciate it. I'll hang out; I'm here till Sunday, so I'll be around all of SCALE. Thank you for coming today. I also have a few copies of my book, so if you're interested, you should run to the front.

One, two, three, find our seats; we're ready to get started again with our open government track. Thank you all so much for coming to this track. It's the first time, in what did we say, something like 13 years, since SCALE 4x, that we've had an open government track here at SCALE, so we're really excited to get this started again. Our next speaker is Patty Jula. She is a GIS supervisor for the LAPD; she manages the GIS database for the 911 software application and develops ETL processes for a MongoDB database. She earned her Master of Science in GIST from USC and has worked as a teaching assistant for a coding bootcamp, and Patty has been automating processes with Python for nearly 10 years and has more recently delved into data science. All right, take it away, Patty.

Thank you, Nina; it's an honor to be here. Oh gosh, let's see if you like what I'm going to say. I think my talk dovetails really nicely with Jason's previous talk, because it's about actually using the data. As Nina said, I've been working with Python since about 2010, and I realized I love it. I'm going to be running code through Jupyter Notebook for this talk; are people familiar with Jupyter? Awesome. I just have a few slides to get started: "Pondering Public Police Data with Python." And just to reiterate, yes, I work as a GIS supervisor with the LAPD. I've been there almost a year and a half, and a big part of what I do, like Nina said, is working on the GIS database for the 911 system, so that when someone calls in, the address can be located even if they don't have location technology enabled on their phone. That's a lot of work, but I like it; I feel like, hopefully, it's for the good. I also work with PyMongo, everything Python with me as much as possible, on moving one of our databases to Mongo, and I work with some wonderful people on that, learning more about ETL processes. And then the rest of the time, when I have time, I do data science, which isn't always very much. Since I worked as a teaching assistant last year, that gave me a good brain download on all these latest technologies, which is why I'm using Jupyter Notebook today.

All right, so this is the boring stuff: this talk references methods of analyzing public police department data (well, that's not that boring) and is not to be interpreted as the best or only method of processing this data. And, just to be clear, I'm not here as a spokesperson for the LAPD; I've only been there a
year and a half or so, and they have a lot of other people who can better speak to the LAPD; it's just me. All right, so we're going to run through the goals of the talk, which are data access, data cleansing, data wrangling, and visualization. I'm not going to touch on machine learning much, because of the inherent bias in a lot of those systems; I'm not doing that today, sorry if that's what you were looking for. Then we'll get into some examples.

So, data access: that's the first part of the process. This talk uses open data and also data from the GeoHub, and I should say I love the LAPD data on the GeoHub for LA City. What's nice is that governments a lot of times are loading data onto open data portals, and this talk will be working with CSV files. But is it always true that the data you need can be gotten from open data? True or false? False. You guys are the best. Sometimes you need to do web scraping. I do that in a little project where I help the Media Relations Division at my department, and I do it for other stuff too, using Beautiful Soup or Selenium; do y'all know about those? They're good libraries. Anyway, this talk is just focusing on open data and a little bit of GeoHub data.

The first few notebooks we'll review deal with data cleansing, and the example we're going to look at is removing duplicates. Another thing you might have to deal with is null values or missing data: how do you handle that? A lot of this kind of analysis involves making these decisions and then sharing them in your results: "this is what I did." Are there any other data cleansing examples anyone can think of, or that they deal with? Okay: dirty data that's not been well rendered. Yeah, you're reminding me: I worked a little bit with a neural network to predict words from handwriting, and I guess some of that data was kind of dirty because it wasn't legible. But I'm sure you have more examples too.

Then we'll get into data wrangling, which is shaping the data for analytics, and this is really an iterative process: as you're doing it, you're always revising how you're thinking about the data. I find a lot of times at night, when I can't sleep, that's when I'll have a good idea for another way to approach what I'm doing. I just threw these words up here, so it looks kind of random, but it's dealing with merging in pandas, which is sort of comparable to joining in SQL; normalizing the data, that'll be good; and then indexing and other statistical things you might do. Cool. These are some of the libraries; this refers to matplotlib, NumPy, seaborn, and pandas.

All right, so this is one of the plots that gets created. Okay, it's a little misleading: it says SFPD and LAPD public data, and that's what I'm going to look at in the Jupyter notebooks, but this plot here is select crime counts by month from 2010 to January 2019, and on the x-axis are the months. I know it looks super busy, but we're going to see different ways of visualizing the data. These are the crime counts for these months, for these years rather, and you can see a dip every year here. Does anyone have an idea? February, very good, yes, exactly: it's a shorter month, so that's what's going on. And, I can hardly see it, but January 2019 is about in the middle here, so I think that's good.

All right, let's get started. This is a GitHub URL for the data I'm going to review in this talk, but you have some work to do: the data I loaded is from open data sources and was too large to load into the free GitHub account I use (they said it's more than 100 megabytes), so you're going to have to download it yourself. So yeah, this is it.
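The merging-in-pandas versus SQL-join comparison above can be sketched like this; the tables and column names are invented for illustration, not taken from the actual police data:

```python
import pandas as pd

# Two small, made-up tables; an inner merge on "incident_id" behaves like
# a SQL inner join: rows pair up only where the key exists in both frames.
incidents = pd.DataFrame({"incident_id": [1, 2, 3],
                          "category": ["Larceny", "Burglary", "Robbery"]})
locations = pd.DataFrame({"incident_id": [2, 3, 4],
                          "district": ["Mission", "Central", "Richmond"]})

joined = incidents.merge(locations, on="incident_id", how="inner")
print(joined)  # only incident_id 2 and 3 appear in both tables
```

Changing `how="inner"` to `"left"`, `"right"`, or `"outer"` mirrors the corresponding SQL join types.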
Let's review the README. Oh, that's good: "Caution: not to be..." This is just repeating what I said; it represents one modern technology approach, including Jupyter Notebook and Python libraries. I can't really see the screen super well... okay, that's better. I say install Anaconda 3, and typically Jupyter Notebook installs with Anaconda, and I'm running the code from a virtual environment. I don't know if you've seen other talks today where they discuss containers and whatnot, but I like virtual environments; they seem to work well. So I'll open that up. Okay, this is a repository on my machine; I'm using Ubuntu, though it also works on Windows, and I use Windows a bunch as well. I'm going to open my terminal and say source activate martalk, as I call it. Okay, so I activated my virtual environment: you can see it says "martalk" here instead of "base." This is my virtual environment, and it has all the libraries I'm working with for this talk. Now I'll say jupyter notebook. There it is.

All right, we'll start with this data cleansing example, and just to repeat myself, this uses San Francisco police data on incident reports. I have it here in the README: you have to download the data from the San Francisco open data portal (data.sfgov.org), police department incident reports; the link's in there. That's this CSV file right here; it looks like I haven't touched it in the past 17 days, which is when I downloaded it preparing for this talk. So the first thing we're going to do is find the duplicates, and this is just to check whether there are duplicates, because before you do any analysis with data, you want to get under the hood, and looking for duplicates is one way to do that. I'll open this up. This will take about two minutes to run; I'll just point out some things going on in the cells, but it's pretty cool. Run All.
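The environment setup just described, sketched with conda; the environment name and the exact package list are assumptions based on the talk, so adjust them to taste:

```shell
# Create an isolated environment with the talk's libraries, then launch
# Jupyter from inside it. "martalk" mirrors the speaker's environment
# name; older conda versions use "source activate" instead.
conda create --name martalk python=3 pandas numpy matplotlib seaborn plotly jupyter
conda activate martalk
jupyter notebook
```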
It takes about two minutes to run on my home network. It's importing the libraries, defining the variables, and df.head() is printing out the top values (by default the top five), and we'll just wait; when you see this indicator, you know it's actually running. So: I want to look for duplicates by incident number, so I'm using the set constructor on that field, and then I'm saying, if there are instances where a field value occurs more than once, it'll say "found duplicates" and write them out to this out-dupe CSV file. And then, this will be pretty cool, we're going to check a given incident ID; we're just going to check in this input CSV file whether any duplicates were found, and once we get there we'll see what it does. I also want to point out this... oh no, it's in the next script, so I have nothing more to point out, I guess. All right, it's done. Cool. So here I'll just run this: checking by a given incident ID. This number isn't really important, but you can see there are two entries for this "phone calls, harassing," and they have the same date and time, so I'm like, okay, that's a duplicate. Are they zero? Yeah, no entry, you're right, but they have the same incident number. Yeah, that's a good question, and my honest answer is I don't know; that is definitely possible. I think it's also possible that certain departments will go through and revise incidents while keeping the same incident number, so an incident will be in there twice or maybe three times. So now I'm going to open up "duplicates remove," and this one runs pretty fast. When I first wrote this script using the csv library, I had it at 10 or 12 lines of code and it was super complicated. Then, just a few weeks ago, looking at it for this talk,
I realized I didn't even know how to comment on the thing and there had to be a simpler way, and that's when I found this drop_duplicates command, a nice pandas command. I'm saying to keep the last entry; I think the default for drop_duplicates is to keep the first. Right or wrong? I don't know; you have to make that decision in your own analysis. So I'm saying to keep the last entry. All right, it's done running, so I'll go back to this last cell: this is the same incident, and you can see it's only in here once, "phone calls, harassing." I have another one I want to review, so this is in "duplicates find": I'm running this last cell again, and you can see there are two entries for this one. It ends in 448, and it says theft from locked vehicle, open or active. You see how there are a lot of null (not-a-number) values here? That's why I think they probably went back in, revised it, and gave it more of a location. So if I copy this 448 value, we see there's only one entry in the cleansed CSV file, and it does have the latitude and longitude, because I specified keeping the last. So, data cleansing: how about that? And just to review the output: it went into this resources folder three or four minutes ago. This is the cleansed CSV file, and this one has all the duplicates, because there might be some finer tuning you need to do when looking at the duplicates. So I produce both: one that's clean, meaning no duplicates, and another that contains all of the duplicates, so you can really hone in on them if you want to. Great, that's the first bit of code; I think it's cool. Why are there more rows in the clean file? Oh, okay, that must be because a lot of entries aren't duplicates; they're just a single entry in the source table. Let's say there are 20 entries in the source table and 12 of them are entries that appear twice, leaving 8 singles: there would be 12 rows in the duplicate table, but the clean file would have those 8 singles plus the 6 kept duplicates, so 14. Does that make sense? Okay, you don't look convinced; how can we convince you?
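A minimal sketch of the duplicate check and removal described above; the column name "Incident Number" is an assumption about the SFPD export, and the sample rows are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Incident Number": [100, 101, 101, 102],
    "Description": ["Theft", "Harassing calls", "Harassing calls", "Vandalism"],
})

# Detect: keep=False marks every copy of a repeated incident number,
# so the dupes frame can be written out for finer inspection.
dupes = df[df.duplicated(subset="Incident Number", keep=False)]
dupes.to_csv("out_dupes.csv", index=False)

# Remove: keep only the last copy of each incident, as in the talk.
clean = df.drop_duplicates(subset="Incident Number", keep="last")
clean.to_csv("clean.csv", index=False)

print(len(df), len(dupes), len(clean))  # 4 2 3
```

Note that `clean` still contains the 100 and 102 rows: dropping duplicates keeps every single-occurrence row plus one copy of each duplicated one, which is why the clean file is larger than the duplicates file.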
Yeah, that just means most of the entries aren't duplicate entries; there's only a single entry for them, and I put those all in the clean file. The clean file isn't only single copies of the duplicated incidents; it's single entries for all of the data, regardless of whether a row was a duplicate or not. You had a question? No, you're good. Did you have a question? Yes. Yeah, that's a really good question: how do you start figuring out how to do data cleansing? I wouldn't really say I'm an expert at that. For me, I was looking at another dataset at work and just trying to think about it, and I wondered, well, are there duplicate entries or not? That's sort of how it took off. I remember at a Data and Donuts meeting a few weeks ago, someone said that for data cleansing you can look at the rates of entries: if you have a data entry field and you see there were no entries for 24 hours when usually there are five every minute, that's another signal. I thought that was good; I was like, I've got to do that, but I just haven't yet. So you just sort of think about it and take it from there. Yes? Okay, so: CSV files are very popular. I got this data from the SF open data portal and just downloaded it as CSV, and the pandas library (I'm sure some people here can back me up) loves CSV files. It's structured data, and they just work very well together. As for machine learning: I don't use that very much personally. I feel like there's a certain amount of bias in a lot of that kind of analysis, and people don't know how to fix it. Have you heard... I was reading recently that some model predicted Trump would be president just because his face
looks like previous presidents' faces. Or, are you all familiar with the iris flower analysis with neural networks? The basic idea with the iris dataset, which is kind of the standard when you're learning, is that you can predict the type of flower from the length of the sepal (or the petal; the exact measurement doesn't really matter). The thing is, this is based on someone's measurements, and do you think people are always very accurate at measuring stuff? They're not. So when you attribute that kind of conclusion to people, I feel concerned about it.

All right, so this cleansing: it's fun, I like it, it's not super fancy. Now, in this 01 folder, we're going to look at serious crime, and we'll look at how I'm defining that: "02 serious crime total." First, let's open this README. Again, you have to download this data; it's not that bad, just download it and put it in the resources folder, so I have more files here, but don't worry. I got that data four days ago from the LAPD's public open data site. We're going to look at crime by area, and "area" and "division" are kind of interchangeable, at least in this example; there are 21 areas. For example, we would be not too far from the Northeast area of the city, but then there's Hollenbeck, Southwest, Southeast; I can't name them all right now. This one takes a few more minutes to run, which I'm kind of cringing about, but let's just Run All. (There's another thing I have to tell you, something else you're going to have to get.) For now it's just defining the libraries and the variables and doing the data wrangling, where I'm parsing dates from the "date occurred" field, which is one of the fields in the open data. There's "date occurred" and "date reported"; which one are you supposed to use? I don't know, it depends; the one I'm using, is it right? It's a decision. In the next notebook I have this in bigger font, so I apologize, but here I'm querying for a crime code description. Am I talking too fast? Do you want to look at the df.head() before we look at the query? Is that better? All right, then we'll just hang out here for a minute.

Yeah, so that's a very good question, and I'm not really qualified to give an answer; it's about Part I and Part II crimes. If you're going to do any analysis on that reclassification, I would look at it both ways it's classified, just to start seeing if you can pull out trends. I know that's not a super great answer, and I'm not sure I have a better one at this time. All right, it's still running. Yeah, they definitely are, and I'm definitely aware of that; you know, that's a good point: a way to improve this would be to look for records where everything is null. Sometimes when records are entered without the person's age, they'll put zero, and my favorite entry (and I'm only talking about the publicly available data) was that a two-year-old had supposedly stolen a shopping cart. It was just the funniest visual, because I don't really think they did; man, that was a bad baby. So I think those records could be thrown out. Oh right, I'm not ready to show you that; all right, let's go back: crime by area. So yeah, this is the data. You'll see there's a latitude and longitude provided, and that's generalized to the nearest intersection to protect people's privacy, so if you plot it, it's not actually landing on someone's house; I think it says that in the README. So I'm querying out data from 2010 through 12/31/2018.
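The date parsing and the 2010-through-2018 query might look like the following; the column names and date format are assumptions about the open-data schema, and the rows are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Date Occurred": ["01/15/2009", "06/30/2014", "12/31/2018", "01/02/2019"],
    "Crime Code Description": ["ROBBERY", "BURGLARY", "VEHICLE - STOLEN", "ARSON"],
})

# Parse the text field into real datetimes so range comparisons work.
df["Date Occurred"] = pd.to_datetime(df["Date Occurred"], format="%m/%d/%Y")

# Keep only incidents from 2010 through 12/31/2018, as in the talk.
mask = (df["Date Occurred"] >= "2010-01-01") & (df["Date Occurred"] <= "2018-12-31")
in_range = df[mask]
print(len(in_range))  # 2
```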
And the query contains these crime descriptions; the pipes are just separators ("or") in pandas, so if the string contains aggravated assault, murder, rape, burglary, larceny, vehicle theft, or a couple of other entries, then that's what I'm including in this analysis. Got that? Aggravated assault, murder, rape, larceny: kind of important stuff. Not that all crime isn't serious. All right, this is just more data wrangling; I'm so self-conscious showing people my code, I'm like, ugh, but anyway. Then we have this file, and we're seeing this division, or area as I call it, in crimes per square mile; I got the area from the GIS software. This is more wrangling, and it was fun when I first joined these two: I took the count by area name for these different areas and joined it with a CSV file I had which gives the area in square miles of these different divisions. The first time I did it, I hadn't specified that the key be uppercase, so I really freaked out because the totals were off. You just have to be careful about all this stuff, so I'm saying string-to-upper so it joins correctly; I just want to point that out. So this is select crime by area, normalized per square mile, from 2010 to 2018 in Los Angeles, and it's decently difficult to look at, so I also set it up as a Plotly plot. (In the repository, you're going to have to get your own Plotly username and API key; that's just how it goes, I didn't share mine.) I divided the count by the area in square miles and rounded it; that's why they all end in zero. I'll run this; I think it already popped up, but we'll take a look. Yeah, so this is it as a Plotly plot, which is a nicer kind of visualization, because you can actually scroll over it. I know you're thinking "I can't see anything from where I'm sitting," but this is that first chart as a Plotly plot. Cool. So I'm going to close a couple of these, and we'll go into the next notebook, on serious crime.
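The pipe-separated query and the uppercase-before-join fix can be sketched as follows; the crime descriptions follow the talk, while the area names, column names, and square-mile figures are placeholders:

```python
import pandas as pd

# Tiny made-up sample; real exports have many more rows and columns.
df = pd.DataFrame({
    "AREA NAME": ["Central", "NORTHEAST", "hollenbeck"],
    "Crm Cd Desc": ["ROBBERY", "BATTERY - SIMPLE ASSAULT", "BURGLARY"],
})

# In str.contains(), "|" means "or", so this keeps rows matching
# any of the listed descriptions.
serious = df[df["Crm Cd Desc"].str.contains("ASSAULT|ROBBERY|BURGLARY|HOMICIDE")].copy()

# Uppercase the join key so "Central" and "CENTRAL" line up in the merge.
serious["AREA NAME"] = serious["AREA NAME"].str.upper()
counts = serious.groupby("AREA NAME").size().reset_index(name="count")

# Area sizes in square miles (placeholder figures, not official numbers).
sq_miles = pd.DataFrame({
    "AREA NAME": ["CENTRAL", "HOLLENBECK", "NORTHEAST"],
    "sq_mi": [4.5, 15.2, 15.8],
})
per_mile = counts.merge(sq_miles, on="AREA NAME")
per_mile["per_sq_mi"] = (per_mile["count"] / per_mile["sq_mi"]).round(1)
print(per_mile)
```

Without the `str.upper()` step, the mixed-case keys would silently fail to match in the merge, which is the "freaked out because the totals were off" problem described above.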
This is going to harken back to that line plot we saw in my PowerPoint. Here, I was trying to think what it's called in Candy Land, those shortcuts where you don't have to go all the way around. Bridges? Great, that's a good word. I built a little bridge into this so we don't have to go through all the cells, because these are more heavy-duty scripts, so, very carefully (because if I start running everything, no bridge for us), I'm going to run just the first few cells: importing the libraries. I have a cell here printing the versions of Plotly and matplotlib, because these are, in my estimation, rapidly developing libraries, and you want to make sure you're using current versions. Then I'm just defining the output file and the data, and I'm skipping again; these are the crime codes, which I have in bigger font: aggravated assault, murder, rape, burglary, etc. We can ignore that and run from here. So this is a simple chart; you don't really have to look at it, it's just showing the initial path, and these are my comments: x is month, y is count, hue is year. I was really trying to grasp how to make this kind of busy line plot, because I knew I wanted it but wasn't sure how to do it, so that's my comment: okay, hue is year. Oh, it works, cool. The next cell cleans up the plot; run this cell. All right, this looks a little better: it's a lot bigger, and the colors are a little more distinct. Then I set it up as a Plotly plot. Here it is in Jupyter Notebook; it's looking a little stretched, and here's 2019, but the legend says "_line7," "_line8" instead of the year, so it's not too good. We'll run this last one. All right, this looks cleaner. Hold off on that previous one; I'm going to run a Flask app to show it looking a little more cleaned up. This is select crime by area from 2019 to January... sorry, from 2010 to January 2019, in Los Angeles.
january 2019 in los angeles so this is less busy yes a larger spike what was that that's november i don't have a good answer but you're making me think about my friend was robbed someone came in through her shower window because she like lived in a basement and it was like right around the holidays so someone stole her maybe it's a holiday thing but i'm just conjecturing yeah but what i find interesting is like why what was so good in like 2014 that's like really low comparatively and i do have january 2019 so you can see that and i also i just just one more thing before i forget i'll come to your question but do you remember reading like that mayor garcetti and chief more that said that crime is down from 2017 as compared to 2018 do you remember reading about that i was reading about it but that's what i saw as well for this serious crime and i'm not saying i have like the exact same analysis as they did but yeah so that's good uh yes that do what yeah so i'm working to like kind of create that position for myself but i'm not there yet um they definitely do do analysis of crime uh everyone in that department is interested in that kind of stuff um but i don't really have like a ton of details on like how it's done specifically just because like i'm a civilian i don't you know kind of on the outside with some of that stuff but it is yeah i i think this is how they should be doing more of that analysis like this is the right way some of the technology they're using is kind of old so this is like better all right now we're gonna look at this in a flash cap and then i'll just have like one or two other things to say all right it's just running on local hose yeah so this it doesn't it's not fitting crazy go to blame myself for that but you can see the months as well so it's kind of cool and yeah 2019 is the yellow dot there sure i gotta clean up the visualization even more but yeah that's kind of these are some examples of analyzing the data and i mean would you just to 
step back a little bit: it's cool to have all these charts and plots, but would you want to actually create 100 different charts for all the different ways you could look at the data? I wouldn't. So the next step would be actually calling from a database, so you could query by area or by selected crime, and if I'm invited back next year that's what I'll do, I'll have a database put together. So the question is, yes, this is just crime data. There's also a publicly available data set on arrests, but I'm not sure we have any data on convictions; I don't know, but I'll put up my email address and if you can find it I will let you know. You're right, there's definitely more to this story than just looking at it by division. Another way to normalize the data in that first notebook is by area, but something could also be done by analyzing by population, because there are probably more people living in Hollenbeck than in, I don't know, West LA.
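That normalization idea can be sketched in a few lines of pandas. All of the numbers and division populations below are made up for illustration, not real LAPD figures:

```python
import pandas as pd

# Hypothetical incident counts per LAPD division (illustrative only,
# not real LAPD figures).
crimes = pd.DataFrame({
    "division": ["Hollenbeck", "West LA", "Central"],
    "incidents": [4200, 3900, 5100],
})

# Hypothetical resident populations for the same divisions.
populations = pd.DataFrame({
    "division": ["Hollenbeck", "West LA", "Central"],
    "population": [200_000, 90_000, 40_000],
})

# Join the two tables and compute a rate per 100,000 residents,
# so divisions of very different sizes can be compared fairly.
rates = crimes.merge(populations, on="division")
rates["per_100k"] = rates["incidents"] / rates["population"] * 100_000

print(rates.sort_values("per_100k", ascending=False))
```

With these made-up numbers, raw counts make Hollenbeck look busier than West LA, but the per-capita rate flips that ordering, which is exactly why normalizing by population matters.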
Oh no, I don't know, Topanga maybe. So yeah, that's a good point. Any other questions? Yeah, I know. So, in my effort to make this kind of particular analysis more prevalent, I was showing some of these plots to my co-workers once, and they were like, can you just put that as two bar charts? I forget what the exact request was, but they wanted two bar charts for 2017 and 2018, and I'm like, that's pretty simplistic, but I don't know. So yeah, I definitely agree. I mean, this is line charts, bar charts, a scatter plot, nothing fancy, but then people ask me just for some totals for two years, and I'm like, great, that's fine, it's helpful. Yes? I don't know, I'd say a little more. I mean, honestly, at the point where I am it's pretty general. I do think normalizing is important, though, just to get a better sense of the actual area that you're looking at. It's pretty general, what I am doing. It'd be good to make a bell curve and just look at the distribution of the data, that would be good as well, but it's pretty general. Yes? So the first thing I can say is, if you want to do data science, it's super popular, everyone loves it right now. I started with doing automation, and then everyone's like, Python's cool, and I'm like, what? I've been doing this for years now, just because I've worked at all sorts of city departments, and they all need automation, let me tell you. So I will say the course where I was a TA is called Trilogy, Trilogy Education Services; that's a six-month course, three days a week, pretty intense. And on my repository I put up, well, I really like the O'Reilly books, and there's sort of a self-study course on Python 3. But I think once you come up with a question, and hopefully you can find the data, or it's not too hard, because accessing the data, you know,
is a big part of it. So if you find it, either through open data or web scraping, then come up with a question, and it'll just take off, I think. Yeah, Plotly. Okay, so I'm not a pro, but it's sort of a way to make more interactive visualizations. So when I was scrolling over, the numbers were displaying, and you can also choose if you want to display a single point or all the data at once. Does anyone else have anything more to say about Plotly? It's pretty new. Anyone? Yeah, HTML, yeah, it does, and you get a JavaScript file. It's still pretty young, I think, so I look at the documentation and I'm like, why is no one asking this question? But that's also kind of cool. Oh right, I just wanted to put up this slide. My email address at work is horribly impersonal, but it's n5875@lapd.online, so that's where you can reach me, and if I can't answer it I can try and find someone who can. And that's my GitHub; I'm on social media, yes. In another lifetime I also worked for the federal government, but I'm not really interacting with them, but maybe I can meet them at some point. Thank you. Thank you, Patty, and our next talk will be up in about 12 minutes. Pretty quiet? Yeah, I am talking quietly. What if I talk louder? Louder? Louder? Yeah, no, that's good, it's working. All right, welcome back. We have our final session of the day for the open government track, thank you all for coming. Here we have the Open Precincts project team, and I will let them introduce themselves. Hi everybody. The title of the talk is Open Precincts: A Collaborative Resource for Redistricting Reform. We wanted to put gerrymandering in the title to really drive the crowds, but then we saw that all of the talks in this track had alliteration in them, so we stuck with the theme. Okay, so we are the Princeton Gerrymandering Project, and you're probably wondering what that patriotic bug is. Well, we like to say that gerrymandering is a bug in American democracy, and
we're going to squash it. And today we're going to talk about a data project that we hope will help us do that. Our director, Sam Wang, is a professor of neuroscience at Princeton, but he also studies elections, and after 2012 he and some others noticed a pretty strange result: there was a major distortion in the relationship between seats and votes in US House elections. So I'm going to plot a series of US House elections. On the x-axis I'm going to plot the national vote share that Democrats won, higher numbers to the right being more Democratic, and on the y-axis I'm going to plot the number of seats in the House that Democrats won. If you plot a dot for each House election between 1948 and 2010, you see they're correlated; that's good. And now if you plot the four House elections that have happened this decade, which is all the elections that have been held since districts were redrawn in 2011, you see something pretty interesting, which is that these dots are all significantly lower, indicating a new disadvantage for Democrats in the range of maybe one or two dozen out of the 435 House seats. So you can see that for all of the vote shares that they got in elections this decade, there are previous elections that had the same vote share where they got more seats. The Princeton Gerrymandering Project was started to measure the effect of gerrymandering, and you can go to our website if you want; you can browse through the years and see the results of a few tests that we have up there that can indicate for which years and which states gerrymandering seems really bad. Note that this is from 2012. Some of these really red states, Michigan, Ohio, Pennsylvania, and Virginia, have been fixed or sort of fixed by state-level reforms or court cases. So for instance, Michigan voted to implement an independent commission to redraw their districts, Ohio passed a constitutional amendment, and Pennsylvania had
a court case, which I'll talk about a little bit later. But now, with 2021's redistricting on the horizon, we're trying to switch a little bit from showing retroactively where gerrymandering was bad; instead we're putting energy into a new open data project that we're excited to talk about. But first I'm going to take kind of a meandering path before we get to the project. First I'm going to describe this bug in democracy with a bit of a primer on redistricting and gerrymandering, then talk about two broad ways to fight gerrymandering, namely measuring it and also empowering citizens, and by then hopefully I will have convinced you that precinct data is an essential missing piece here, and I'll hand it over to my colleague Hannah, who will talk about the meat of the project. So, redistricting: every 10 years the census is tasked with counting all of the people who reside in the country, and as soon as they release the data it kicks off this whole process where lots of maps have to be redrawn to ensure that the number of people in each one is about equal, meaning that each person's vote has about equal weight. There are lots of maps that get redrawn during the process, from congressional maps all the way down to school board, city council, and so on, but since we mostly focus on partisan politics, there are three really important maps the state legislators draw. I'm going to show California's maps: we've got the congressional map, which has 53 districts, we have the state senate map, and we have the state assembly map, and the way these maps are drawn has a huge impact on the makeup of the body that gets elected through them. So who draws these lines? Well, it varies pretty significantly across states. In a handful of states there are independent commissions composed of citizens recruited through various processes, and those states are shown in green here. From most reformers' perspective this is kind of the gold standard of redistricting
processes. Arizona was a pioneer here; Arizona was the first to implement this sort of commission, in 2000. California followed suit in 2010 with Proposition 20. Did anybody vote for this? Nice, cool, good job. And Michigan and Colorado voted to create one last November. But for the most part, all the gray states' maps are done by state legislatures, who usually just create and pass a map like any other bill. And by the way, some of these gray states only have one at-large district, like Montana, so they don't actually have to draw congressional maps, but all states have to draw state legislative maps. So who draws those? Well, the same states that have independent commissions draw congressional lines also have them draw state legislative lines, so again those are in green, and for both of these there are some in-between models, like commissions where politicians appoint the commissioners who draw them. But in general you can see that for the most part districts are passed carte blanche by legislators, and for state legislative lines it actually creates kind of a feedback loop that's hard to break, where legislators draw a map to try to achieve a certain partisan outcome, and then if it works, they have even more power to do it again the next go-round. One thing about this: I know from Hannah, who's a Californian, that California is very famous for citizen initiatives and long ballots, but in many states besides California there are no citizen initiatives, so to implement redistricting reform you have to convince legislators to do it; there has to be a lot of public pressure on legislators to give up the pen, and it's really hard to do. So it's a pretty big lift to get really solid redistricting reform in a lot of states. Okay, so what's gerrymandering? Well, it has a pretty loose definition; it generally refers to just some distasteful line-drawing
practice, but our concern is primarily partisan gerrymandering, which is pretty specific. So let's say you have a partisan map-making body; what are some options available to them? Well, the goal of partisan gerrymandering is that you take these votes as a constant, just where your voters are, and then you draw the lines to try to get as many districts as possible to vote for your party by a small but reliably positive margin. So I'm just going to take this arrangement of blue and red voters as given and show you the variety of outcomes that you can get through map making. First, a gerrymander for the red party tries to pack the blue voters into a single district that they win by a huge margin, and then the remaining blue voters are divided among three districts that go reliably red. A blue gerrymander works the same way: you just artfully arrange the red voters in the suburbs into one district and try to get three more blue districts. The most common kind of gerrymandering is a bipartisan gerrymander, where everybody in the state legislature says, you know, I don't think any of us really wants to have a competitive election, right? And then you jointly pass a map where everyone gets to keep their seat and has a pretty safe general election. And then you can also try to make a map that is competitive, which in this case has two districts that go 50-50. So a partisan gerrymander aims to skew the distribution of vote shares. Let's say that you have a fair map and you hold an election; this is just a hypothetical. Then you have a vote share in each district, which is just a number between zero and a hundred percent Democratic (we're ignoring third parties for the most part; it makes things a lot easier), and in a fair map you might have a roughly linear distribution like this, where you have some districts in the middle, some Democratic districts, some pretty strong
Republican districts. The vertical position here indicates the percent of the Democratic vote share, so if it's above that 50 percent line, that's a district that voted for a Democrat. But if you look at what happened in Ohio in 2016, you see this really curious distribution. This is a map that was drawn by a unified Republican party, and you can see what the aim is here: it makes four really, really strong Democratic seats and then keeps all the other districts pretty reliably Republican. So what happened in 2018? Again, this is 2016; in 2018 there was this blue wave where Democrats got a pretty significant increase in vote share pretty much across the country. In Ohio I think it was about a three or four percentage point shift towards Democrats, and you can see that this gerrymander was designed really well; it was designed to hold up even in the case of a blue wave. So even though the Democrats got more votes, they didn't actually convert any of that increase into an additional seat, and Democrats won the same four of 16 seats in all the elections that were held this decade since the redistricting in 2011. I should say, by the way, that gerrymandering is a bipartisan offense. Later this month there's going to be a court case before the Supreme Court about gerrymandering in Maryland, where Democrats tried to convert one reliably Republican district into a reliably Democratic district. So, bipartisan, but the Republicans had a lot more opportunity to do this in 2010. So how do we fight gerrymandering? Well, the fight has been raging in courts for decades, and a big part of the fight has focused on how to measure it, the idea being that courts need a way to say that this map is worse than that one. There was a high-profile case against the Wisconsin state assembly map last year, called Gill v. Whitford, that used a metric you might have read about called the efficiency gap. The
idea behind the efficiency gap is to measure whether one party wastes more votes than the other, where a wasted vote is any vote that doesn't contribute to a win, and to compute a metric like that you don't need to know anything about geography; all you need is a series of numbers. So in that last plot I showed you, there were 16 dots; those 16 numbers are all you need. There's an equation that you plug them into, and you just get out a metric, the efficiency gap. In oral arguments for this case, Chief Justice John Roberts referred to these metrics as sociological gobbledygook. So of course my colleagues, who are not here, wrote a paper just for him, An Antidote for Gobbledygook, and what it tries to do is summarize when these different kinds of metrics come in handy, and it just tried to hold the Chief Justice's hand and say, it's not so bad, you know, all these metrics do different things. There's also a really great comparative study by Greg Warrington at the University of Vermont that takes a more quantitative approach: it computes a bunch of metrics for a bunch of elections, does various cross-comparisons, and tries to find the scenarios in which they agree or disagree, and that's a really good paper. And we also have a Python package that you can use to apply about 15 of these metrics to any sets of votes that you have, to see if they seem indicative of gerrymandering. So if you're interested in doing this kind of thing, you can run your own metrics. But there are some issues with standard gerrymandering metrics like these, even though this is a great package, of course. For one thing, they might have a false positive problem. Let's just imagine you have a state where urban voters vote for Democrats at like 100 percent, and people in the suburbs vote kind of mixed Democratic and Republican, and let's say that that state also has a law saying that a
district map should try not to split cities as much as possible. Now what you're going to have is a situation where it looks packed, right? You might have a metric that says, oh, the result here seems really packed, it seems like Democrats are really disadvantaged, even though that map was just drawn according to the rules. So to really measure gerrymandering you need to compare partisan performance against some baseline, and the way you get that baseline needs to take into account the rules that govern redistricting in the state, but also the geographic layout of voters, basically where those blue and red dots that I showed in the previous example are. That's why in recent years we've seen the development of a number of algorithms to generate large ensembles of maps that are reasonable according to some rules. Here you can see a map of Virginia; I think we started off with the real congressional map, and this is an algorithm that just makes lots of random tweaks, and if a tweak results in a map that's still reasonably good, like population equality is still good, it keeps it, and then it does it again. This uses Markov chain Monte Carlo sampling, and if you do this millions of times, then you can get a big ensemble of lots of pretty reasonable maps. And if you have that, you can then say, okay, for all these reasonable maps: if we plug in these election results, how do these maps perform, what kind of partisan makeup do they produce? Then you can go back and compare that against what the original map is doing, and you can see if it's some crazy outlier among the whole distribution of reasonable possibilities. So this is kind of like the next generation of measuring the effects of gerrymandering, and this approach was actually used effectively in the Pennsylvania Supreme Court last year. A researcher at Carnegie Mellon named Wes
Pegden did something like this: he started with the Pennsylvania congressional map, made lots of tiny changes, and found that the original map, compared to this whole ensemble, was in the top 0.00058 percentile of partisan bias, and then he also came up with a theorem showing what the odds are that you would get a result like this from a reasonable map. And the court was actually convinced by this, which was a landmark ruling in gerrymandering: they struck down the map as an unconstitutional partisan gerrymander and ordered the implementation of a new, fairer map, which was used in last year's elections. So this was a really big triumph for all the gerrymandering nerds. A modified version of this approach is also going to be used in the upcoming case about North Carolina's map. This is a slightly different sampling technique; this plot was produced by a group at Duke led by a professor, Jonathan Mattingly. They constructed a gerrymandering index, and this histogram shows you all the reasonable maps. The judges' map is a fair map that a panel of judges came up with; you see it falls right in that distribution. And then the 2016 and 2012 maps that were actually used in North Carolina are way out there. This was actually used in the district court to strike down the North Carolina map, which was ruled unconstitutional, but there was no time to replace it, so an already-ruled-unconstitutional map was used in November's elections, actually. But this is coming back before the Supreme Court as well this month, along with the Maryland case, and if this goes well at the Supreme Court, then you're probably going to be hearing a lot more about this technique in the future. You can also do some of this sampling yourself, thanks to an open source project released by the Metric Geometry and Gerrymandering Group, which is at Tufts, led by a Tufts professor named
Moon Duchin, and this is a really cool program. So that's kind of the latest and greatest tech for measuring gerrymandering, and depending on how things go at the Supreme Court, you might see more of this. In addition to using metrics and measurements to fight gerrymandering in court, we can also try to fight gerrymandering by giving citizens the tools they need to have a say in the redistricting process, which is going to happen again in 2021. So what's going to happen in 2021? Well, in some states like California there's a commission, and there's a lot of public input, and there's a whole process with periods of public comment, and that's great; you guys are very lucky, congrats. But in most states, state legislatures are going to craft these maps mostly in private, using expensive software and well-trained consultants, to try to achieve a particular objective. In many cases these maps are going to be released not long before they're scheduled to be voted on. So how are voters or journalists supposed to understand the likely effects of these maps? Well, since 2011 there's been a huge increase in people working on software to try to address this issue, and one example is this website called PlanScore, which is really cool. It allows you to upload a map file; they have a page where you can upload a shapefile that you might get from the legislature, or you can click on a pre-existing plan, and it scores the plan for you, and it says, okay, how bad are these metrics according to this hypothetical map? And you can browse through it; you can see for any hypothetical district, as long as it achieves population equality, how it is predicted to vote. This is a really great tool, and it'll allow people to have a much more rapid response to a map than they would have had 10 years ago, hopefully. Another way that citizens can be involved
is by being able to draw maps of their own. There are a few pieces of software to do this, including Districtr, which is by the same Tufts group that I mentioned before, and it has a really fantastic interface for drawing your own district. This way the public would be able to say to their commission, if they have one, or to their legislators, I don't really think that's how my community should be represented, I don't think that's a logical district, I have this other map that I think is a lot better. And this is really cool: you can go to this website, and it's just like painting. So we're going to be painting this district here; you can change the brush size, and as you're drawing you can see how your population equality goals are changing in real time, and you can also see partisan performance. Here we're looking at the 2016 presidential election and how these districts would be likely to vote, and so you can see, as we make district one edge into Philly, or now we're drawing district two as we edge into Philly, we just picked up a lot more Democrats, so you can see it go blue. This is a really cool interface for this. There's also some other good software; Dave's Redistricting App is a pretty popular one. And in the past, various organizations have even held contests to draw maps and raise public awareness, so you can offer prizes for different objectives, like a prize for the map that best achieves partisan fairness or best represents some community, and that's a really cool thing to do. But all these avenues for combating gerrymandering that I talked about have one thing in common: they are thirsty for data, they really need data. And because they all need geographic election data, you'd think that there would be a really solid resource for this, some kind of national resource, but you'd
be wrong. It's really, really hard to get this data, and that's what this project is trying to address, and Hannah's going to talk more about the meat of this right now. All right, so everyone else has been talking about things to do with open data, and everyone has mentioned how hard it is to get their hands on this open data, but we're really going to dive into that. Eventually I want to transition to talking about this tool that we're building, because we really want feedback from everyone in this room, but I think we need a little bit more background; I'm sorry for so much background in this talk. First of all, we need to talk about what a precinct is. A precinct is the smallest unit of geography that votes are reported in. In general, there's a school or a church or something that's designated as a polling place, and everyone in a certain geographic region votes at that same polling place. In California it gets a little more complicated, especially here in LA, but around the rest of the country that's generally what happens. So when we say precinct-level data, that's what we're talking about. Everything that Will just described runs on this precinct-level data. We're referring to two pieces of a big data set: we need the boundaries on the map, and we also need the votes that go inside them, because without the boundaries we don't have any way to place the votes on a map. A big, nice, open source precinct-level data set for a state isn't really that useful without the map. Elections are generally administered by counties, and we usually have to contact the county officials to get this map info, and the US has more than 3,000 counties or county equivalents, so that's a lot of raw files that we have to sort through. Of course, we don't have to do it for every single county in every state; some states like Minnesota publish the data after each
election, but that really isn't the case for most states, and like Will said, there's just no national database for this. Some counties publish it on their websites; some don't even do that. So it's really all over the place. There are a few entities that have stepped in here to fill the void. I put up here a picture of a New York Times feature of precinct-level election results for the 2016 presidential election, and I'm just curious how many of you had seen this data set; I think there was a similar one in the LA Times. How many of you have already played around with it? Okay, that's a pretty good-sized audience. With data sets like this you can really see how you could powerfully gerrymander when you have access to this data. But the thing is, this data, it's the most comprehensive and user-friendly data set out there so far, but it's not accurate enough for our purposes. For example, this is an image that Will made, where the New York Times map of Manassas Park, Virginia is on the left, and the physical map that we got from the county, which was originally colored in crayon by the way, is on the right; that's what I'm talking about. These discrepancies are just too big to ignore; we can't be saying that voters live somewhere where they don't. So the other data sets, they're great, they're useful for people to get a feel for the political geography of where they live, but they're not something that you could use in court. But there is a lot of energy out there to do this; there are currently multiple groups working on this. Will was mentioning the group MGGG in Boston. I was part of a summer program called the Voting Rights Data Institute that they had this last summer, and we collected and compiled Ohio; that's what this top picture is. This is what we had to do to get that state done: we estimate it took about 400 person-hours, and we had to have in-person trainings, and then we had to have in-person work sessions where we
had some of us walking around to help, we had to buy lots of pizza, and most importantly we had to have air conditioning, when the dorms didn't have air conditioning. That's how we got everyone in the same room to actually do this work. We've also been in touch with an individual in Kentucky who's just started taking on this project for that state; there's a team at the University of Florida that works on this; there's a website called the Harvard Dataverse with lots of historical data; and there's a great collection of files called the nvkelso election-geodata repo, that's this bottom picture, and it's really been the main source for this information so far, but sometimes it has very little documentation, it can often be really messy, and there are also just big gaps in it; there are states where we don't have anything. It might be surprising to many of you that this data is actually in such a terrible state, but anyone who's worked in local government probably understands why this is so difficult. Because local election administration varies so much from state to state, and even from county to county within the same state, we know that we aren't going to be able to do this alone, especially since so many other groups have been trying to do this and we still aren't where we want to be nationally. So we want to build a tool that's going to try to coordinate all of these grassroots efforts that are popping up all over the place, and we want to lower the barrier to entry to actually contribute to the project, because right now you have to be fairly technical; we had to have these in-person trainings, and we had to have someone walking around to answer questions. So we want to come up with something that's going to make this all easier. One of the hardest parts of this project is just getting our hands on the precinct maps in the first place. There's no consistency for the precinct maps across counties in one state, and then when
you're going across multiple states, sometimes they don't even call them the same thing anymore. We've received broken shapefiles from county clerks, hand-colored maps like the Manassas Park one we saw; once I got a serial-killer-style map that was made of like 50 pieces of paper cut into funny shapes and all taped together; and once, in Virginia, we even had a county clerk say his old boss just walked around the county a long time ago and wrote down which addresses went into which precincts, and that no one really knew where those boundaries were, and yes, this is actually coming from a county clerk. Sometimes the county makes you send money for the maps, or sometimes you even have to go get them in person. We had this fun with Highland County in Virginia, where our legal analyst Ben, who's not with us today, had to drive over an hour from Harrisonburg, which was already out of his way, down this windy mountain road, to go to an unmarked double-wide trailer where they had a single copy of the map on the wall for him to take a picture of with his cell phone. I think this is like the original data scraping. Now, this is just the outlier; this isn't going to be every county, but I feel like in every state there are going to be a couple of counties like this. You can see where it is, way up in Virginia, very rural; they didn't even have cell phone service in the trailer for someone to take a picture with their phone and text it to us, so we had to go. So that's the collection part, and that's just the first step. Once we've collected county contact info, contacted them all, and then gone through whatever hoops we needed to get a hold of that raw map, then we digitize, clean, and correct these maps, and then we match them up with the election results so they can be visualized like in that other map. And we have a GitHub repo full of all the Python scripts that were developed by two undergrad
researchers last summer. One was Jacob Wachspress, who's at Princeton, and the other was Connor Moffat from right here at Caltech, who spent all summer at Princeton working on this. Those tools will tackle most of the tasks you might encounter when trying to clean this data, but as I was describing, the form of the raw data you receive can vary quite a bit, and that changes what you have to do. In reality you're also collecting election returns at the same time, to determine what precinct names you need to match up with, and sometimes that requires collecting those results directly from county officials; there's actually an entirely separate project called OpenElections to handle that, so don't let that scare you. You also have to process the maps differently depending on what type of raw file you get, and it can even change with the quality of the file. And then we might want to include historical elections. I'm going to stop before the font gets too small, but I hope you all get the point: it seems like a simple process, but it's really complicated, and the middle steps get messy. Groups are tackling it, in different states and on different aspects, but we really need something to bring it all together and make it easier.

So we need a platform that organizes it all. It needs to promote cooperation across groups and teamwork within a state, and it needs to coordinate every step of the process, which will differ depending on the source of your data. It has to have very high standards for accuracy, because we can't have errors, or a case in the Supreme Court might be affected. And it needs to consist of the most up-to-date information; we need to be collecting the most accurate precincts, because precincts can change between every single election. Currently there isn't anything that checks all of those boxes. So we started development on OpenPrecincts in collaboration with many organizations who have already been doing this work: OpenElections, which I just mentioned; PlanScore, which you saw Will show an image of; and the election science team at the University of Florida. We've also been developing it with individuals like Nathaniel Kelso, from the election-geodata repo I showed, and the project leader in Kentucky, and a ton more that I can't even think of right now. We're building OpenPrecincts as a custom web app where we'll bring all of these together, lower the bar of entry so more people get involved, and hopefully fill this void for good. It's written in Django, and it helps organize and coordinate the process for an entire state while also managing it at a higher level across all 52 jurisdictions. It naturally breaks into three main sections: the collection of the raw data, which is really tough; the cleaning and matching of the raw data, which again is really messy and complicated; and the display of the final product, which is not that bad, and is what a lot of the other software you've seen so far does; they just don't have the data to display yet. Here are the links to the GitHub repo where the Django application lives and to the website where you can get a feel for the tool. It's in its very early stages right now, and this is where we need your help; we want everyone to look at this and give feedback.

So let's take a look at the website. If you wanted to volunteer, you'd go ahead and sign up here, and I have to try to type backwards, or I can log in right now. You would sign up for a new account, it sends you an email, and then, oh, there's a step I forgot since I'm not on my own laptop. You can all take this as an opportunity to make your own account if you'd like.
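The three-section structure described above can be sketched as a tiny state machine. This is purely an illustration; none of these class or field names come from the actual OpenPrecincts codebase, and the real app tracks much more per county:

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    COLLECTION = 1   # gather raw precinct maps from county officials
    CLEANING = 2     # digitize, clean, and match maps to election results
    DISPLAY = 3      # final shapefile ready to visualize and download

@dataclass
class County:
    name: str
    has_raw_map: bool = False   # a raw file has been uploaded
    is_clean: bool = False      # digitized and matched to results

@dataclass
class StateProgress:
    name: str
    counties: list = field(default_factory=list)

    def stage(self) -> Stage:
        # A state advances only when every county clears the current step.
        if not all(c.has_raw_map for c in self.counties):
            return Stage.COLLECTION
        if not all(c.is_clean for c in self.counties):
            return Stage.CLEANING
        return Stage.DISPLAY

va = StateProgress("Virginia", [County("Highland", True, True),
                                County("Fairfax", True, True)])
co = StateProgress("Colorado", [County("Denver", True), County("Boulder")])
print(va.stage().name)  # DISPLAY
print(co.stage().name)  # COLLECTION
```

The point of modeling it this way is that a single missing county holds the whole state at the earlier stage, which matches how the site surfaces "what this state needs next."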
While I'm doing that, we can answer a question. So yes, we're technically employed by the neuroscience department, which is a bit weird. I'm actually in neuroscience myself, but that's just kind of a coincidence; we're funded by foundations and private donors and just happen to sit in the neuroscience department. I got hired because some of the computational work I was doing for my neuroscience degree uses the same sampling technique as that map-sampling approach. But our team is actually very diverse: James here is a software engineer, Hannah is from math, you've heard about Ben, who's a lawyer; a lot of skills. OK, I've got the website up.

On the question of whether this could help would-be gerrymanderers: it's hard for us because, like I said, it takes 400 man-hours to get a state done, and we need perfect data to have an impact in something like a Supreme Court case. But to gerrymander, you could have done that pretty well with that New York Times map; you just need a basic idea of how people are voting, so they don't need perfect data, and they have the money to pay someone to put in those 400 man-hours. Also, the people who draw the maps are usually in government and have more levers to get the data they need; some data you get just by working your connections and talking to someone in government, and that's easier if you're a legislator. There's already so much money being put into gerrymandering that I don't think this is going to aid the people who would otherwise be gerrymandering; they already have the tools they need, or access to them. Let's save more questions for the end.

So let me quickly go through the website now that I have it up. Let's pretend you all want to help out: you've just signed up for an account and you're in OpenPrecincts for the first time. You can get a general idea of the different steps, or go up to the About page and read about the project, but the fun thing is to go to the state view and see the state of the states right now across the country. Say you wanted to help out on something like California: you'd click on it to get a more detailed view, and for most states, all the gray ones, it will tell you something like this, where we want you to get in contact with us before you initiate the process. In states like California you aren't going to have to go to every single county; some counties already have the data available online, and some have already compiled it by region, so in every state there's going to be a slightly different path to action. To start a brand new state, you contact us and we figure out what that path is. But if you click on a state like Colorado that's already in progress, you'll see the steps it's currently working on; right now we're collecting county officials for Colorado. Say this is the state you wanted to help with and you've already contacted whoever's in charge of Colorado: click on the county you live in, or are interested in, or have connections to, and now you can add officials directly to our database, work with us to contact them, and try to get back those raw files, and the log comes up right here. When you start getting raw files in, you can upload them right here, directly to our database; sometimes they're available on the county website, which means you could just download one and upload it here for us. So that is a state in that stage of the
process. Once this is done for a whole state, it looks like Pennsylvania, where we're currently cleaning, and that's where my diagram got really complicated and the linearity breaks down. So what I'm going to do is show you a clip of the general process. Where did the slide go? Oh no, it's not there; we were too ambitious, sorry about that. OK, this is what we're looking for. This is what you would generally do if you need to, as we call it, digitize a map: when you get an image of a map that isn't already a shapefile. Go ahead and press play. It takes your map as input and asks what color the boundaries are; that's what it's doing right here. You tell it the color, and it takes every area surrounded by that color and fills it in with a distinct color; that's what it's doing now. Then it takes every region that's a distinct color and triangulates it, because we know from math that polygons can always be triangulated, and we know how to randomly sample from a triangle. So we randomly sample pixel colors, we take the census block file underneath, census blocks being the smallest unit of census geography that exists, and we assign every census block to a precinct based on the pixel values above it. You can see that we're able to make a precinct file directly from the image above: this wasn't a shapefile before, and now it's a brand new shapefile. Before this existed, what people would do is open up a GIS program and trace the boundaries, and that's part of why it took us 400 man-hours: not only do you have to trace the boundaries, you first have to georeference the image, which means giving it coordinates on a map. There's a lot of manual work that this eliminates.
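To make the flood-fill step concrete, here is a toy sketch, not the actual scripts from the GitHub repo: the "image" is just a character grid, and instead of triangulating each region and sampling points, each made-up census block is assigned by the label under its centroid pixel.

```python
from collections import deque

def label_regions(img, boundary="#"):
    """Flood-fill every area enclosed by boundary pixels with a distinct label."""
    h, w = len(img), len(img[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] == boundary or labels[sy][sx] is not None:
                continue
            # BFS flood fill from this unlabeled, non-boundary pixel.
            queue = deque([(sy, sx)])
            labels[sy][sx] = next_label
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                       and img[ny][nx] != boundary and labels[ny][nx] is None:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

def assign_blocks(labels, block_centroids):
    """Assign each census block to the precinct label under its centroid pixel.
    (The real pipeline samples many points per block and takes a majority.)"""
    return {name: labels[y][x] for name, (y, x) in block_centroids.items()}

# A 5x7 toy "map image": '#' pixels are the precinct boundary color.
img = ["#######",
       "#..#..#",
       "#..#..#",
       "#..#..#",
       "#######"]
labels = label_regions(img)
blocks = {"block-A": (2, 1), "block-B": (2, 5)}
print(assign_blocks(labels, blocks))  # {'block-A': 0, 'block-B': 1}
```

The real tool works on georeferenced raster images and census block polygons, but the core idea is the same: enclosed regions become distinct labels, and blocks inherit the label of the pixels above them.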
Once we've done that for every single county in the whole state, we have a shapefile for every county, but we don't actually have names on those shapes. We have to go to the election results, pull out the precinct names, and match them correctly, because you really can't have precincts mismatched or you're incorrectly representing those people. You'd think that a data set you got from the local government would match a shapefile you got from the same county, but really there are typos, or just name changes; it's a really big mess, so that's another thing that takes a long time. Once you've done all of that, you end up with a state in the final stage, and we've only successfully gotten there once so far, with Virginia; we're really proud of Virginia. At the end you don't need to focus on all of that behind-the-scenes stuff; you can just focus on the data, which is what we cared about from the beginning. We want everyone to be able to download this data, and we want it ported into all the other software, but this is also just a fun way to get a feel for your state. You can turn counties on to orient yourself a little, then zoom in and see, at pretty good granularity, how people vote, and move your mouse over different precincts to see exactly what percentages the election went for in that area. It's pretty powerful, and very accurate compared to what already exists out there, and eventually, when we have multiple election years, you'll be able to select which election years you want to look at. All right, so that's the final product, and like I said, that's what we want every single state to look like. We need help getting there, and it's going to be a really long process. So think about whether this project is something you'd like to contribute to, or whether you want to spread the word.
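The name-matching step can be roughed out with Python's standard library. This is only a sketch; the precinct names below are invented, and anything without a confident match still needs a human to resolve it:

```python
import difflib

def match_precincts(result_names, shape_names, cutoff=0.5):
    """Pair each precinct name from the election results with the closest
    name in the county shapefile; flag anything without a good match."""
    by_lower = {s.lower(): s for s in shape_names}  # case-insensitive compare
    matches, unmatched = {}, []
    for name in result_names:
        close = difflib.get_close_matches(name.lower(), by_lower,
                                          n=1, cutoff=cutoff)
        if close:
            matches[name] = by_lower[close[0]]
        else:
            unmatched.append(name)  # split, merged, or renamed precinct
    return matches, unmatched

results = ["001 COURTHOUSE", "002 MONTEREY", "003 BLUE GRASS"]
shapes = ["Courthouse Precinct 001", "Monterey 002", "Mill Gap 004"]
matches, unmatched = match_precincts(results, shapes)
print(matches)
print(unmatched)
```

A loose cutoff like this catches typos and formatting differences, but every automatic match still gets reviewed, since a wrong pairing silently misrepresents the voters in both precincts.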
If you know people who would be all about something like this, that's exactly the stage we're in right now. We need everything from folks to gather that raw data, to those with more technical knowledge who might work with us to coordinate volunteers for a state, or even just a group of counties within a state. We also need a really diverse skill set: people with experience in GIS, in user experience and interface design, or even just in requesting government information. I heard the word FOIA being thrown around earlier; we need people who know how to do FOIA requests, because you have to do that a lot here. And we need really technical help as we make changes to the web app, which is why we brought James up with us, our OpenPrecincts developer; if you want to talk more specifically about that, or ask questions about it, he's going to be your guy. I think I'll end there, because we went through a lot today. I hope everyone knows more about redistricting and the role that data plays there than you did an hour ago, and if you already knew all of this, then you really need to stay after and talk with us. With that, I'll open it up to some general questions, and then Will and James and I are happy to stay as long as we need for more specific discussions; you can see our specialties here. Thank you so much for listening and coming, and we'd love to hear any comments you have.

Do the models you were showing of the maps take demographic changes into account? Which models were you referring to, the maps that are being suggested, like the samplers? Yes. So that's a good question. There's not a lot of federal law about how districts need to be drawn, but there is in terms of the racial composition of districts. The Voting Rights Act, and a lot of the cases that have flowed from it, are very important to consider when drawing districts, and it's very hard to do, because whether a map satisfies the Voting Rights Act is usually a long, drawn-out court process with elements that I think the kinds of people at this conference would consider pretty subjective. But there are ways these samplers deal with that. One is to freeze certain districts: say this is a district that was required to give a minority community the opportunity to elect a candidate of their choice; it's already been approved by a court, so we're just not going to mess with it. That's one method for these samplers.

So you're asking: if you have a map, how easy is it to forecast how it will vote? Usually when you're trying to figure that out, you just use some past election; all the forecasting and predicting is about the map, and you hold the elections constant. Election prediction is a different ball game entirely.

The question is, do we use census data? Yes. The whole reason our digitization process goes from a random map down to census blocks is that census blocks are the only way you're going to know the population of these precincts, and if we're telling people they should use this data to draw their own maps, you need to be able to attach population to it, because these maps need to be really close in population. That's how we use it there. Also, in some of the tools like Districtr, you can see a breakdown of how people vote in every district, but they also show a breakdown of the demographics in each district, which becomes useful.
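As a toy illustration of that population-balance point (the block IDs, populations, and district assignments here are all invented): once every census block carries a population and a district assignment, checking a plan's deviation from equal population is simple arithmetic.

```python
# Hypothetical census blocks: (block id, population, assigned district).
blocks = [
    ("060370001001", 1200, "D1"), ("060370001002", 950, "D1"),
    ("060370002001", 1100, "D2"), ("060370002002", 1000, "D2"),
]

def district_populations(blocks):
    """Sum block populations into district totals."""
    totals = {}
    for _block_id, pop, district in blocks:
        totals[district] = totals.get(district, 0) + pop
    return totals

def max_deviation(totals):
    """Largest fractional deviation from the ideal (equal) district population."""
    ideal = sum(totals.values()) / len(totals)
    return max(abs(p - ideal) / ideal for p in totals.values())

totals = district_populations(blocks)
print(totals)                           # {'D1': 2150, 'D2': 2100}
print(round(max_deviation(totals), 4))  # 0.0118
```

This is why the digitization pipeline bottoms out at census blocks: without block-level population, a citizen-drawn map can't demonstrate that its districts are close enough in population to be legal.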
Yeah, it really is, and that's why we need something that can keep collecting the most accurate precincts, because it really depends where you are. There are some rural counties in Virginia where I'm sure the precincts are the same as they were a really long time ago, but in LA the precincts change every single election, so there isn't even a map that will stay the same across two elections held within a few years of each other. Generally what we see is that a state like Ohio will have 10 percent of its precincts changing from election to election. This is one of the open questions we're trying to think about: how are we going to handle it? Are we going to keep separate data sets for every year, or try to make sure our most up-to-date data set somehow encompasses all the others?

Very interesting; what you were just saying kind of confirms my question. In Italy we don't have gerrymandering in quite that form, but at every election the majority, the leadership, just redraws the electoral law and achieves the same objective, which is to maintain their power at the next cycle even when they're losing it. So what do you think is the end game for this project? Reducing gerrymandering, freezing it, having more objective rules inside the laws?

So, we're pretty involved in redistricting reform, in doing what we can to change laws around the country to implement independent commissions like California's; that's really the main thing, that and lawsuits. But right now it's time to think about 2021's redistricting, which is right on the horizon, so we're focusing on the tools that can be used there, which is what we're talking about today. In the long run, getting citizens involved in the redistricting process is really great, but you really do want to change the laws, and there's been some good movement: this is an issue that's on Americans' radar now; it was on the ballot in four states in 2018 and passed in all of them. So hopefully things are changing. We want maps to be fair, and we want the process behind map-making to be fair.

So, the LA County clerk is going to be talking on Saturday at three o'clock, for anybody who's interested; I've heard a preview. Their plan for the next election cycle is to have no designated voting place: you'll be able to vote anywhere in LA County, over the course of 11 days, and they're going to be talking about the open source software they're releasing as part of this conference. So come to their talk on Saturday. But where I was going with this is: how do you see that affecting your work as their plans progress?

Yes, we're really interested to talk with them too and see the thought process behind that, but I see two possibilities here. One is that if we stop reporting the data at such a granular level, that means the people in power also don't have access to it, in which case this might no longer have to be a project; if all of LA County is reported at the county level, you're going to have to make districting decisions in other ways. But another thing I've heard a lot of people talk about is figuring out a way to make sure the ballot is linked to your precinct, and I think that's what's happening in places like Oregon that are moving to
completely mail-in ballots; some counties have the ability to add that technology, and Vicki might know a little more about this. I mean, the question is just where you want to anonymize the data, right? I'm curious to see their talk, because if the data is collected at a really granular level it could be reported at a really granular level, but then you run into anonymization issues, and that's the same if you were to start reporting results by census block: lots of census blocks have one person in them.

I was curious about the technology used for the mapping on the OpenPrecincts website, to store the data and then display it; could you say anything about that? Behind the scenes we're going to be using PostGIS. Right now most of the data is just stored in shapefiles; the work so far has been done, pretty incredibly, by the team in Google spreadsheets, so this is the next-generation tool. We're getting things into PostGIS, and the map, which Will actually wrote, is going to be using Mapbox; a pretty standard open source stack for all of that.

You speak pretty glowingly about independent commissions; in the states that have them, how well does that correlate with improved outcomes on gerrymandering? Very well in general, though I shouldn't jump straight to that, because there hasn't been a lot of data. Arizona had an independent commission in 2000, and then Arizona and California both had one in the 2010 cycle, and that's it; we have basically three instances of independent commissions, at the congressional level anyway. We should have some more data soon, because Michigan and Colorado will be joining the ranks for the 2021 redistricting. But in general, people who study this think California's commission did a really good job of changing the process. Independent commissions are messy, and there are probably people in the room who know more than I do about what was happening before in California, but the congressional map used in the aughts was generally considered a bipartisan gerrymander, not very competitive. People think the California commission made elections much more competitive, even though the law doesn't say anything about competitiveness, doesn't value it at all, and the commission can't even look at partisan data; they still got a pretty competitive map, where if you were to plot the districts like I did on that dot plot, they're more or less linearly arranged. There doesn't seem to be any evidence of shenanigans.

Not to be completely paranoid, but do you anticipate bad actors, perhaps employees of the consulting firms currently doing this for the people in power, creating accounts and trying to flood out your good data with garbage data? Not that particularly, but there's a crowdsourcing element to this, so there's a bit of protection that goes into it. One of the reasons we're switching to this model as we open it up nationally is that we need more granular permissions, so we'll be able to track much more closely what individuals are doing. And if someone is a bad actor, or, as I've found in crowdsourcing projects, people aren't always bad actors but are often just reckless and don't realize they're making lots of errors, we'll have a lot more of an ability to roll back
changes on a user-by-user basis, or not grant them permission in the first place. But it's definitely something to worry about, because independent commissions face this a lot: the bills that create them often also create massive opportunities for public comment, and I think in every instance of an independent commission there have been people claiming to be part of the community who in fact have ill intent, and rooting those out is difficult for these commissions.

As a person who lives in one of those red states, North Carolina, that's been very gerrymandered: I just want to thank you for this work; I appreciate it, and you can tell you're very passionate about it. Thank you, thanks.

Are there metrics that can tell you whether a precinct is drawn to represent the district? I don't know of metrics specifically for that, but there is a concept called communities of interest, and one of the things we're also looking at is a tool to let people draw their community of interest, which can be a very generally defined term; figuring out exactly how those communities are split or not split might be in that direction. Did you mean precincts or districts? OK, the same answer applies; it's a fascinating thing.

People register to vote, and that's the only way you know who they are. I noticed at a conference last year they were talking about this a lot: Estonia has established a national ID, and it's really great, because you have an ID and you can vote as many times as you want and only the last vote counts. What kinds of issues have you run into identifying the actual people registered at an actual address? Is that actually collected?

So, we do a little bit of work with what's called the voter roll, or voter record, which is a record for an entire state that has every registered voter, some other voter information, and then what precinct they're assigned to. We don't actually want to know anything about the individual, because all we care about is the individuals within a specific precinct. What we've figured out to do with this data, and it's definitely not good enough on its own to make a map out of, is that if you take every registered voter, assign them to their precinct, and put that on a map, you can build a decently good map, and we've been using that to verify that our naming is correct and that no precincts have been split or merged. When I was talking about precincts changing, 90 percent of the time it's two precincts being merged because the lines were too short at the polling place, or one precinct being split in half because the lines were too long. I've lost my train of thought; did I answer your question, or am I missing something?

You did, but you also brought up another question. With all the news today there's good reason to be thinking about this: the number of people living in a community or precinct who don't vote, because they don't know to vote or for whatever other reason, and so don't get counted. How do you approach that?

I've remembered what my train of thought was, so let me finish that first: the only way we use voter registration information, information about an individual, is just to make sure we have the naming correct. As for people who don't vote, that affects the data, but they're probably not going to vote in future elections either.
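The verification idea described above, assigning each registered voter's geocoded address to a precinct polygon and checking that the names line up, can be sketched with a classic ray-casting point-in-polygon test. The coordinates, precinct names, and voter IDs below are invented; a real pipeline would use a GIS library such as Shapely:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Invented precinct polygons and geocoded voter points.
precincts = {
    "Courthouse": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "Monterey":   [(4, 0), (8, 0), (8, 4), (4, 4)],
}
voters = [("voter-1", (1.0, 1.0), "Courthouse"),
          ("voter-2", (6.0, 2.0), "Monterey"),
          ("voter-3", (1.0, 3.0), "Monterey")]  # roll disagrees with the map

mismatches = []
for voter_id, (x, y), listed in voters:
    found = next((p for p, poly in precincts.items()
                  if point_in_polygon(x, y, poly)), None)
    if found != listed:
        mismatches.append((voter_id, listed, found))
print(mismatches)
```

A burst of mismatches concentrated along one boundary is exactly the signal that a precinct was split, merged, or renamed since the map was drawn.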
There are other projects whose main goal is reaching people who are eligible to vote but don't, because they're not on the voter rolls, or were told for whatever reason that they can't vote, or didn't know to vote.

But there's always, say, a household with four adults, four people who could vote, and maybe only one of them votes, so that one person is on a voter registration somewhere and the other three might not be. Is there any effort to find the people who are eligible to vote but never got on the rolls? The thing that makes that difficult is that we have an anonymous vote, so you don't actually know how many registered voters in a precinct actually went and cast a ballot; that's called the turnout. There are so many different ways you can attack the problem of unfair democracy, and that's a whole other project: making sure people are educated, know where to vote, know they're accurately registered, and fight it a little if someone tells them they aren't. But I think getting people invested in redistricting, getting them talking to their community, making sure everyone they know who's eligible is registered, and then maybe talking about the election afterward and hearing whether someone was told they weren't eligible and making sure they go and fight that, that kind of community involvement and public education in general, which this data does promote, because everyone wants to look at the map, find their house, and see what's going on around there, is just going to bolster people's involvement in democracy.

Thank you very much; let's give them another round of applause, and thank you all for coming out to the open government track. We've got a bunch of other open data and government-related talks going on Saturday and Sunday as part of the open data track, so definitely take a look at that. Also, if you're interested in these civic technology projects, Hack for LA is the local Code for America brigade; we're a volunteer group that works on these kinds of things in our spare time. We have a booth in the expo hall as part of the Opensource.com table, so come check us out there. And yeah, thank you all for coming.