Thank you, Nate. Can everyone hear me all right in the back? Barely? Okay, let's test the microphone. How's that? Better? Okay. All right, we'll stick with the microphone then. Thank you all for being here. This is a talk I've been wanting to give for a while. I've compressed it down quite a bit, so I hope I hit the highlights that are interesting to you. A little bit about me: I work at Remix, which I'll explain more about later, and we do a couple of different things. What I'm going to be focusing on today is, of course, GTFS, the transit data standard that we use. So I have a quick question for you all. Actually, a couple of questions: how did you get here today? Who rode transit here? Nice. Shout out to TriMet. Who rode a bike here today? Awesome. Shout out to Portland's bike infrastructure. And who walked? Amazing. That's a pretty good breakdown. What's that? Car sharing? Awesome. So this is part of why I'm here today. If you leave here today excited to go dig into the data of your local transit agency, I would be absolutely thrilled. That would mean a lot to me. A big part of why I'm here is to get you excited about this, because it's a really wonderful community, there's a ton of data, it's easily accessible, and most of it is CSVs. We'll talk about that. A quick quote. You've probably seen this before. I couldn't find an attribution, but I bring it up because this statement is true of almost any urban infrastructure project: start it as soon as you can. It's also very true of data standards. If there is no data standard, it would have been great if we had one 20 years ago, but if we don't, let's start right now. So I thought this would be an interesting way to do a sort of mid-20-year check-in, because this month is the 10-year anniversary of GTFS. May 18th, actually.
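Since so much of this data really is plain CSV, here's roughly what poking at it looks like. This is a minimal, made-up stops.txt (a real feed is a zip that also contains files like agency.txt, routes.txt, trips.txt, and stop_times.txt), read with nothing but the Python standard library:

```python
import csv
import io

# A made-up fragment of a GTFS stops.txt file, for illustration only.
stops_txt = """\
stop_id,stop_name,stop_lat,stop_lon
1,Main St & 1st Ave,45.5231,-122.6765
2,Main St & 5th Ave,45.5240,-122.6722
"""

# GTFS files are plain CSVs with a header row, so DictReader is all you need.
stops = list(csv.DictReader(io.StringIO(stops_txt)))
for s in stops:
    print(s["stop_id"], s["stop_name"], float(s["stop_lat"]), float(s["stop_lon"]))
```

That is the whole barrier to entry: a text editor or a spreadsheet program, exactly as the next quote explains.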
And so, yeah, data standards, right? They're amazing. GTFS was born in Portland; it was created here as a collaboration between TriMet and Google. Many of you may know that. Bibiana McHugh was then an IT manager at TriMet and, from what I understand, basically pitched the idea to Google, and that led to Google Transit becoming the trip planner it is today. So I dug around in the archives of the GTFS mailing list and found the introduction of GTFS by Joe Hughes from Google Transit. There are two really interesting paragraphs, on this slide and the next, about the reasoning behind why they chose CSV. I thought this was very relevant, so I wanted to touch on these two things to give you a better picture of the overlap here. First, spreadsheet programs and text editors: this is why CSVs are important. And then the next piece: CSVs are easy to create. So I just thought it was really interesting that the origins of GTFS are very much tied to this data format. Very relevant to this conference. Data is one of the greatest assets for cities. I think this is fairly obvious, and GTFS was born in part because Bibiana McHugh and the people at TriMet recognized this. Data is also one of the greatest challenges for cities, and that was also recognized around the origins of GTFS. It might not always be best for government to deliver those services itself, which is why the partnership was there: TriMet delivered the data, and the service was handled elsewhere. Quickly, about Remix. This is my company, and it's part of why I'm here today. We're a team of designers, urbanists, and engineers, and we build software for transportation planners to plan out primarily bus systems. So just a quick video. This is what our software looks like. One of the things it does is let you quickly sketch out a route.
You'll see the cost being estimated, along with some demographic statistics, and the cost is all based on the actual schedule. It lets you quickly build out what a transit system could look like and work against the actual cost of doing that, so you can work within your budget, which is of course the major constraint. These are some of the problems we're trying to solve, and this is just the tip of the iceberg of what we've built, but it's part of what I'll be focusing on today. Remix is very much about planning for the city as it will be, not as it is today and not as it was in the past. We're very forward-looking, and cities are constantly changing, so we have some interesting challenges. The data our users care about might be a little different from the data you see on open data portals or published by the census. One example: some of the streets that the agencies working with us are planning against don't exist yet. This is future infrastructure that they need to plan transit against. So there's some friction between what exists in OpenStreetMap, which represents the now, and what they actually need to route against. And that routing action, that drawing along street segments, relies on OpenStreetMap for those nice interactions. You also need to know: what is the population going to be in the future? Where do you expect population growth to be? Where do you expect job centers to be? How do you expect that growth to happen? This is important data. Here's an example: 2025 population estimates in Seattle, based on their very deep knowledge of the Seattle area and how they expect it to grow in the next 10 or 15 years. This is the data they care about planning against when they're looking at long-term horizons. So how can we best deliver transit to populations that haven't yet materialized?
This is a very hard problem and something our users care a lot about. So we get a lot of population estimates and job estimates, lots of geospatial data. Another thing we get is what I'd call pre-release GTFS. This is not the GTFS that agencies give to Google, publish online, or that you'd find on TransitFeeds or Transitland. This is GTFS that they've sketched out, or that a consultant has put together with them, and it represents a scenario or scenarios they are evaluating to potentially deliver in the future. That's one of the ways GTFS has made transit systems, and the data about transit systems, very portable. This pre-release GTFS data is often of lower quality and sometimes incomplete, but we've gotten very good at munging GTFS: taking it apart, putting it back together, and getting it to a place where it's useful for our users. I also wanted to talk about the way we build products. It made sense at the time I put the slide together. The idea is that we take data, and we care a lot about design, so design is this initial process, and you really can't do the design without data. There was actually a talk yesterday about that, and it's very, very true. Then there's a set of algorithms that enable you to build rich UIs, rich interactions over complex data, to accomplish things much more quickly. You could go twiddle with the data by hand and accomplish the same goals, but you can't do it quickly, and it's prone to error. So investing in algorithms is a big part of what we do, and that's what I'm going to be talking about for the rest of the talk: some of the algorithm problems and how we've solved them. And at the end of all that is a product that is useful and valuable. So the first thing I want to talk about is nearby stops.
If you saw, in that quick video earlier, as the route was being drawn, we were connecting it to the nearby stops. Those stops presumably are close to, or are the actual, stops that a bus route going down that corridor would stop at. What we could have done is just let the planner draw down that corridor and then go in, hover, and select those specific stops. Basically, data entry. Data entry is very time-consuming, and there's a lot of potential for it everywhere in transit. So this is actually an algorithms problem that we decided to solve, and I'll show you a little of the background behind it. I'll let this video play and then I'll explain it. It's actually a GIF. So there's that drawing action. Okay, what you're seeing here, behind a feature flag, is a development tool we've built internally that visualizes what is happening behind the scenes: how we're selecting those stops. This is Market Street in San Francisco. There are many, many stops; some are relevant to bus, some to rail, and you need to be able to understand which are nearby so that we can remove the majority of the data entry. We can take that off the plate of the transit planner and let them get to a scenario they can evaluate faster, which means they can evaluate more scenarios, which means they can get to better transit faster. This is very important. So basically, what you're seeing: the yellow is a broad pass to find nearby stops. The red on the right-hand side of the shape is a narrower pass, where we take into account, literally, the country we're in, the side of the road people drive on, and the side of the road people board the bus from. And then you see these circles. There we're running clustering algorithms, because if we just did proximity-based selection, you would pick whole clusters of stops.
So really what you want to do is take one stop out of each cluster, and they're usually clustered around intersections. So this is layer upon layer of complexity to get to an end product: saving time and getting to a scenario that makes sense much, much faster. That's the first example of supporting product with data and algorithms, and also design. This whole debug panel is just tons of tweaking. We were actually worried that we were going to need to expose some of these parameters to users, that it wouldn't be good enough. But we never had to; we were able to tweak it enough to get it ballpark right most of the time. So, the next spinning ball: odometer readings. This is a really interesting transit-data-specific problem. You saw that line string and a set of points that represent stops. What we need to do is take that set of stops and that line string and figure out: were a bus to follow that path and make those stops, what would its odometer read at each stop? That gives us the distance between stops, and if we know the times at which those stops are made, we can calculate the speed. That means we can alter the time between stops or we can alter the speed between stops; we now have two levers to change the same thing, and a much richer way to interact with the system. This is actually kind of hard. I'd call this a lasso route, where it doubles back over its own path. Does anyone see what would be hard about finding the odometer reading in this case? Yeah, it's basically this cross section right here: we have two stops that are nearby each other. This would actually be okay if we had a guaranteed ordering of the stops; then we could solve this particular problem pretty easily. We don't always have that in this drawing UI.
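To make the odometer idea concrete, here's a sketch, not Remix's actual implementation, of the easy version: project each stop onto the route's line string and read off the cumulative distance at that projection. It works in planar coordinates for simplicity; real code would compute distances on the sphere.

```python
import math

def odometer_readings(line, stops):
    """Given a polyline (list of (x, y) points) and stop points, return the
    distance along the polyline at each stop's nearest projection."""
    # Cumulative distance at the start of each segment.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    def project(px, py):
        best = (float("inf"), 0.0)  # (squared distance to line, odometer)
        for i, ((x0, y0), (x1, y1)) in enumerate(zip(line, line[1:])):
            dx, dy = x1 - x0, y1 - y0
            seg2 = dx * dx + dy * dy
            # Parameter t in [0, 1] of the closest point on this segment.
            t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / seg2))
            qx, qy = x0 + t * dx, y0 + t * dy
            d2 = (px - qx) ** 2 + (py - qy) ** 2
            if d2 < best[0]:
                best = (d2, cum[i] + t * math.sqrt(seg2))
        return best[1]

    return [project(px, py) for px, py in stops]
```

Nearest projection picks the wrong pass of a lasso route whenever a stop sits near the crossover, so on its own this only works with a guaranteed ordering of the stops, where each reading can be constrained to be at least the previous one.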
We would if we got the data directly from the agency, but when drawing new routes we actually have an unordered set of points and a line string, and we have to back into those odometer readings. So again, the country in which we're drawing this route comes into play, because if I know the bus stops are probably going to be on the right-hand side of the road, then this becomes a reasonable problem to solve. Otherwise it does not. So this is another thing that comes up. There are actually some really good open-source examples of algorithms that address this problem. I'll put together a doc; I didn't do that beforehand, but I'll tweet it out. So this odometer reading problem is quite an interesting one, and something we've solved over and over. Okay, this next one is one of my favorite problems, so I'm going to spend a little extra time on it: timetable headers. For the amount of transit data that is out in the world, the number of times it has been put into timetable format and visualized is very, very low. Timetables are a fundamental visualization of fixed-route transit. Almost everyone knows how to read a timetable. It's a very easy way to look into the specifics of what time a route runs and how often it runs. So the question is: why don't people just take GTFS and turn it into timetables all the time? It's an amazing debugging tool. If you take GTFS and put it into a timetable, you can find all sorts of problems right off the bat. So why don't people do this? I'll start with why it's hard, using Caltrain as an example. Caltrain isn't actually a particularly difficult case, but there are some interesting things. What Caltrain demonstrates really well is that each trip is a column: 102, 104, 206, and so on, those columns are trips. Then you can see the stops down the left-hand side.
What you'll notice is that these trips make different stops, and that leaves gaps in the timetable. So the ordering of the stops here basically has to be a supersequence of the stops on any one of the trips. And you don't want just a naive supersequence; you want the shortest common supersequence of all of these sets. That is a particularly difficult problem, which is why you do not see timetables made from GTFS very often. So let's talk about this algorithm in the abstract. I've shortened shortest common supersequence to SCS, and I've passed it a list of two arguments. Let me explain the arguments: assume that each character in the words "timetable" and "headers" is a stop identifier. So T is a stop identifier, and you can see that both of these sets include E and both include A; those are the same stops. They both make some of the same stops. What we want as the output is one string for which each input is a subsequence, an ordered subsequence. So I'll just show that. This is the shortest... actually, I don't know if this is the shortest possible supersequence; I'll get to that later. The letters in red: if we deleted them, we would get back the original word, in order. That means each input is a subsequence of the answer. You have to solve this problem for k patterns, however many distinct stop patterns you have, in order to get the timetable header. And once you have the timetable header, the layout of the trips is easy. But solving this problem is particularly hard. I'll switch this into a table format so you can see what it would look like. Now we have laid out a timetable. There aren't actually times in it, but for visualization you can see the problem laid out. This table happens to be transposed from the Caltrain timetable itself, but I hope you get the point. So this is an NP-complete problem.
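To be precise, it's the k-pattern generalization that is NP-hard; for just two patterns there's a classic dynamic-programming solution, closely related to longest common subsequence. Here's a sketch:

```python
def scs(a: str, b: str) -> str:
    """Shortest common supersequence of two strings, where each
    character stands in for a stop identifier."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the SCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # shared stop: emit it once
            else:
                dp[i][j] = min(dp[i - 1][j], dp[i][j - 1]) + 1
    # Walk back through the table to reconstruct one shortest supersequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] <= dp[i][j - 1]:
            out.append(a[i - 1]); i -= 1
        else:
            out.append(b[j - 1]); j -= 1
    out.extend(reversed(a[:i]))
    out.extend(reversed(b[:j]))
    return "".join(reversed(out))
```

For "TIMETABLE" and "HEADERS" this produces a 13-character header that contains both words as ordered subsequences. Folding it pairwise over k patterns still yields a valid header, but with no shortest-length guarantee; that gap is exactly where the hard part begins.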
It's not a tractable problem in general; you cannot solve it optimally in all cases. So there's this big barrier between transit data and one of the fundamental visualizations of fixed-route transit, which means the solution is typically that someone went into some software somewhere and said: first I want to show this stop in the timetable header, then this stop, and then this one. They did it manually. Almost any timetable you see was done manually, which is, again, data entry: really annoying and always problematic. So we basically had to solve this problem, not necessarily globally optimally, but in some way that works. And of course there are degenerate cases that would make the naive approach just not work. Okay, so that's that particular problem. I love timetables; we can talk about timetables later. But this is why you just don't see as many timetables as you might expect: it's basically a manual process right now. Okay, but all of these algorithms are only part of the problem here. I want to talk about the secret sauce of a GTFS data pipeline. Does anyone know what the secret sauce of a GTFS data pipeline is? It is a technical term. Any guesses? CSV is a really good guess; that's my number two. What's that? No, thank you, but no. This is a little bit cheesy, but it literally is people. I wish this were a joke. It is not. I wish the data were always perfectly valid and perfectly represented the system it was intended to represent. I wish agencies always knew where they hid that darn GTFS file; that would make everyone's lives much, much easier. I wish every agency knew what GTFS was and how valuable it would be to them if they could just create it. These things are not always the case, which is why we work very closely with agencies.
We develop, across our entire team, a very deep competency around data. Anyone at our company can debug a GTFS file to some extent. Anyone at our company can take a shapefile, open it up in QGIS, and figure out roughly what's going on in it. Data provenance, where a GTFS feed came from, is often a mystery, much more often than you would expect. The software that produced it, if it was software, is not always known. And oftentimes, for smaller agencies, there are regional cooperative groups creating the GTFS on their behalf, which means the agency may not be the steward of its own GTFS data. So we work very closely with them to back into where their data came from and how we can get an up-to-date copy of it. People really are the important part of a GTFS data pipeline, because the rest of it can be automated. But at the end of the day, you're going to need to pick up the phone or send someone an email and really get to the bottom of the data. I'm sure all of you can understand what that's like, but it is an important part of what we do. Okay, before my battery runs out: a request. Please go explore transit data. Make art from transit data. There are amazing, fun things you can do with transit data, not just putting it on a map, though that's a good place to start. Go grab some data and try to make timetables out of it. And go find the data for your city. Here's some quick art that I made: the San Francisco cable cars as a star formation, I'll say. These and more interesting things, even timetable-y things, you can find on my GitHub. And one last plug: if this sounds interesting to you, if you want to work with transit agencies and city governments, we are hiring. Go to remix.com/jobs, or just email me, or come up and talk to me afterwards, and I'll tell you about all the amazing things we're working on.
Thank you all for being here.