Our opening speaker this fall semester is Dr. Jamie Saxon, postdoctoral fellow at the Center for Data and Computing at the Department of Computer Science and the Center for Spatial Data Science at the University of Chicago. My name is Ranjani Srinivasan. I'm a PhD student here at Columbia's Urban Planning Program, and I'll be moderating the session. I'll start with a few brief technical and logistical announcements and then turn to introducing our speaker. During the talk, I'd like to remind the audience members on Zoom to please mute their microphones. We will be recording today's lecture, so anyone in the audience who wishes to not be recorded should turn off their video input. Audience members in Avery 114 who are also connected on Zoom, please be mindful to mute your sound as well. The chat box should only be used for discussion regarding the session. If you have any technical questions that apply only to you, please message my co-hosts Helena Rong or Carolyn Swope privately. We encourage all of you to type your questions into the chat box during the presentation. After the presentation we will have time for Q&A. We will start Q&A at around two o'clock or 2:15 p.m. so that we will have enough time for everyone's questions. I'll be coordinating the Q&A with attention to diversity and inclusion, so if you've already had a chance to ask a question, please allow others to do so before asking another one. To ask questions, participants can use the raise your hand feature, and we'll call on you to unmute and ask your question directly. You may also type your questions into the chat box and I can read them out. And for the audience in Avery 114, you can raise your hand and I'll call on you so you can ask your question directly. With that, I'm delighted to introduce today's speaker. Dr.
Saxon is a postdoctoral fellow at the Center for Data and Computing at the University of Chicago, where he develops data pipelines to evaluate the availability and accessibility of resources in neighborhoods of American cities. Formerly he was a postdoctoral fellow with the Harris School of Public Policy and the Center for Spatial Data Science at Chicago. He was originally trained as an experimental physicist. He has expertise in instrumentation, sensors, data acquisition systems, and big data methods. He was also closely involved in the discovery of the Higgs boson. Dr. Saxon's talk today is entitled "Structures of Local Mobility in Chicago," which draws from his work observing the mobility of Chicago residents through a large data set of smartphone users and constructing a neighborhood-level mobility network for the city to characterize neighborhoods according to their local graph structure. So Dr. Saxon, if you're ready now, I'll pass things over to you.

Can you hear me? All right. Yes. Great. And can you see the street? Awesome. Okay, so thank you so much to all of the organizers for making this happen, and to everyone for joining today. So this is a project, as Ranjani said, on the structures of local mobility in Chicago. And this project really started on this street, which is 61st Street. It defines the line between the University of Chicago on the north side and the neighborhood of Woodlawn to the south. And when I moved to the University of Chicago in the first instance, people who had lived here in this neighborhood, in Hyde Park, in years and decades past told me that this was the line that you could not cross. Right, this is the line, things are unsafe. I was about to move in about half a block from here, and I was told that I had to find a new apartment because I was going to get myself killed. And so naturally I was fascinated by this line.
And it did happen that several times, actually, I would go for a walk along this line and through Woodlawn, and then get back to my desk and find an alert from campus security saying that there had been shots fired along the streets that I had taken. But this line just captured my interest, because it felt like it contradicted some of the shortcuts that we take in planning or sociology or economics, namely that things that are close, that are proximate, are accessible and interact with each other. You see that in models of walkability, of park catchments. You see it in econ with the ideas of learning in cities. You see it in sociology with ideas of social cohesion. And you see it in geography with, you know, Tobler's first law, that near things are more related than distant things. And here's a space where two neighborhoods come up against each other for a mile. What I perceived was that these interactions across this line were limited, but more than that, the amount of pedestrian activity on the north side of the street, in this neighborhood, was different from the amount of pedestrian activity on this side of the street in Woodlawn. And what I wanted to do was find a way to observe this at a larger scale, not just on this street, but throughout Chicago and ultimately throughout the United States. So I'm not the first one to have examined this street. Actually, Jane Jacobs, in the chapter on boundaries in The Death and Life of Great American Cities, talks about 61st Street as a sort of DMZ between the university and Woodlawn. But more broadly she says that if self-government in a place is to work, there must be a continuity of people who have formed neighborhood networks. Those networks are a city's irreplaceable social capital.
So it's really clear for Jane Jacobs that the idea of neighborhood networks bears a spatial imprint. Right, in our daily routines and the intricate ballet, as she calls it, of city streets, we are drawn across spaces for cross-use between our activities, between the various functions that a healthy neighborhood allows. There's a physical imprint of that. But she also talks about social capital, and although she wasn't the first to do so, she was pretty early. And she was influential, and I wanted to pull out those two pieces: the impact of this cross-use, of this ballet of the streets, the neighborhood ties, and the social capital. So years after Jacobs's work, the idea of social capital was formalized and sort of drawn into the network topology of the graph clustering coefficient by James Coleman in 1988, so sort of the seminal piece on social capital. So the idea here is that, focusing on the diagrams in the middle, on the top B and C are not connected to each other, whereas on the bottom, they are. And the idea is that if B is a jerk to A, in the bottom case C can exercise some form of social control on B for having been a jerk, whereas in the top diagram they can't. There's an interlocking set, there's a return path for the social interactions. It's going to come back to bite you. But this basic network topology has existed since really the dawn of urban sociology. So this is a delinquency triangle drawn from Park and Burgess, 1925: these two kids going out on a date somewhere in the city, and the way that social control is exerted upon them and upon their relationship depends on the contexts where they are. More recently, Chris Browning at OSU has done work that brings the social closure, the graph clustering coefficient, onto a digraph of home neighborhoods and visited locations. And although what we have here is a digraph, the basic network topology, the basic sociological idea, remains the same.
And then he brings this explicitly towards criminality: places that share, in some sense, neighborhoods and destinations are going to be healthier than places that do not. Chris had only used simulated mobility data for Columbus, Ohio, in this study, and I wanted to bring real-world routines to it. So I searched far and wide, and failed many times, to try to find this form of data. I don't have to talk about that. But I ultimately ended up using cell phone location data, a little bit earlier than it became widely available and popular to do with the coronavirus pandemic. And so I'm going to describe how to build neighborhood networks from these cell phone location data. I'll describe the data themselves as well. Then I'm going to describe methods for characterizing the local relationships, and indeed I really am focused on the local behaviors, using simple graph properties drawn from network analysis. So the social closure here, the graph clustering coefficient of these two places: are they connected to each other? And what I'll call the local out-degree, which is just sort of the consistency between the empirical relationships with near neighbors and the set of your nearest neighbors. So what I'm going to talk about is: are k-nearest neighbors a good indication of high levels of interaction? And after defining and constructing these measures, I want to argue that variation in those measures matters, and it matters because it is correlated with status. So it is part of the set of things that people experience in privileged and poor environments in cities. I think it is a component of poverty, right, it is a basic expression of what we do with our bodies on a daily basis. It is not just a backwards reconstruction of median household income or educational attainment using big data methods. That has actually been done before, in a certain sense.
So what I want to argue instead is that it's actually something more than that. It's not just something that rich people buy; it offers us new, independent explanatory power when we look towards other social outcomes. And I will focus specifically on crime, because that's what one of Jane Jacobs's hypotheses was. This project came out in Environment and Planning B last year. If there is time, and there should be, I will also talk more about my ongoing work to try to really get into ground-truthing this form of data, to open up both new ways of doing these studies and also what we know about the representativeness and potential biases of the data as a form. Okay, so if you're not familiar with this form of data, ultimately it comes down to this: it is three fields. It is a device identifier, and a time at which the device was at a place. So a location, a device ID, and a time. Location is encoded as a latitude and longitude. And you'll see also some ancillary fields like precision and things like that. The app that generated it: usually you don't know what app it is, but you'll have a unique ID for the app as well. More recent data sets that have come out package up these fields in different ways and give you sort of dwell times, or precision based on a larger number of pings, so you can characterize the behavior or associate it with a coffee shop. But ultimately, everything that is done with these data is just coming from these three fields: device, time, location. And so, for just the city of Chicago, this is May 2017, one month of data, 600 million points. I'm showing you about 2% of it here. Immediately you see the structure of the city of Chicago. You see the highways standing out. You see the Loop, which is the core business district, glowing along the lake.
I will return to this map over and over again, so I just want to orient you. The suburbs, which are one of the two rich areas of Chicago, are on your left, to the west. The north side, also wealthier, is up here along the lake. There are some scars running through the city: you have the Sanitary and Ship Canal and some industry around it, and some transportation hubs or depots around here. And we have the airports, which are a funny case, because they have 24-hour operations. And then down here along the south side, we have the neighborhood of Hyde Park, where the University of Chicago is, and this is really going to stand out as an outlier over and over again. So the first thing to do is drop all of the imprecise data. There's a lot of data that don't actually have a GPS fix, and these get sort of concentrated in a few locations throughout the city. So I do a lot of work to get rid of those. I do not consider that it's a real interaction with the neighborhood to drive through it on the Dan Ryan Expressway. And so I flag and remove any locations that are within 10 meters of a highway or major road as defined by OpenStreetMap. Next, I join all of the locations to census tracts, so that I know what census tract those locations are in. And then I compute the home location of each device as the modal, the most frequent, tract at night. So now I have basically all of the data cleaned, and a set of locations as census tracts across the city, and for each device, a home location. And now I need to construct mobility as a network. And the question that I'm going to be asking is, how much time do individuals spend in the neighborhoods around their places of residence? And so this is averaged across the residents of each location. And the graph extends across the entire region, the entire Chicago region. And I'm going to focus on just the local interactions, so I'm just going to take the set of interactions that are local.
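The home-location step described above (the most frequent tract at night for each device) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the 10 p.m. to 6 a.m. night window, the function names, and the `(device_id, timestamp, tract_id)` tuple layout are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed night window; the talk does not state the hours used.
NIGHT_START, NIGHT_END = 22, 6

def is_night(ts):
    """True if the timestamp falls in the assumed night window."""
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END

def home_tracts(pings):
    """Map each device to its modal (most frequent) census tract at night.

    pings: iterable of (device_id, datetime, tract_id) tuples, i.e. pings
    that have already been cleaned and joined to census tracts.
    Devices with no night-time pings get no home location.
    """
    counts = defaultdict(Counter)
    for device, ts, tract in pings:
        if is_night(ts):
            counts[device][tract] += 1
    return {device: c.most_common(1)[0][0] for device, c in counts.items()}

sample = [
    ("d1", datetime(2017, 5, 1, 23, 0), "A"),
    ("d1", datetime(2017, 5, 2, 2, 0), "A"),
    ("d1", datetime(2017, 5, 2, 3, 0), "B"),
    ("d1", datetime(2017, 5, 2, 13, 0), "C"),  # daytime, ignored
    ("d2", datetime(2017, 5, 2, 12, 0), "C"),  # no night pings at all
]
homes = home_tracts(sample)
```

In this toy sample, device `d1` is assigned tract `A`, and `d2` is left without a home location because it never appears at night.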
This graph is related to the Loop, to the core business district. I'm going to ignore this for a moment, just focusing on the local interactions. This graph looks like this. And so what I'm showing you is, again, the same map: suburbs on the left-hand side, city on the right-hand side. Higher levels of interaction are darker colors, lower levels of interaction are lighter colors. And so the first thing that you will notice is that there are more darker colors, higher levels of interaction with local neighbors, in the suburbs than in the city. And this is sort of trivially the same observation as scaling laws of cities. Right, we're less willing to cross over a million people; we're less willing to drive five census tracts away when that entails a longer distance. And so when we have a lower density, people are going to remain within the adjacent census tracts more than they do in the city. Within the city, we still see pretty enormous variation, and specifically the north side has a higher level of interaction than the south side. But within the south side, the neighborhood of Hyde Park stands out as an outlier. Now, the basic takeaway, the basic message of this analysis, is actually going to turn out to be correct. But as I have constructed it, it is incomplete, and it is flawed. And the reason is that census tracts have different sizes. So this census tract in the South Loop, near the core business district, has 18,500 people living in it, and some of the census tracts over here on the south side have fewer than 2,000 people, because they have suffered depopulation over the past decades. And so the amount of interaction with your nearest neighbor, well, if you have an 18,000-person population area, right, there are a lot more chances for interaction. And also, the extent to which the interaction coming out of this neighborhood is felt is totally outsized when this place is a factor of 10 larger than some of the tracts down here.
And this sort of methodological wrinkle is called split-node invariance. Specifically, I do not want the constructed graph quantities to be reliant, to the extent that I can avoid it, on the structure of the sort of arbitrary subdivisions that the census has laid out. I know they're not completely arbitrary. But nobody living in the South Loop knows which census tract they're in, and if we divide it in two, we should still have basically the same constructed parameters. So to do that, I'm going to make a few definitions. First, I'm going to define the time in location l, census tract l, by a user u as a_ul, and I will take the average over all of the users living in a home tract h as a_hl. So a_hl is just the average of a_ul for everybody who lives in neighborhood h; that's the graph we were just looking at. Then, for the split-node invariance, the fact that people don't know which census tract they live in but they do know what their neighborhood feels like, we need the census population of each tract. That's n_h. I will call big N_hK the cumulative population of the K tracts that are nearest to that home tract h. I will call V_h, the vicinity of a home location, the largest set of K tracts such that N_hK is less than some threshold, which I will set to 40,000 people. So it's a set of neighboring tracts of constant population, dealing with the fact that the tracts may be split in different ways. So with these definitions, we can now proceed. And the first question is to construct the clustering coefficient with weighted edges, different levels of interaction, and weighted nodes, different populations. And the question is, do I share destinations with the residents of places that I visit? Another way of saying this is, am I a mobility peer of the residents of the places where I go? You could also sort of think of it as, do I share ownership of that space with people at my destinations?
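The definitions given a moment ago can be written compactly as follows. This is a sketch in notation chosen for this note rather than taken from the talk: U_h (the set of users whose home tract is h), h_k (the k-th nearest tract to h), K* and T are labels introduced here, and whether the home tract's own population counts toward the cumulative total is not stated in the talk.

```latex
% a_{u\ell}: time spent in tract \ell by user u
% a_{h\ell}: average over users living in home tract h
a_{h\ell} = \frac{1}{|U_h|} \sum_{u \in U_h} a_{u\ell}

% n_h: census population of tract h
% N_{h,K}: cumulative population of the K tracts nearest to h
N_{h,K} = \sum_{k=1}^{K} n_{h_k}

% V_h: vicinity of h, the largest set of nearest tracts
% whose cumulative population stays below the threshold T
V_h = \{ h_1, \dots, h_{K^*} \}, \qquad
K^* = \max \{ K : N_{h,K} < T \}, \qquad
T = 40{,}000
```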
So there is outgoing interaction to them, and then: are these connected to each other? We have weighted edges. And then we have the influence: the level of connectivity from i to j is diluted at j by the number of people who live in j. So we have a factor of the census population here, and I add this all together. And all we're considering is the amount of outgoing interactions, on the bottom and the top. And these are weighted by the amount of interactions that are actually seen between those destinations. So that's the clustering coefficient. For the local out-degree, the question is, do I interact with the neighborhoods in the immediate vicinity of my home? So taking this neighborhood here, we'll look at summing up all of the outgoing interactions to the local neighbors, up to this 40,000-person threshold. So at the trivial level, or the naive level, I would simply add all of these up to 40,000. We have to make a little bit of a correction for this, because the tracts don't neatly add up to 40,000. And so there is a little bit left over that I will have to add on, so I will take some fraction of the K-plus-one tract. Then I remove from this sum the share of interactions that are taking place in the home location. This self-interaction is, you know, you can think about the spatial scale: maybe it's good to be sitting on the stoop, but dwelling inside of your home permanently is less of a good sign, is probably more of a sign of low employment. And indeed, I see that this is negatively correlated with good outcomes rather than positively, as this one is. Okay. So these are the two constructions: how related are neighbor nodes, and how related am I to my immediate physical vicinity. So the picture remains largely the same. The first question is, do I use the neighborhoods in the immediate vicinity of my home? Here we have the local out-degree. This is a measure that could theoretically go from zero to one.
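The local out-degree construction described above (sum the outgoing interactions over the nearest tracts up to the 40,000-person threshold, then add a fraction of the next tract) might look like the following Python sketch. It is an illustration under stated assumptions, not the published implementation: the function name and argument layout are hypothetical, the interaction values are taken to be shares of residents' time, the home tract is excluded by simply leaving it out of `neighbors`, and the fractional weight on the K-plus-one tract is assumed to be the remaining population budget divided by that tract's population.

```python
def local_out_degree(neighbors, share, pop, threshold=40_000):
    """Sum interaction shares over nearby tracts up to a population threshold.

    neighbors: tract ids sorted nearest-first, excluding the home tract
               (so self-interaction is left out of the sum).
    share:     dict tract -> share of home residents' time spent there (a_hl).
    pop:       dict tract -> census population.
    """
    total, cum_pop = 0.0, 0
    for tract in neighbors:
        n = pop[tract]
        if cum_pop + n < threshold:
            # Whole tract fits under the threshold: count it fully.
            total += share.get(tract, 0.0)
            cum_pop += n
        else:
            # Assumed correction: take only the fraction of the next
            # tract needed to reach the threshold exactly, then stop.
            frac = (threshold - cum_pop) / n
            total += frac * share.get(tract, 0.0)
            break
    return total

pop = {"a": 15_000, "b": 20_000, "c": 10_000, "d": 8_000}
share = {"a": 0.20, "b": 0.10, "c": 0.08, "d": 0.05}
result = local_out_degree(["a", "b", "c", "d"], share, pop)
```

In the toy example, tracts `a` and `b` are counted in full (35,000 people), half of tract `c` fills the remaining 5,000, and tract `d` is never reached.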
So the lighter colors are more interactions with the areas around your home. As before, right, same basic idea: more interactions with the immediate neighbors in the suburbs than in the city, but variation across the city, higher levels of interaction on the north side than the south side, and this outlier of Hyde Park. You also see the physical form of the city. So we see the Sanitary and Ship Canal and industry along here, sort of running as a scar through the city. This is the north branch of the Chicago River, so although these are spatially adjacent, we don't get there, if you think of a measure of distance rather than a measure of, you know, just immediate proximity, sort of queen weights. That's what this is showing here. This is the Dan Ryan Expressway going down the south side, again sort of tearing this asunder in this local space. Hyde Park itself is an outlier, but within Hyde Park we also see an outlier of a single cell. This is the university hospitals. These have 24-hour operations, and so that shows up as a special case. Right, this is a case where assigning home locations based on where you are at night probably is less successful. That shows up again for a hospital system over on the west side. I've taken out the airports because they are such outliers, they're very, very special. So that's the local out-degree. We turn to the clustering coefficient; think of this again as trying to capture the idea of social capital. The question is, do the places I go interact with each other? And the picture is basically the same. You see more interactions in the suburbs broadly than the city, and more within the city on the north side than the west or south sides, and again this outlier of Hyde Park. So, if you are familiar with the city of Chicago, you will see in this map already just a map of class in the city of Chicago. If you are not familiar with Chicago, that map looks like this.
So this is just taking a principal component analysis of some factors for socioeconomic status: income, education, single parenting, and a few others. And what you're seeing, right, is that the north side is wealthier than the west and south sides, and then Hyde Park is an outlier, and the suburbs are rich. So that's the visual impression. If we do this as a scatter plot, we see that indeed, as we would have thought, there's a strong relationship between mobility indicators and neighborhood status. So in particular, log clustering predicts 80% of the variance in adult educational attainment as measured by the census. On this side, for the local out-degree, it's a little bit weaker, but it's around 0.64. Okay, so I want to take just a moment and unpack this before moving on. I came at this work originally thinking about poverty and neighborhood effects. So if we go back to Mollie Orshansky in the 1960s, right, we have just three times food: you should have what you need to be out of poverty. And that has moved to the supplemental poverty measure, where we try to have a more realistic set of adjustments for geographic cost of living, for family size, for healthcare expenditures, for childcare, and so forth. So basically, we're trying to take into account how people actually live. But in the last 10 years, you know, you can think of that as being also related to Amartya Sen's sort of capabilities approach: what are the things that you need to be able to do to be out of poverty, to have a flourishing life in an urban neighborhood? And so many of these, Patrick Sharkey and Jacob Faber, sociologists, and economists Jim Heckman and Stefano Mosso, have projects encouraging people to focus on specific mechanisms of neighborhoods, so context effects, and then in Heckman's case we sort of have a hierarchy of skills, all of the various stacking things that are required for the development of human capital.
So there's, I think, a trend towards a focus on each of the individual fissures or facets that make up a flourishing life in the city. If we move a little bit towards the popular literatures, you can also think about Ta-Nehisi Coates, and just thinking about your body in the city and the ways that we are able to physically move confidently through our environment. So that is, to my mind, an incredibly important part of the freedom and the sort of thrill of living in the city. And we see that that varies substantially across the city. I also want to think sort of sociologically about the structure of spatial ties. What we just saw is that more, or better, prediction, higher levels of interaction with immediate neighbors or neighbors' neighbors, was a good thing; it was associated with higher status. Earlier work by Nathan Eagle and Luca Pappalardo had found that higher network diversity, that is to say, higher entropy, which is to say lower predictability, is associated with higher status. So there's a sort of apparent tension here. And mathematically, that is resolved because I am imposing on this a very specific hypothesis about who you need to be related to. You need to be related to the people immediately around your home, and you should be consistent, to a certain extent, with the people in the places that you visit. So that's mathematically. But we can also think about Mark Granovetter's strength of weak ties, and Jane Jacobs, and the sort of conceptual squaring of that circle is that having a cohesion, having a predictability in those local routines, is really what allows the weak ties to form and to flourish. The third thing I'm just going to touch on is criminality and network effects, and this is a burgeoning literature. There is great work on the dissolution and formation of social ties in the context of violent crime.
There's stuff on how the places that you go and how people who come to you affect levels of crime. I'm going to focus just on that network structure, and that is closer to work by Chris Browning, and I'm going to focus in on this one and see the extent to which adding my variables can give us new independent power. Is this just something that rich people are buying, like park access, living on Central Park, or is this something that is another measurable quantity of neighborhoods? In a slightly more trivial sense, the availability of GPS data is also very important for thinking about criminality. Traditionally, sociological and criminological models sort of rely on residential population, when in fact, right, the crime that happens in Chicago in the downtown district, or in Manhattan, depends not only on the people who live there, but on the people who come there, and GPS data allow us to start to get to this. This is similar to work by Martin Andresen in the British Journal of Criminology. So I'm going to focus in on criminality just to show you the value of these constructed variables. So the question is, do they give us new independent explanatory power? And to do this, I'm going to construct a spatial error model for log crime. If you're not familiar with that, this is basically OLS, but there is spatial autocorrelation in the error term. I'm choosing this one versus just a spatial lag model based on the tests. And then I have standard controls. So the routine activities theory posits that crime happens when you have a motivated offender and target coinciding in space and time without an effective guardian present. This view privileges just counts of populations, and I have a very robust set of controls here. This is residential populations from the census, ambient populations from GPS data, and work populations from the LEHD Origin-Destination Employment Statistics.
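For readers unfamiliar with the spatial error model mentioned above, its standard textbook form is sketched below. The symbols are conventional notation rather than anything shown in the talk: y is the crime count per tract, X holds the controls together with the two mobility variables, W is the spatial weights matrix, and lambda is the spatial autoregressive coefficient on the error term.

```latex
% Spatial error model: ordinary regression in the mean,
% with spatial autocorrelation confined to the error term.
\log y = X\beta + u, \qquad
u = \lambda W u + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I)

% By contrast, a spatial lag model puts the spatial term on the outcome:
% \log y = \rho W \log y + X\beta + \varepsilon
```

When lambda is zero the model reduces to ordinary least squares, which is the sense in which it is "basically OLS" with a correction for spatially correlated errors.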
Then there is social disorganization theory, and here I'm drawing variables very explicitly, lifting them with no changes from Sampson and Browning and others, because what I really want to do is just say, look, here is the existing model: am I bringing new information to the table? So do these spatial relationships add to the existing set of covariates? Indeed they do. I know that specification tables are impossible to read, but these numbers are quite significant, and they are negative. So both clustering and the local out-degree are conforming to the hypothesis that we get from Jane Jacobs, that more cohesive mobility networks are associated with lower levels of criminality, both for violent and property crime. I'm sorry, I should have said this: this analysis is only for the city of Chicago, where I have crime numbers. So I'm not focused on suburbs, which is a sort of different context anyway. And this is five years of data, so that I'm able to take these logs. This specification is highly, highly robust. We can change the set of controls that are here; if I had not just followed Chris Browning's lead on this, I would have gotten basically the same answers. We can change the weights strategy; here I'm using queen weights, but I could use k-nearest neighbors of any number. Or the estimation strategy; here I am using the ML routine, a maximum likelihood routine, in order to get the information criterion. So you see that in fact the AIC has dropped, that is to say, the model is better off for having added these two new variables. The AIC has dropped from the models without to the ones with these variables. If I had used GMM estimation I wouldn't have the AIC, but the parameter estimates are basically the same. This is not a causal analysis. And I think it's just as plausible or likely that, you know, crime suppresses street activity as that street activity suppresses crime.
So there's a great quote from Patrick Sharkey, of Robert Snyder, in Uneasy Peace. He says, in Washington Heights as elsewhere, residents stepped out of their homes, reclaimed parks and sidewalks, and overcame the fear that had driven so many people indoors during the years of high crime. And what I like about this is that we see an active reclaiming of space. People are pushing back out into the space and claiming it for themselves; they're exercising social control, as it were. And on the other side, we had seen them driven inside by the crime. So we see these two forces pushing in opposite directions: crime driving people away, and people reclaiming space to drive out the crime, to take it back for themselves. So, as I said, this is very robust to different covariates, weights, methods, and so forth. But in this context, I want to focus specifically on the new form of data. You know, these analyses are becoming more frequent, but it's still what is new. And so one of the first ways to think about this is to radically change how we do the pre-processing of the underlying GPS location data, how we allow a device to enter into the sample, or how we process the locations that we see in the data. So I'm going to do two things. The first thing is to change the deduplication strategy: if we see locations that are registered by different apps on a single device close in time, I will remove those. The second thing that I can do is require devices to show up more often. So that is to say, more nights across the entire month of May 2017, or more frequently, so more total pings. The median device has 240 pings, and here I'll just require that the devices have at least 100. And neither of those changes really changes the overall picture. For the local out-degree the parameter estimate gets larger, if anything, and in each case the AIC actually improves.
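The two robustness checks just described, dropping near-simultaneous pings reported by different apps on one device and requiring a minimum number of pings per device, could be sketched as below. The 60-second window, the function names, and the `(device_id, unix_time, app_id)` tuple layout are assumptions for illustration; only the 100-ping minimum comes from the talk.

```python
from collections import Counter

def deduplicate(pings, window_s=60):
    """Drop a ping when a different app on the same device already
    reported within window_s seconds (the window is an assumption).

    pings: iterable of (device_id, unix_time, app_id) tuples.
    """
    kept, last = [], {}  # last: device -> (time, app) of the last kept ping
    for device, t, app in sorted(pings):
        prev = last.get(device)
        if prev and app != prev[1] and t - prev[0] < window_s:
            continue  # near-duplicate from another app: skip it
        kept.append((device, t, app))
        last[device] = (t, app)
    return kept

def require_min_pings(pings, min_pings=100):
    """Keep only pings from devices with at least min_pings records."""
    counts = Counter(device for device, _, _ in pings)
    return [p for p in pings if counts[p[0]] >= min_pings]

raw = [
    ("d1", 0, "appA"),
    ("d1", 10, "appB"),   # different app, 10 s later: dropped
    ("d1", 200, "appB"),  # far enough in time: kept
    ("d2", 5, "appA"),
]
clean = deduplicate(raw)
frequent = require_min_pings(clean, min_pings=2)  # small value for the toy data
```

Rerunning the downstream analysis on `frequent` rather than the raw pings is the kind of sensitivity check being described.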
So we're doing better for having this tighter set of requirements on the underlying data. Another thing that you could ask is if it's just the data rates, somehow, that I'm capturing. Maybe people are scared to have out their phones, and that's going to suppress in some way the collection of the locations. So we could look at the median or mean ping rate. Parameters for those are not significant, and they do not affect the parameter estimates of interest, so that's not the story. Maybe, again, we didn't need to do this analysis at all. I could have just looked to see if there was walkability in a neighborhood, and Hyde Park is special, as people live and work at the University of Chicago. And that's not the case either. Putting in walking or commute times from the census, these are again not significant and do not affect the parameters of interest. And I am, you will see, obsessed with this question: there's a question of whether the data are representative. And so the one way that I can get into this is to look at the number of devices that appear to be resident in a census tract, as opposed to the population of a census tract. So I will take histograms of the city of Chicago, and weight them either by the census population or by the number of devices. And consistent with reports from the Pew Research Center on mobile device penetration, you see that the histogram for education skews high for devices, so educated areas are going to be a little overrepresented in terms of devices, a little overrepresented in terms of household income, underrepresented in terms of poverty. And consistent with the Pew Research findings, it's not a huge trend for race, and a somewhat larger trend for ethnicity. On the whole, I think this is actually painting a pretty good picture. We're covering the neighborhoods of Chicago pretty reliably, and I want to emphasize that what we're really doing is sort of stratifying within each tract.
So we're constructing the variables in tract, and the assumption that we're relying on is that the people who go out with their smartphone are representative of the mobility of their neighbors, not that they're representative of people who live in another census tract. Still, this reflects device residence rather than out-of-home activity, and I want to sort of dive into this because this is what I'm working on now. And so we're stepping into this wonderful world of big data. And the question that I like to think of, from Aladdin, is "Do you trust me?" And so the big data is sort of inviting us into this wonderful world of the matrix. And, I think, a little bit more precise than Aladdin, the question is: are sample rates consistent across groups and contexts? In other words, do we see the same amount of activity per device in a park, or a Starbucks, or on the roadway, or at home? Are different groups of people equally represented, represented in proportion to their actual use of those spaces? And I think this is a critical question, not just for this project, but also for all of the work that has relied on this form of data. There has been quite a bit of academic work, but also sort of popular press work, on levels of interaction and contagion, what we formerly would have thought of as social cohesion in cities, in the context of the pandemic. But are a Starbucks and a park getting the same number of people per device? That's a critical question if you want to think about contagion. It's critical if we're thinking about private investment: where do we put the next McDonald's when they franchise? If we're getting a different number of people per device, the footfall, the apparent footfall from GPS data, could be very misleading. It's important for public investment. When we think about building parks, right, GPS data will tend in fact to miss kids and the elderly. We don't want to miss them when we're building parks or planning parks or modifying them.
And it's important for this burgeoning academic space. So I'm not the first one to think about this. There is work similar to the histograms that I showed you from location data suppliers like Unacast. This is household income again. They're showing you that it's fairly consistent; they actually show you that they are underrepresented at very high levels of income. This is what I saw, and that was a different data source. Here on the right-hand side, we have work from Amanda Coston and people at Carnegie Mellon and Stanford. And they're looking at out-of-home behaviors here. And so it's by age along the x axis and race along the y axis, and they're looking at the number of devices per 100 people. And they're doing this by looking at the excess counts of devices on election day in North Carolina. And what they see, in contrast to the in-home levels of activity, is that the out-of-home levels of activity do show very strong trends towards more devices per person in younger, whiter areas. This is important when we're thinking about what these data represent. So to construct this sort of scale factor, the population-to-device ratio, we could look at anything where we have accurate records of the number of people who are there. So think of the number of lattes that get sold at a Starbucks, or the attendance at a ball game. I'm going to focus on traffic. And I'm going to talk about computer vision just a little bit in the last few minutes. So, strategy one: traffic. I'll compare traffic flows with administrative data. The admin data are the average daily traffic counts, or ADTs, from the Illinois State Toll Highway Authority. This is coming from mainline toll plazas like this one, and from toll receipts. So these are highly, highly accurate, and they are released for each month. So I can associate these ADTs with road segments from OpenStreetMap.
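Once device-based counts exist for the same road segments, the population-to-device scale factor described above is a ratio of administrative counts to device counts. A minimal sketch, assuming both sources are already keyed by a shared segment identifier; the identifiers, function names, and numbers are hypothetical and are not the talk's results.

```python
def segment_ratios(admin_adt, gps_daily):
    """Per-segment ratio of administrative daily traffic to GPS-derived
    daily trips, for segments present in both sources."""
    return {
        seg: admin_adt[seg] / gps_daily[seg]
        for seg in admin_adt
        if gps_daily.get(seg, 0) > 0
    }

def pooled_ratio(admin_adt, gps_daily):
    """Ratio of summed administrative counts to summed GPS counts over
    the shared segments, so busier segments carry more weight."""
    shared = [seg for seg in admin_adt if gps_daily.get(seg, 0) > 0]
    if not shared:
        raise ValueError("no segments with both admin and GPS counts")
    return sum(admin_adt[s] for s in shared) / sum(gps_daily[s] for s in shared)

# Hypothetical average daily traffic and routed GPS trips per day.
admin_adt = {"seg1": 30000, "seg2": 60000, "seg3": 1000}
gps_daily = {"seg1": 1000, "seg2": 2000}

per_segment = segment_ratios(admin_adt, gps_daily)
overall = pooled_ratio(admin_adt, gps_daily)
```

The pooled version is one reasonable choice among several; a median of per-segment ratios would be less sensitive to a few very busy segments.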
And so I hope people get my visual joke about Joseph Minard's March to Moscow, but the width of these lines represents the daily flow in each one of these road segments. So this is coming from admin. And now I need to construct the same thing from GPS location data, and the way to do this is to connect endpoints. This is from another data source; this is for three months in 2020. Connecting travel behaviors that lead from one location to another, either of those being within 20 kilometers of Chicago or crossing Chicago. So you have to route those on the OSM network, taking into account travel speeds and so forth. That is done on AWS using PostGIS and pgRouting; there are many excellent alternatives to these, all of which can be parallelized on the grid. And once we've done that we can aggregate up all of those trips; you have like 65 million trips. You route those through the street network and you figure out which road segments those show up at. You can compare the road segments that you have admin data for, namely the ones on the toll highway. And what we get is a ratio of 1 to 29. So, at first blush, this is not so different from the number that I quoted earlier; I said it was about 300,000 people in the 10-million-person area, maybe I forgot to say that. So, you know, a 3% sample; here you've got something similar. But in that case, it was people showing up over the course of an entire month, and this is activity by activity. So that's strategy one: reconstruct travel flows and compare them to sources of official data. You can do this with any form of, like, red light cameras. And some places in cities have cameras like this. So with any ground truth flow you can do this. Strategy number two is to count entrances using GPS or using video. I'm looking at a very special park. This is a park along the lakefront of Chicago, and it is divided from its neighborhood by a highway, and that means that in order for people to get to this park.
They have to go through one of two little tunnels, or they have to have come biking along the lakefront; of course you can do that. And by focusing computer vision on one of these two entrances, I can get a pretty good sense of the number of people who are coming into the space. I don't know how long they're dwelling there, how long they're staying, but I know the amount of activity coming in, and I can see the amount of activity that is here. So, focusing on this entrance, it looks like this: we see people coming through the tunnel. We're applying some, you know, careful but not extraordinarily complicated computer vision. We can get very, very accurate counts of people coming and going. And so I will compare this for like time periods, just September of last year. I think that this is, like, adorable, but unlikely ever to amount to much, because getting access to this camera doesn't scale very well. New York City actually makes a lot of these cameras publicly available at a lower frame rate. There's a lot that I think you can do with this. I have access to full-frame-rate cameras in the city of Chicago as well. So the city of Chicago has 30,000 cameras up; 2,500 of those are near roadways, and so I'm starting to do work. So there are ways to get the municipal feeds; you don't have to install these cameras yourself. So, comparing, again, the GPS and computer vision, this looks like this. Here we have the raw location reports, or this data supplier clusters the data. And there's some complication that goes into that, especially when you think about parks, because you may walk through a park and a cluster may never really end up being in the park. So the ratio is about one to 30 for this specific activity of showing up in the park, or one to 40 for clustered data. By way of comparison, for the same period, at night: the average number of clusters in Chicago on a single night, think about that as a single activity again.
There's 28,000 devices on 2.7 million people in the city; that's about one in 100. So that's for clustered data, so sort of think about one in 100 versus one in 40 for two activities: how much are those two activities picked up? A lot more for the activity that shows up in the park than at home. Just sort of stepping back: showing up at least once in the city over the course of a month is about a one-in-33 sample. Any specific day or night in the city, everyone has to go somewhere so they've got to be somewhere, right, is about one in 100 in clustered data, and parks about one in 40. On tollways we see something more like one in 30 for clustered or for the travel flow behaviors. So what this means is that, context by context, we may be picking up, you know, different numbers of people. And I think it's important, moving forward, to get at that. I also think that there's immense promise in computer vision for moving beyond GPS data to see how people are really sharing space, because, as you can see, in the park we're getting one device per 40 people. So I can't really see how people are in the space together. Since I'm coming up on time I'm actually going to skip this slide, but I just want to point out that we also want to look at how different groups are represented in the course of their daily activities; that's similar to the work of Amanda Coston and the voting returns I mentioned. So we can look, based on the home locations of devices and the composition of people who are coming to each road segment, at whether individual road segments are more or less observed. And to break the surprise: there is bias. To summarize, I characterized human mobility in Chicago neighborhoods using a network built from GPS location data, using clustering coefficient and nearest neighbors interactions. These were closely inspired by existing work from Jane Jacobs and other planners, across geographic analysis and sociology.
And we saw that mobility is strongly correlated with class, and indeed I would argue it is a component of class, consistent with Jacobs's theory and work by, like, Chris Browning. It also has independent predictive power for crime, with the right sign. Moving forward, GPS data have extraordinary strengths in their scalability, and in their ability to capture activity between geographies, so that means the ability to directly address questions of external validity. But also to look at, you know, people coming from here, going to there, where people are related. You can't do that with a traffic count, you can't do that with computer vision; you don't know where they came from. In my mind, the data are surprisingly representative of the population, although not perfect, and continued work is needed to assess these biases and to apply these data in contexts where we really have a responsibility to get it right, in planning. Because the data are fairly sparse, right, a one-in-30 sample, one-in-40 sample, one in 100, complementary techniques like computer vision can help us to measure and optimize specific resources. So I've, you know, had talks with the parks district in Chicago, or public health and mental hygiene, about, you know, what is the impact: if we put in a work of art, do people dwell more? What happens if we change the light? Right, do we see interventions on the physical environment changing how people are using and sharing spaces? And that comes to the last thing, which is that spatial routines usually express some sort of a social routine. And I hope this will help us to begin to disentangle those and see how people are really using spaces. So as an example there, right, I often go to the park and pass by a bunch of old guys playing chess. I do not play chess. So they are clearly having a different social interaction than I am having. And yet, I feel somehow closer to my neighborhood for having walked by them on my walk. Thank you for your attention and your time, and I'm eager for your discussion.
Thank you very much, Dr. Saxon, for your talk. And I would like to open up the session for questions. As a reminder, to ask questions, participants on Zoom are encouraged to use the raise your hand feature, and I'll call on you to unmute and ask your question directly. You may also type your questions into the chat box and I can read them out. For the audience in Avery 114, you can raise your hand and I'll call on you so you can ask a question directly.
So, I see first the question from Calvin Harrison, if I can just start with that one. And the question says: the technologies I explained are super powerful for, you know, academic work and so forth; how can you make sure that this is not applied to more nefarious uses, like tracking by government, profiling and so forth? And that's right, that's a great and super important question. And I think this sort of comes up, for me, a little bit in the context of reports that came out from the New York Times. In 2014 they had reports on this, like, look, this is going to change cities, everything's going to be better because we're going to be able to use the curb efficiently. And then in 2020 they said, oh my gosh, and, you know, like, literally the headlines are green typeface on a black background. The headline is, like, your smartphone is spying on you, what can you do about it. And the New York Times piece to me felt a little bit unfair, because they went and sort of did the things that you're not supposed to do with these data. I am contractually not allowed to do a lot of the things that they did; by my IRB I am not allowed to do a lot of the things that they did. And it felt a little bit to me like they sort of walked out on the street and they're like, boom, somebody got shot, the crime is terrible here.
And so it felt a little bit unfair. Now, clearly nefarious actors, or the government, can get access to these data, although there are actually limitations on what the government is allowed to use it for. And so what I see is that this industry is very aware of its responsibility in vetting people who get access to the data, but also in changing the way that they're making the data available to users. And so that means using much, much more processed data. I do have access to raw data. But if you get things from SafeGraph now, or you get things from Unacast now, they have constructed networks, they have constructed levels of flows from block groups to, you know, a coffee shop or something like this, that aim to directly address the privacy concerns. I also think that you are seeing, you know, a trend towards better awareness, you know, thanks to things like the New York Times, of privacy issues. And so if you look at Android or iOS, both of these are implementing much more widespread do-not-track features. You can imagine, in the context of GDPR, the European privacy laws, you can do do-not-track; you can imagine plugging that directly into your phone so that it would apply widely. That will make it much harder to build data sources like this. That is where we are moving in terms of the advertiser ID, which is the unique identifier that identifies people across all of this; obviously that makes a lot of these analyses more challenging. This is what has happened in the past for other forms of data. So it is what happened with Wi-Fi and Bluetooth probes: both of those specifications switched to randomizing the device identifier when it is not associated with a network, to make this type of thing impossible, and iOS and Android are building that in for the ad ID as well. So I think that is where we are moving, both at the OS level and also at the data level and also at the consumer level.
You can see, you know, in my discussions with industry, right, that they are actually able to do a lot of the things that we are interested in with less tracking of people over time. We can remove all behaviors that are, you know, very close to home, and remove that in various ways. And that's where I think this is moving. I think the industry is very, I mean, I don't think, I know that the industry is very concerned about this, because if they do not get it right, it will make it harder for them to do all of their work, and I believe that there is tremendous potential in it. Maybe that's not complete, this is a huge, huge space, but maybe that's a first response to your question, right: do not give people access to the data that they do not need, do not collect the data that we will never need. But, you know, the challenge that this brings up, though, is if we are not recording that type of thing. For some things, like looking at the data representativeness, it actually makes it even harder, right, because we have no way of reconstructing where people are coming from, things like that. Yeah.
Okay, I think we have another question in the audience.
Oh, hi. Can you hear me all the way back here?
Well, yeah.
Thanks very much for your presentation. I found it very interesting. I lived in Hyde Park from 1996 to 2001; my wife was a grad student at the University of Chicago. I have fond memories of all those places, and am definitely familiar with the boundaries that you mentioned. But I was wondering, did you do any historical research into the neighborhood, because my understanding is that Hyde Park.
It is a deliberately planned community, really starting, ironically, around the time Jane Jacobs was writing. It was planned by the University and city authorities to limit the flow of people from the South Side into that neighborhood; the university specifically wanted to carve out Hyde Park as an enclave and used the city's power of eminent domain to remove commercial activity from 55th Street, where you have like the university of the loop there at the upper right hand corner of the university parks.
Yeah. So, yeah, I mean, so this is, yeah, there's a lot of history here. So the basic story of this line, this DMZ, is that in 1957 George Beadle comes to the university as president, in a time when the whole neighborhood is really, in a sense, under real threat. The university is not able to attract faculty anymore. There is real violence; you know, one of the contradictions that I couldn't resolve when I was looking at this was literally which rape and hostage situation it was that finally drove the university crazy. George Beadle made Julian Levi sort of responsible for urban planning in Hyde Park in that period. Julian Levi was the brother of Edward Levi, who later was the president and US Attorney General, so very, very strong ties to power. And he shows up in Eisenhower's, you know, in the White House, and basically says I want a meeting, boom, they have it like three days later, and he says we want to raze huge swaths of Hyde Park. So you're probably familiar with Hyde Park A and B, the I. M. Pei and Harry Weese development that sort of decimated 55th Street, as you said.
But Julian Levi also focused on 61st Street. He actually wanted to put in, you know, very, very analogous to Jane Jacobs, but he failed, he wanted to put in a crosstown expressway to permanently, you know, conclusively separate Hyde Park from Woodlawn on 61st Street. But if you walk through the physical environment in this space, I mean, it's not to bite the hand that feeds me too much, but to bite it a little bit, you know, it's bad. So if you enter the University of Chicago from the east, right, you meet with Welcome to the University of Chicago signs. This is what it looks like coming to it from the west; this is where I used to live. So here we are facing south on Kimbark; facing north now, the Midway blocks flows, but that's not good enough. We have to have fences up and a playing field, so there's two sides of the fences. This is again coming down to the opposite side of these fences, and so, rather than seeing Welcome to the University of Chicago, we've got fences, and the street does not go through, as you can sort of see down here. If we go over one block again, the street does not go through; they do not want people coming through. The policy school where I was was just redesigned, so this is an Edward Durell Stone, yeah, I think, right, building. Its redesign has pink foot walls basically to keep people out, so the street is broken again, even though you have the Midway just one block later. If you go over by one block again, the streets are broken, so over and over again, you know, the history of Hyde Park is felt in the physical environment that you feel here. You know, you see there's a guy standing here in a blue coat; he's one of the people that are hired by the university. You talk to these guys and, you know, it's a very complicated relationship. But there was an old guy that I would often talk to, and he's like, oh, you know, kids come into the neighborhood every night, we chase them out, and you can think about that statement a lot.
So this, yeah, it's no coincidence that Jane Jacobs focused on this line in particular. I would say the University of Chicago, in its south campus development plans, is very much sticking to its guns. If you walk along 55th Street now, you would not recognize it from when you were here in the late 90s and early 2000s. They put in this huge Jeanne Gang building, this commerce that they're trying to encourage along the street; they are not doing that along 61st Street. I don't know if that exactly responds to your question, you've come back from the global to the local again, but yeah, you definitely see sort of the built environment affecting and interacting with these flows in sort of obvious and very in-your-face ways.
Yes, it does. I think that's very interesting as a preface to what you're collecting today, just acknowledging that this is in fact a planned environment, it's not organic.
Oh no, no, yeah. And it's also in a way a microcosm for so much of Chicago writ large: the physical environment is what has been changed over a century to limit human interactions. Right, I mean, the whole set of, we could talk for another hour about road cuts in Hyde Park that are weird, but maybe we can go to another question.
The audience. Yeah, go ahead.
Yeah, it's a little warbly.
Okay. So, thank you very much for your presentation. I have actually a couple of questions. The first one is, like, given that there are some recent changes in mobility behavior, maybe accelerated by the pandemic, like the increasing use of bikes and also maybe work from home, I was wondering, like, how do you think these changes can alter your results? And my second question is more methodological. So I was also wondering if you have explored other social network analysis tools.
Which can offer different angles, like, for example, centrality measures to know which neighborhood will be more important or more connected to the other ones.
Right. Those are both great questions. So I'm just trying to jot them down so I don't forget. So the first one is the COVID changes. I don't know that there's a way to talk about COVID that's not at least a little bit depressing, but, right, mobility has gone down, we are seeing each other less, I am not there with you in New York. And I believe that the actual social behaviors in the park, you know, throughout the neighborhood and the parks and in person, are incredibly important, as you can see through this, and I think we're not having them. From a research perspective, yeah, I mean, I looked very early on, when the pandemic started, at, you know, changes in mobility in the context of the pandemic. I did not sort of jump on the COVID bandwagon, because I was already living through it and couldn't quite stomach it, but, you know, you see, and you can find many other people have done this, I think I linked to a blog post about it but I didn't pursue it much, just that, you know, people from rich neighborhoods before the pandemic are going out more; as soon as the pandemic hits and they need to self-isolate, in the first month they're doing so much more. And so, you know, mobility starts out as a privilege and then isolation turns into the privilege, and of course whoever is privileged gets both of those things. As for thinking about, you know, how it has changed in COVID, I haven't looked at it more than that. As for centrality and other measures, I thought a lot about these centralities. You know, so I did construct them; I wanted to focus on things that had sort of a deeper theoretical grounding to them. I can tell you that centrality sort of picks up the central business district, so it picks up Midtown Manhattan, it picks up the Loop and so forth. And that's great.
It's not as related to the other social outcomes as the variables that I described. The thing that I have been considering, right, is, you know, as I mentioned, this form of data invites you to bring it to other cities, social contacts, to deal with the pandemic and so forth. Another paper, with people in the psych department, is looking at park use, which is another constructed thing; it's not a graph measure, but maybe think of it as a digraph of parks if you want. Taking those variables and bringing them specifically to New York City. And because the psychologists were less familiar and comfortable with the clustering coefficient, they wanted to focus on the local activity there, and that analysis did replicate very, very successfully to New York. But looking at these same measures in other cities, I think, is important, and if I look at, like, Philadelphia, the clustering coefficient, again, doesn't do nearly so well. So taking some of these variables and looking at them in other contexts, I think, is important. Another one, that is not exactly getting into a question about other sort of traditional graph measures, but the thing from a geographic analysis perspective, right: I defined for the local out-degree a very specific scale, and I defined that scale in terms of the community area. Right, I could have chosen it differently, maybe I should have. Right, because, you know, like, if we look at, you know, literatures on job referral networks in economics, right, the instrument that they end up using is shared jobs within census blocks, so a much, much smaller geography than the one I use, a 40,000-person one. Okay, so thinking about different scales to construct those measures on, and how that depends on which city we're looking at, is one of the directions that I want to be, you know, going next, after the data bias questions and after the, how do people share their data?
So, I think I have a question and I'm supposed to stay on with students at this point so I'm happy to and I'm sorry, I left the stuff so I'm here.
Yeah, so I was just wondering about the areal unit you've chosen to aggregate the mobile data to, which is the census tract. I was wondering, do you think that there would be any change if you used another areal unit? Like, what was your methodological reason to choose that particular one?
Right, I mean, so this is related to literally what I just said about what is the correct spatial scale. You know, there's a couple of answers to this, I think all of which are cop-outs, and I think thinking about the spatial scale is actually one of the next things that I really want to do. The reason that I use census tracts is because it aligned with all of the other variables that I was ever going to want to use. And so when I went to construct neighborhood covariates, those are not always available down to the block group level and so forth. And the other sort of cop-out answer, I mean, these are cop-outs, but there are real data issues in trying to go much finer, the other one is the university and the IRB. I'm not exactly sure how realistic their concern is. But basically they were concerned about going much finer than census tracts for the overall analysis, and they have just been much, much more comfortable with me sticking with high-level neighborhoods. That is not always possible. Right, if you want to think about, are you going to the park, you do need to veto the home, but the park that is 50 meters away may be in your tract, and it may be the most relevant and most important park. And so you cannot just do things at the census tract level; you may need to know really where it is. But what I basically try to do is construct all the variables that I need and then get away from the raw data as quickly as I can, because basically I don't want to be touching it off of the grid, all of this stuff is.
Yeah, all of this stuff, basically, the university doesn't want it ever to touch the university servers because they don't want you to be our on their servers. So, yeah. So those are, you know, they're not conceptual answers, but they're reasons.
Yeah, yeah, that's really helpful, because, I mean, I know that the IRB guidelines and all that sort of really influence the way you conceptualize your study from the very beginning. But yeah, thank you so much. I think we're out of time, but on behalf of GSAPP and the urban planning program in particular, I'd like to thank you for presenting today. I mean, we really appreciate you taking the time.
Yeah, so it was great fun. Thank you.
Yeah, and for everyone else, make sure to join us next week at the same time for our next LiPS talk by Dr. Rob Kitchin, whose talk will be on the epistemology, practices and politics of urban science and city dashboards.
Yeah, thank you. Ranjani, am I to stay here to chat with students? Is that what I understood from the email, or am I?
I think everyone is having a bit of a busy week, so we might need to reschedule that. Yeah, I think the semester unfortunately is in full swing, so.
Okay.
Yeah, but thank you so much. It was fascinating. Thank you. Also the data sources that you had, which, you know, we don't really get access to. It was amazing to know what was possible.
So yeah, I will say that you can get access, you just have to be annoying. Okay, thank you so much for inviting me, really.