Well, good afternoon, everybody, and welcome to this meeting of the Geographical Sciences Committee of the National Academies of Sciences, Engineering, and Medicine. We're very pleased to have you here, and we also have an audience online today, so we have people listening. They won't be able to speak, but later they'll be able to submit questions in writing if they want to. So I'm Carol Harden. I chair the committee. I'm a geographer, Professor Emerita from the University of Tennessee, although I now reside in the state of Vermont. So, some different geographical perspectives on things. The Geographical Sciences Committee is a standing committee (I think the microphone will work better if I sit down) within the Board on Earth Sciences and Resources here at the National Academies. The mission of our committee is to provide high-quality scientific, technical, and policy advice and recommendations to society and to government at all levels. We especially like to help federal agencies, but we also work with other governmental agencies, nonprofits, the private sector, states, whoever needs our attention. We hold monthly meetings, and then occasionally we get together in person, as we are today. What we've been trying to do in these meetings is challenge ourselves to look into the future, to understand how the world is changing, what kinds of questions are raised, and what kind of information is needed to make good policy decisions and to have a good future. I'd like to have the committee members introduce themselves very briefly, with name and affiliation, so you can see who we are. The committee members are some of the people seated at the table here. We'll start right over here with Glen MacDonald.
Hi, I'm Glen MacDonald. I'm a professor of geography from UCLA.
Hi, my name is Mike Jerrett.
I'm the chair of Environmental Health Sciences and the director of the Center for Occupational and Environmental Health at the Fielding School of Public Health at UCLA.
I am Nancy Jackson. I am a professor and geographer at New Jersey Institute of Technology.
I'm [inaudible], the director of ESRI's Research and Development Center here in D.C.
A Regents Professor in the School of Public Policy at the Georgia Institute of Technology.
I'm Bill Solecki, professor of geography at the City University of New York, Hunter College.
Thank you. And we have one additional member, Tony Bebbington, who wasn't able to be with us today. Our topic today is opportunities and consequences of using sensors to capture human geographical behaviors. And this is a topic we're really quite excited about. We start talking about a lot of things, and then the whole idea of sensing and information and data, how we know and how we get data, has kept coming up, and it's just been getting bigger and bigger. There are so many new developments, especially in mobile geospatial technologies. Anyone who is planning to evacuate an urban area during an emergency, or improve public health, or make cities smarter really needs this kind of location-based data to predict and monitor patterns of human behavior. But the topic is really much broader than that, because now we've seen examples of other ways digital data are captured. For instance, I recently learned in my now home state of Vermont that economic analysts buy days of data from Visa and are interpreting those data to understand the impact of second-home users in the state. So periodically we hear of these less direct connections, and it's all happening and unfolding at a rather rapid rate. Today we have what I think is really an A-team of speakers to lead us further into thinking about this realm of using sensing to capture human behavior.
And so at the outset we can see that the technology is rapidly changing, that the technology seems to be outpacing policy in this area, and that some of these new sensing tools are potential game changers in just how we acquire geospatial data. So there's a lot of opportunity here. At the same time, we also have some concerns about privacy, about the way the data are managed, and about how all this will unfold. So our goal for the afternoon is to better understand what's lying ahead in this area and to stimulate some discussion, not just with the speakers and the committee members but with anyone else in the room or even online who's interested in contributing, so that we can really begin to zoom in, narrow down, and articulate what the key issues are, what the problems are, what it is we need to know that we don't know, and where the gaps in our understanding are. So for us this is hopefully a very productive meeting that will lead to further study and discussion. Our plan for the afternoon is to hear from four speakers, and then we'll have a substantial time, about 90 minutes (actually, no, I did the math, yeah, 90 minutes) for discussion. So we'll have three speakers, we'll take a short break, and we'll come back in and continue. After each speaker we'll have a few minutes to take immediate questions, but then we'll have a much longer discussion time at the end. You should know that the slides that the speakers are using, and actually the audio of our discussion, will be posted online at the Geographical Sciences website. It usually takes a little time after the meeting for them to be posted. Anyone who's registered for the meeting should get a link to the website, so you'll be able to see this. So let's move right into our session. Our first speaker will be Dr. Matthew Zook. He's a professor of geography at the University of Kentucky.
His research focuses on the production, practices, and uses of big geodata. He's interested in how code, algorithms, space, and place interact, especially with the increasing use of mobile digital technologies. What I find particularly fascinating is that the work of his that I've seen integrates flows of the material and the non-material, so the information and material flows at the same time. At the University of Kentucky he directs the DOLLY project, which has a repository of billions of geolocated tweets, so I think that gives you a flavor of where we're headed here. Dr. Zook is a Fulbright fellow. His Fulbright fellowship took him to the Mobility Lab at Tartu University in the country of... I guess we don't have an audience quiz. I didn't hear a rousing answer. Estonia, that's right. His academic history also includes time as a visiting fellow at the Oxford Internet Institute and as a visiting scholar at the University of Auckland. He has a PhD from the University of California at Berkeley. He's currently a managing editor of the open-access journal Big Data and Society, co-editor of the journal GeoHumanities, and a member of multiple editorial boards. I'll turn it over to you.
So actually I should probably be here. I was just told to stand here so it gets recorded, right? OK, let me just get my timer here. I timed this to make sure I got right on 20 minutes. Great, thanks. I told a few people I was a little unsure about what to talk about here, so I decided essentially I'm going to talk about lots of things. So there's going to be a fair amount of material here, but the PowerPoints are going to be available, and I'm happy to send any slides or papers that people might be interested in. And so the charge today was talking about how sensor data, this newly emerging sensor data, is changing the way we do geographic research more broadly.
And thinking about this and thinking about my own work, I'm going to borrow a page from Mike Goodchild and think about citizens as sensors rather than environmental sensors. Really, the stuff I've been working with has been social media data, so this crowdsourced kind of data. And there we go. Too many things I might push up here. OK, so this is what I'm thinking about, the kind of data we have out here. It's this kind of landscape, or digital-scape, that we have, with all this different crowdsourced information out there that we might pick up. There's all kinds of really interesting microspatial data out there. We can do lots of things. I'd also say that there's often this expectation, the lure of big data: it's so big we can do anything with it. What I'm really going to emphasize in this talk is that it's not just the size that we're concerned with. It's really all the social processes that are contained within this kind of data, and it's really important that we think carefully about how we use it. What's the framework that we're approaching this data with? I put up this slide, and I'm going to show it to you again later on, but this is simply all the geotagged tweets in Louisville, Kentucky, which is kind of nice. You see lots of maps like this, but at the same time it's sort of a map of population density, or where people are. You can't really say a whole lot from it. You really need to do work with the data, both in cleaning it and in formulating questions, to make sense of it. The approach that I'm advocating, which I'm going to present in three pieces of research, is trying to move beyond the geotag. By that I mean trying to get a more complicated understanding of space than the simple latitude-longitude. That's easy to do as geographers. It's a common thing: we want to make maps.
I certainly do make maps along those lines, but I'm thinking about more holistic, theoretical approaches to this and how we think about space. There are lots of ways to approach it. I just want to emphasize three today. One is how this kind of activity is very uneven, both across space but also across scale and across time. The second is this whole element of relational space: the fact that, because of these digital technologies, how we use social media, how we move, and so forth, we have these relational connections to places that are very far away, at least in terms of material space. And then I'll also talk about the mixed methods of this approach, building upon quantitative methods (I don't do a lot of modeling in my kind of work, and you'll see that in the kind of stuff I'm presenting here) but also on how we interrogate and think about these data. So there are three different projects I'm going to talk about. I'll go through each of these fairly quickly. The first one is event-based identification. It's pretty straightforward. These are the kinds of questions we were looking at when we started playing around with this, maybe five, six, seven years ago. The example I'm going to show you is tied to Hurricane Sandy, which hit New York in 2012. That was just when we started collecting all these geotagged tweets, so it was an opportune time for us to really start asking some of these questions. In terms of event identification, there's this one example that happened in Manhattan. There was this crane that had been damaged by the hurricane. It was dangling very dangerously over the streets of Manhattan. We just looked for tweets containing the word crane from that time period, and we came up with this nice clustering pattern right around where that event was taking place.
There are some other interesting things if you actually read through the commentary associated with that particular event. But it was a really nice example of how you can use this sort of thing to do event identification. With earthquakes and other events like that, you can do lots of things along those lines. Moving ahead on that, we took a subselection of the Twitter activity in New York during that time. We used a selection of keywords or hashtags within Twitter to identify tweets that were about the actual hurricane, rather than the general level of Twitter activity that would normally be taking place. And then we calculated an odds ratio, which is similar to a location quotient in that scores above one indicate a specialization in whatever activity you're looking at. In this case, that's tweets that were talking about Hurricane Sandy while it was taking place. On one level, it was very useful and nice to see that we were able to identify a lot of events that were taking place during this time, in terms of some flooding, power outages, and things like that. But then we took a look at the same map and noticed immediately that there was lots of stuff that did not get captured by this kind of human sensor data. In Staten Island, where most of the deaths during the hurricane were, sewage and other spills and problems that came up just did not show up in this kind of data whatsoever. So again, it's a cautionary tale: you can use this for event identification, but you really have to think carefully about what might be missing from it. There's a whole host of problems in terms of bias in who's participating, in this case power outages and so forth.
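The odds ratio described here can be sketched as follows. This is only an illustration, not the speaker's actual code: it assumes each tweet has already been assigned to a spatial cell, and the function name, the cell labels, and the specific formula (a cell's share of topic tweets divided by its share of baseline tweets) are assumptions chosen to match the "similar to a location quotient" description.

```python
from collections import Counter

def odds_ratios(topic_cells, baseline_cells):
    """Per-cell ratio of (share of topic tweets) to (share of baseline tweets).

    topic_cells:    one cell id per tweet matching the event keywords (hypothetical input)
    baseline_cells: one cell id per tweet in the general sample (hypothetical input)
    Values above 1 mean the cell is over-represented in the topic tweets.
    """
    topic = Counter(topic_cells)
    baseline = Counter(baseline_cells)
    topic_total = sum(topic.values())
    baseline_total = sum(baseline.values())
    if topic_total == 0 or baseline_total == 0:
        raise ValueError("both tweet collections must be non-empty")
    ratios = {}
    # Only cells with baseline activity get a score; cells with no baseline
    # tweets would require division by zero and are skipped in this sketch.
    for cell, r_i in baseline.items():
        p_i = topic.get(cell, 0)
        ratios[cell] = (p_i / topic_total) / (r_i / baseline_total)
    return ratios

example = odds_ratios(
    topic_cells=["A", "A", "A", "B"],
    baseline_cells=["A", "A", "B", "B", "B", "B", "C", "C"],
)
```

In this toy input, cell A holds three quarters of the topic tweets but only a quarter of the baseline, so it scores above one. Note that a cell with no tweets at all never appears in the output, which is the same blind spot the Staten Island example points to.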
The last thing I want to point out for this particular study of Sandy is that there's also a really interesting relational dimension. I'm not going to spend too much time on this map, or this visualization, because it's actually kind of complex to play out, but essentially, rather than locating cities in terms of actual physical distance, we located them relative to New York based on how many passengers passed between New York and these other cities. So you can see Chicago and Los Angeles by this measure are quite close to New York, and other places that are physically closer, such as Pittsburgh, are much farther away, just because of the passenger traffic. And then, looking at the level of attention to Hurricane Sandy, we can see that a relational network pops up with this kind of analysis: places that are far away but have this strong connection, as measured by airline passengers, actually paid a lot more attention to what was going on with Hurricane Sandy. So this is trying to get at the relational dimension of space as well. So, moving on, let's think about how we might use this kind of data for looking at metropolitan-level activities, activity spaces, and so forth. Moving back to Louisville and what's known as the Ninth Street divide. The Ninth Street divide is basically where that red line is. This is a divide of the kind that you often see in lots of American cities, and lots of cities around the world. It divides one part of the city from another part of the city. And the important thing about this particular divide is that this is how people who live in Louisville understand the city. This is a quote from a city magazine for Louisville, and the idea is: don't go west of Ninth Street. Don't go to this area known as the West End. People don't cross Ninth Street, which is an imaginary barrier in lots of people's minds.
The West End happens to be a poorer region of Louisville, predominantly African-American. This other neighborhood, which I'll be talking about in a bit, called the East End, is richer, whiter, more suburban. So it's a classic urban divide within this mid-sized city in America. So, coming back to this map I showed you at the beginning, it's nice and pretty. You can see all this distribution of Twitter activity, but it doesn't really tell you a whole lot in terms of what's going on. This is the Ninth Street divide, just to give you some context as well. So, simplifying a really complicated process: we did a lot of pre-processing, trying to figure out how we wanted to approach this question. We really wanted to ask, is this Ninth Street divide actual? Can we see this divide taking place within Twitter activity and the places where people are using social media? So we divided the city into these two neighborhoods, the West End, the poorer African-American neighborhood, and the East End, the richer, more suburban, whiter neighborhood, and assigned users who had 50 or more geotagged tweets in the database, and looked at the space in which they were moving through the city, the space they were inhabiting in the city. There are a couple of other things that I'm not going to go into, just for the matter of time. But the big story we got out of it is that the story of the Ninth Street divide was true for this neighborhood, the people from the East End. They did not really do any social media posting in the West End. But for the West Enders, the poorer region, this Ninth Street divide really didn't function as a barrier to their movement. At least, they went to lots of other places in the city as well. There are all kinds of reasons for that in terms of amenities, grocery stores, employment centers, and so forth, but it's a sort of divide.
But again, it was a way to counter this story, this understanding, particularly from this side of town, that the West End was this big barrier, and to say, well, it might be for folks over here, but in reality the city is a much more complex place, particularly for people coming from that side, who have to cross in one direction and another. The other thing we wanted to look at is how this played out across scale. So we're using hexbins, and this is one of the advantages of social media: it's all point data. You can aggregate it up to census areal units, or you can aggregate it up to your own units. We use these hexbins of a certain size. But we wanted to start looking at what kinds of patterns you get when you start varying the size of the hexbins, because you have these very uniform-looking patterns that are part of our intent in the visualization. We're interpreting, we're narrating this story already for you with these visualizations, but we wanted to make it a bit more complicated. So again, just to clarify, the places that are purple are places where West Enders are predominantly sending tweets. The brown ones are places where East Enders, from the whiter, richer, more suburban locations, are sending them. And so you get this really complex pattern when you start looking at different scales. These bigger ones are the ones that you saw on the original map. These are just a scaled-down version overlaid on top. And so you can see places that are predominantly associated with the West End that are actually really big concentrations of East End activity. This particular spot right here is the location of what you'd probably refer to as hipster kind of bars, a gentrifying neighborhood. This is a spot where people from the East End are moving in. There are also some high schools that have historically been attended by East Enders as well.
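A minimal sketch of that multi-scale hexagonal binning follows, under several assumptions that are not from the talk: coordinates are already projected to planar units such as meters, each tweet carries the group of the user who sent it, and the function names are hypothetical. The cell lookup uses the standard axial-coordinate rounding for pointy-top hexagons.

```python
import math
from collections import Counter, defaultdict

def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon with circumradius `size` containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Re-derive the coordinate with the largest rounding error so q + r + s == 0 holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def dominant_group_by_cell(points, size):
    """Map each occupied hex cell to the group sending the most tweets there.

    points: iterable of (x, y, group) tuples; ties go to the group seen first.
    """
    counts = defaultdict(Counter)
    for x, y, group in points:
        counts[hex_cell(x, y, size)][group] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}

# Toy data: a mostly "west" cluster near the origin and an "east" cluster 5 km away.
points = [
    (0, 0, "west"), (10, 5, "west"), (-10, 5, "west"), (5, -5, "east"),
    (5000, 0, "east"), (5010, 5, "east"), (4990, -5, "east"),
    (5005, 0, "east"), (5000, 10, "east"),
]
fine = dominant_group_by_cell(points, size=100)
coarse = dominant_group_by_cell(points, size=100_000)
```

With the small cell size the two clusters land in separate cells with different dominant groups; with the very large cell size everything collapses into one cell and the local "west" concentration disappears. That is the scale sensitivity being described: the same points tell different stories depending on the bin size chosen.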
So you start getting a much more complex, multi-scalar understanding of what's taking place in the city of Louisville using this, because, again, we're using point data. We can aggregate it up in different ways and get this kind of interpretation. Also, because we have... Oops. I'm stuck. OK. Also, because we have a time stamp on all this stuff, we can look at the temporal difference. For those of you not familiar with Kentucky, Louisville is where the Kentucky Derby takes place. It just happened a week or so ago. This is actually the location. It's in the West End. And you can see with this kind of analysis that you have this temporal difference as well: this spot right here, during racing season, suddenly becomes a space that is being occupied and used by this East End neighborhood, rather than the predominantly West End activity that you would see in the rest of the year. So you can start getting these gradations, these different understandings of space, in terms of multi-scalar looks but also temporal looks as well. There are lots of other things one might do with this kind of data as well. And for this sort of stuff, beyond recognizing or assigning people to a particular neighborhood, we're not trying to get at any interpretation of what they were saying in their particular tweets, for the most part. The last bit I want to talk about is looking at how we might use this kind of data to map spaces and networks of economic knowledge. That was one of the things that was listed in the overview for today, approaches for economic geography. That's my background; I'm an economic geographer by training.
And so this is one of the things we've been doing most recently, looking at these kinds of questions, particularly looking at how we might measure knowledge, particularly the kind of knowledge within cultural industries, part of the attention economy. What are people paying attention to? What kinds of things are they discussing? We're specifically focusing on the fashion industry. This is some work I'm doing with a colleague of mine, Dominic Power, over at Stockholm University. His focus is on the cultural industries. And so, using this kind of data on what people are talking about, we can get a sense of the spatial distribution of attention to this particular industry. It's a very specific kind of knowledge work. Fashion knowledge is different from the kind of knowledge you might see within a tech-based kind of endeavor. But it's still knowledge, and social media is a particularly good way of capturing and measuring this. So, just to give you some sense of what we're doing with this, we used an interesting source called the Business of Fashion. Sorry, the Business of Fashion, not the business of finance. Sorry about that, I study finance, too. The Business of Fashion has these indices of various designers, companies, executives, brands, and so forth that they publish yearly, from which we've created a list of about 950 keywords. We then went back to the Twitter database we had and looked just for references to these keywords in particular places. And I will say, this is part of what I was getting at, again, this need to have a framework. We did a lot of cleaning of this data. There are lots of keywords that we couldn't actually include in this. One example is the fashion brand Diesel. That was one of the things in our list, but people talk about diesel in lots of different ways that don't have anything to do with fashion. So we just excluded that from our particular research here.
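The keyword cleaning step can be illustrated with a short sketch. Everything here is hypothetical (the function names, the tiny keyword list, the sample tweets); the only detail taken from the talk is that an ambiguous term such as "diesel" is dropped before matching rather than trying to disambiguate it.

```python
import re
from collections import Counter

def build_matcher(keywords, ambiguous=()):
    """Compile a case-insensitive whole-word pattern from keywords, minus ambiguous terms."""
    dropped = {a.lower() for a in ambiguous}
    kept = sorted({k.lower() for k in keywords} - dropped, key=len, reverse=True)
    if not kept:
        raise ValueError("no keywords left after removing ambiguous terms")
    # Longest keywords first so multi-word brand names win over any shorter overlap.
    alternation = "|".join(re.escape(k) for k in kept)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def count_mentions_by_city(tweets, pattern):
    """Count tweets per city that mention at least one kept keyword.

    tweets: iterable of (city, text) pairs.
    """
    counts = Counter()
    for city, text in tweets:
        if pattern.search(text):
            counts[city] += 1
    return counts

pattern = build_matcher(["Gucci", "Louis Vuitton", "Diesel"], ambiguous=["Diesel"])
mentions = count_mentions_by_city(
    [
        ("Paris", "Love my new Gucci bag"),
        ("Paris", "diesel prices are up again"),
        ("Lagos", "Dreaming of Louis Vuitton"),
        ("Lagos", "traffic is terrible today"),
    ],
    pattern,
)
```

Dropping an ambiguous keyword trades recall for precision: genuine mentions of the Diesel brand are lost, but fuel-price tweets no longer inflate a city's apparent attention to fashion.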
And so here you can see the levels of attention to fashion in various cities. In particular, some of the big fashion capitals, New York, London, Milan, and Paris, have a certain spatiality to this kind of activity, this kind of attention. We also looked at how this gets distributed for various fashion brands around the world. This is where we're starting to look more at some of the networks of attention. Where are people paying attention to Louis Vuitton versus Gucci versus Ralph Lauren? I'm not going to talk too much about these particular examples. I wanted to show these maps mainly so you can see the different kinds of patterns. But there are some really interesting interpretations of this. Why are we seeing certain clusters for one fashion brand versus others? Just to pick one in particular, we're really interested in this cluster of attention within West Africa to Louis Vuitton. When we went in and actually started reading the tweets, getting an understanding of what people were talking about, and this is where we get into the mixed-methods approach, we got this understanding that this is a place that's becoming richer. There's a lot of aspirational consumption here. People aren't necessarily buying these products. It's not a huge market for this particular brand, but people are talking about it. There's a performance of consumption here that does not necessarily coincide with the actual consumption of this particular brand. So I just wanted to have those three examples. I think I'm on time. OK, I'm doing OK with time here. I just wanted to leave you with a couple of concerns on this.
I mean, I think there's a lot of promise, lots of interesting things we can do with this kind of sensor data, our citizens as sensors, what they're reporting. I think there are a lot of concerns as well. I'm not going to talk too much about this privacy and consent one. I mean, it's obviously a big issue for lots of things. I think the one thing I want to point out here is that, so far, we really don't have any good standards or protocols for how we go about using this data. It's a bit ad hoc. There have been changes in some of the rules governing research on human subjects. People who use social media data will often make a claim to the IRB that this isn't human subjects research because this is publicly reported data. I think this is a very complex thing. It's not simply that this has to go through IRB or this doesn't have to go through IRB. But I think there's an important conversation that particularly this committee could have about how we take on those human subjects kinds of questions. And again, as we all know here, and I'm not going to talk too much about it, location is very sensitive for all kinds of reasons, and we're really identifiable by just a couple of data points in terms of finding us in the crowd. And so this leads into the next point, which is the sharing of data. This is something that you increasingly see from funders and so forth, this mandate to share data that comes out of a funded research project and make it publicly available to other people, which I think is a great idea. I completely understand why that is put out there. But I also really want to raise the concern that making data anonymous is really, really difficult. There are all kinds of examples of people who made good-faith efforts to anonymize their data, only to find out that it was not as anonymous as they thought.
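The point about being identifiable from a couple of data points can be made concrete with a small sketch. This is an illustration only, with made-up traces and a hypothetical function name; it uses each user's first k observed (place, hour) points as the probe, which is a simplification and not a method described in the talk.

```python
def share_unique(traces, k):
    """Fraction of users whose first k (place, hour) points match no other user's trace.

    traces: dict mapping a user id to an ordered list of (place, hour) points.
    Users with fewer than k points are left out of the calculation.
    """
    point_sets = {user: set(points) for user, points in traces.items()}
    eligible = [user for user, points in traces.items() if len(points) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        probe = set(traces[user][:k])
        # Count every trace (including the user's own) that contains all probe points.
        matches = sum(1 for points in point_sets.values() if probe <= points)
        if matches == 1:
            unique += 1
    return unique / len(eligible)

# Made-up traces with user names already removed: the ids stand in for pseudonyms.
traces = {
    "u1": [("A", 8), ("B", 9), ("C", 18)],
    "u2": [("A", 8), ("B", 9), ("D", 18)],
    "u3": [("A", 8), ("E", 12), ("F", 20)],
}
shares = {k: share_unique(traces, k) for k in (1, 2, 3)}
```

Even in this tiny example, no one is unique from a single point, but every trace is unique once three points are known. Removing names does not help, because the pattern of locations itself acts as the identifier.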
There are lots of other cases of people being very careless with data, sharing data, posting it publicly, including some really sensitive data about people. This is something that I think is, again, a real concern about how we use this kind of sensor data. And one of the toughest things about this is that we don't even know a priori what kind of data is going to be the most sensitive. We might think it wouldn't reveal a particular location. My favorite example here is the way in which battery usage on a cellular phone or a mobile device varies by location, based on how far you are from the signaling station. So even something as banal as battery usage is potentially locatable, or potentially provides more data than one might otherwise think. And the last point in terms of using this kind of stuff: I think there are some real concerns I want to raise about how using this kind of sensor data, using these kinds of data sources, can also be implicated in producing social inequality. There are various programs, I'm particularly thinking of China right now, with some of their social networking applications, in which the content and the activities of your social network are tied to your access to various government services and other things. There are all kinds of other things, such as ZIP code being used to shape delivery decisions by Amazon Prime, and ZIP code is also very strongly associated with demographics in this real case. But the last point really is that, in terms of sensors, what gets measured and valorized is really, really important, because if we're using this kind of data to understand human geography and urban geography, what's being measured and what's not being measured becomes a really key question. And so I wanted to end with this last slide by Bill Bunge, who made this map of Detroit in the late 1960s, this map of rat-bitten babies.
And I put this up because I think it's a nice way to encapsulate both the promise and the problems of this new sensor data. We certainly do have ways of measuring stuff that wasn't really measurable before, and in a very good, real-time way. But at the same time, will we measure it? Will it end up in the data sets that we're using? I think that's where it gets kind of problematic and a bit messier. Bill Bunge went out and was measuring this because he had a certain story he wanted to tell about what was not being told with some of the other data out there. So I'm just going to end on that, and I'm happy to have questions.
We could take a couple of questions if anyone has a question at this moment. No questions? Andrew, please use the microphone.
So I'm curious about that last one, actually, tying that last one to your previous studies. It looks like most of the studies that you investigated were passive data collection, right? Versus the last one, which you said was intentional. There was a question asked, and there was a response. Have you seen a difference in the type of data collected, the biases, the other issues that arise when it's passive monitoring versus active engagement through these different social media channels?
Are you talking about my third case, the fashion one?
Oh, no, Bill Bunge. The Bunge one.
Oh, OK.
That was an example, because presumably he surveyed people about it, right? And you see a lot of times where agencies and organizations might ask, you know, what's your favorite brand, versus just passively listening for branding, going back to your fashion example.
Right. I mean, on that question, I think Bunge was actually using public health records rather than actually going out and surveying people.
But I do think, I mean, on the point about passive collection versus active collection: in the asking of the question, you are helping shape the kind of responses you're going to get, which I think is less of an issue with this more passive kind of collection. I'd have to think more about it, but I think it is actually a useful distinction for that sort of thing.
Thanks for the nice talk. So you mentioned population representativeness and trying to understand what you're really getting when you look at the social media data. And I'm wondering what techniques you have for discerning what's a random event from something that might be representative of a broader social or economic trend.
Yeah, I mean, I'll go back to the example of Louisville. A lot of this comes back to the way you shape the research question. The research question was, could we disprove the null hypothesis? We didn't really phrase it that way. But could we show that this Ninth Street divide is working in a way that we weren't really expecting? That's the way we came at it. Can we show that this Ninth Street divide story, how people are understanding Louisville, can we disprove that? And for that, by being able to show that this part of the population was moving in these various spaces, we were able to disprove that larger story, that larger understanding of Louisville. For that particular question, we didn't need to worry so much about whether or not the data represented the entire population. The fact that some people were doing it was enough for our particular research question. And I think that's a really key part of looking at that.
I mean, just to give another example, we're currently working on a study of gentrification, trying to look at how we can use this sort of social media activity: if people are active in this one place and they're starting to become active in this other place, is that indicative of gentrification? And with gentrification, again, we don't need to make sure that the data represents the entire population. In some ways, it's a very specific segment of the population, the ones moving in to occupy areas they hadn't occupied before. So I think that's a long way of saying representativeness is a really key question for using this sort of social media data. And because of that, you cannot use it to answer all kinds of questions. You have to be very careful about that. Thank you. And thank you for your wonderful talk. In using the geotagged social data, to what extent do you think you're able to move theoretical and conceptual dimensions forward? It's very descriptive. You're looking for natural experiments of some sort. But have you found a way to fine-tune the ability of this data to test hypotheses and develop theory that underpins this data? That's a very broad question, I know. But I thought I'd at least challenge you with it. I mean, I think it's a great question. I think the way we've really been approaching this is trying to, in some ways, test some of the really fascinating and really great social theories that have come out in terms of how people are using relational space, the multi-scalar kind of thing. And actually, a lot of social theories have a very rich theoretical development but have not really been empirically tested before. So our work so far has been mostly focused on how we can use this sort of data to actually see if the theory building that's been done can be shown to be occurring in actual real space, in real life.
OK, well, in the interest of moving on, I'm going to ask others to hold their questions to the discussion. And we'll move on to our next speaker. Thank you very much. Our next speaker is Dr. Sarah Williams. Dr. Williams' work combines geographic analysis and design. She's an associate professor of technology and urban planning at MIT, and she's also the director of the Civic Data Design Lab at MIT's School of Architecture and Planning. The lab uses data, maps and mobile technologies to develop interactive design and communication strategies that convey urban policy issues to broader audiences. So her work is really of quite a bit of interest to us. She was trained as a geographer at Clark University, as a landscape architect at the University of Pennsylvania, and as an urban planner at MIT. And before working at MIT, Dr. Williams was co-director of the Spatial Information Design Lab at Columbia University's Graduate School of Architecture, Planning and Preservation. A wonderful part of her story, and something we don't often get to say about our scientist speakers here, is that her design work has been widely exhibited in venues including the Guggenheim, the Museum of Modern Art and the Cooper Hewitt Museum in New York City. And she's won numerous awards, including having her work on view at the Museum of Modern Art. So we'll turn it over to Sarah Williams. Great. I think we're waiting to figure out how we can project. I just want to first say how excited I am to be here with you all. I think this is a really important topic, and especially timely. We got the invite before the news about Facebook's data broke, but I think it's particularly timely that we're all gathered here together to have a discussion later today about what using social media means ethically for all the work that we do. So I'm excited to talk with you. So thanks for inviting me. Let's see here.
We're just going to take a moment and I'm just going to need one moment to get our screens all synced up for both here in the room and for our audience online. You know, I think we have time for one more question for Dr. Yes, please use the microphone. Just push the button. I'm Julie Ramirez. I'm with Smith's Group. One of our divisions makes the explosive detection equipment that's used at the airport and also for screening people like trace equipment. I'm wondering, have you looked at crowdsourcing from the perspective of security? And I know that there's some privacy issues with that, but we're always trying to find a way to screen people at the airports in a way that's not so intrusive and that is accurate, makes it easier for people to go through the airport. So I was just wondering if some of your research has gone in that area. Quick answer is no, we've not looked at that particular question. I mean, I know people have, there are some approaches to it. I have a lot of concerns about it, but I'll just we can have a conversation later on, especially since the things are working out. OK, very good. Well, now we'll turn it over to Dr. Williams. I guess the people online get a preview to my next slide. I think that I think they're lucky ones, right? So I like to start by kind of putting this up there. You know, we all hear that big data will change the world, but I believe big data will not change the world unless it's collected and synthesized into tools that have a public benefit. I run something called the Civic Data Design Lab. And really, our lab tries to take something that looks like this and transform it into images that can be developed into policy. And so this is an early project of the lab, which looked at how much it costs to incarcerate people block by block. And these very red blocks that you see over a million dollars is spent to incarcerate people from these blocks. 
This zoom-in here is Brownsville, Brooklyn, where over 17 million dollars was spent to incarcerate people from these blocks. And the idea is to use the data to help influence policy. So think about it: if we spent just one million of that 17 million dollars on job training programs, health services, proper education, how might we change how these neighborhoods look? And so these maps are the ones that are actually in the Museum of Modern Art, and they were part of an exhibit there. And they were actually seen and used as part of the criminal reinvestment act of 2010 to allocate funding to reentry programming. It was only 25 million throughout the US, so it's a drop in the bucket compared to this 17 million block. But one of the things that I really try to do in my work is think about how we can translate data into communication devices that can be used by policy experts inside the field and outside the field to get a broader public exposed to the issues, so we can influence and affect policy change. So if roughly 80 percent of the data that's stored in the world is privately owned, I think we need to think about ways that we can use that data for public good and really unlock that data for policy change. And I have what I call the build it, hack it, share it method to do that. And so I'm going to tell you stories about this in my talk today. So build it. We all have mobile devices on us. Everybody walked in with one. Everybody has a data collection device on their person, most likely, maybe several. How can we leverage those data sets that we are using to build our own data to create policy change? Hack it. What do I mean by hack it? Websites and social media sites, just as we saw, have a wealth of information that can be scraped and transformed for use in policy analysis.
And I think this is particularly important in areas where there's no other data available: how can we use that data to make decisions about communities where no other data exists? Share it, which we heard a little bit about in the previous presentation, too. I think sharing data builds partnerships with local communities, and it actually allows data to have an expanded possible use. Sharing data shows a commitment to a project and a place, and it creates trust between citizens and the government, which is really important for making some of the policy changes that we want to see. So what I thought I'd do today is provide an example from each one of these areas from my own work. And because I really like telling stories with data, we're going to start with build it, which is in Nairobi. Nairobi suffers from severe congestion problems. A typical street image of Nairobi looks something like this. And, you know, the infrastructure development has not caught up with the unprecedented growth that the city has seen. I've been working in Nairobi for a long time, and I actually created the first GIS data set for the city in order to develop a transportation model. I created a land use density map. And one of the issues that I had when I built this original transportation model was that I didn't have information on these vehicles, these matatus. These are the small vehicles on the roadways, which are the main form of public transit in the city. Actually, there was just a story two weeks ago in the New York Times about matatu culture. They're very well known for having a type of culture. Matatus will have video screens, some have disco balls. There is a Jay-Z and Beyonce matatu. There's also a Van Dam matatu. If you guys have ever seen the TV show Sense8, one of the main characters drives a Van Dam matatu.
And I think what's important here is that these matatus are very much part of the city's culture and pride. But also they're the main way people get around. And people in Nairobi did not have data on that, just like I didn't have it for my model. So I thought, how could I create a raw data set for my model, and data that everyone could use? And my research question was: how can we leverage the ubiquitous nature of cell phone use in Nairobi, Kenya, to capture data about the informal transit, which most citizens depend upon, and open that data up so that others can use it? And if you're not familiar, cell phones in Nairobi are used for everything. You buy your coffee with your cell phone minutes. They're kind of an extension of the banking service. People don't have credit cards in Nairobi. They use their cell phone minutes to buy and transfer credit. So this is why we wanted to use the phone. So we created an app with the University of Nairobi that collects data on the matatus, and they actually rode on the matatu system. And they collected the data in a format called GTFS. How many people are familiar with this standard? One, two, three. Three. That's like a radical number. Usually it's one, maybe one and a half. So we're at 200 percent now. So GTFS is basically the standard that Google Maps uses. When you try to map your transit in Google Maps, you're using the GTFS standard underneath. And the reason that we developed this data in the GTFS standard is that there's a ton of open source software already developed for the standard. So when we open our data up, instantly there will be a lot of people who can leverage the resource and the benefit of the data itself. Also, Google uses it as a standard.
But I think this gets back to the issue that when we're thinking about some of these data sets, both aggregating them and developing them, data standards are a really important tool for helping make sure data is anonymized and also for making sure that it can be used beyond the project that you're working on. So the GTFS standard looks something like this. It's just a series of text files. Each stop has a unique identifier, latitude and longitude information, and the stop name. And as the data came in through our cell phones, you can see that it began to build the streets of Nairobi. But one of the issues for us is that not only did we want to create this data in the standard, we wanted to give the data out for everyone to have access to. And not everyone is going to understand a sheet of latitude and longitude points coming in through a GTFS standard. So we began to think about how we could visualize this data in a way that the citizens of Nairobi would be able to understand. And so ultimately we created a map that looks much like what you would see in London, New York or Paris. And we did the development of this map thinking about the local geography. So you can see that we don't have every stop on the final map, but we picked landmarks as major stops. The way people navigate in Nairobi is through saying that they're going to Safaricom or Nakumatt or a different type of grocery store. And throughout the project, we invited the matatu drivers, owners, the local government, and the private community to work with us on the data collection. And here you can see the matatu owners noticing there's this huge area to the north where there are no matatus, and they're beginning to instantly plan because they can visualize the data. And so it shows the importance of having the data set, but it also shows the importance of involving the people who are actually planning in these communities.
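The stop records described above (a unique identifier, a latitude and longitude, and a stop name) can be sketched as a small GTFS-style `stops.txt` reader. This is only an illustration of the file layout, not the project's collection app: the field names `stop_id`, `stop_name`, `stop_lat` and `stop_lon` come from the public GTFS specification, while the sample rows and the `read_stops` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the stop IDs, names, and coordinates are
# illustrative and are not taken from the actual Nairobi feed.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S001,Example Terminus,-1.2833,36.8219
S002,Example Junction,-1.2985,36.7626
"""


def read_stops(text):
    """Parse GTFS stops.txt content into dicts with float coordinates."""
    stops = []
    for row in csv.DictReader(io.StringIO(text)):
        stops.append({
            "stop_id": row["stop_id"],      # unique identifier
            "stop_name": row["stop_name"],  # human-readable stop name
            "lat": float(row["stop_lat"]),  # latitude in decimal degrees
            "lon": float(row["stop_lon"]),  # longitude in decimal degrees
        })
    return stops


stops = read_stops(SAMPLE_STOPS_TXT)
for stop in stops:
    print(stop["stop_id"], stop["stop_name"], stop["lat"], stop["lon"])
```

Because the file is plain comma-separated text with agreed column names, any tool that understands the standard can read the same records, which is the reuse argument made in the talk.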
And so one of the things that I think is really important is that when we're building data sets, we build them collaboratively, that we ask people to participate in that project with us, so that the matatu owners can trust the data just as much as a government official. This is us working with some local transportation planners, again editing the map. The map was released in Nairobi and it went viral on the internet. We got newspapers to publish it so that people who don't have access to smartphones would be able to use the map. And what I'd like to talk about now is how we measure success in an open data project. I think that's when others leverage your data to create their own policy change. So by sharing our data and building it collaboratively, we were really excited to get called to a press conference where the mayor of Nairobi gave it to the governor and made it the official map of the city. And I think why this is important is that even though the government all along the process did not participate or even really acknowledge that we were doing this, because we invited them, they felt that they were part of the project, and so they could trust the outcome of what we developed. Also, this is the first GTFS data set of informal transit to be in Google Maps. And what we hope people are doing is using it to make better decisions about transportation in Nairobi. So before, if you were going to go somewhere new, say you were in [inaudible] and you wanted to go to Junction and you didn't know your route already, you would usually go into town, ask at the matatu terminus and then come back out. Now you can make better decisions about how to get around, which we hope decreases traffic just ever so slightly. This is a map that looks very similar to our map, doesn't it? Can you tell the difference? This actually was developed by USAID and the World Bank. It's now the proposed BRT system for Nairobi.
But again, as I say, when you share data, others leverage it for their own change, and here they're leveraging the iconic nature of the visualization. The map became very popular in Nairobi, and so they're using that to garner support for the BRT system. There are now five apps that use our data set; this is one called Sonar. So private communities have leveraged the opportunity of that data set. This is one app that actually crowdsources the update of the data, and there's another one that's built off our data that actually checks the fares. So when people get on they can say how much they paid. Fares are dynamic in Nairobi, and so people want to use this fare app to negotiate when they get on the matatu, to say, like, hey, my friend paid 30 pence for a ride. And it includes a budgeting tool. But again, the point here is that people are developing their own apps with this data as the base. This work has traveled elsewhere. People have used our tool in Amman and Managua. Semi-formal transit really provides mobility around the world, and since this project started we now have roughly 26 cities that have used our tool to develop data sets. And actually, yesterday I spent some time with the Inter-American Development Bank, which is hopefully funding a global network mapping this type of transit data. But since my time is limited, I'm going to go right to hack it, which is the fun stuff. So how many people are familiar with ghost cities? The idea of ghost cities. These are cities that were built, largely in China but in other places too, that people did not move to. And the idea here is that mapping these vacant residential developments can identify risk in the real estate market.
It would be like mapping the foreclosure crisis before it happened. But data is really hard to come by in China, and so we decided to hack social media sites in China. We developed a model that basically says that a thriving community has amenities, and we got these amenities from Dianping, which is the Chinese version of Yelp. The model is based on the idea that if you have access to a grocery store and a beauty salon, you most likely live in a thriving community; if you don't have access to those, you most likely do not. We also then scraped all the residential points of interest from Amap, which is like MapQuest in China, and Baidu, which is like Google Maps. Then we developed a model in which each residential point of interest had a grid cell, and we calculated the Euclidean distance from the centroid to the nearest amenity, taking into account that in suburban places people will travel farther for an amenity than in places in the city center. Then we used the reviews to assess whether people actually went to these different locations, and we used Hansen's gravity model to give an amenity score. We did that for all the residential points of interest. We looked at those cells where the amenity scores were the lowest, and then we performed spatial autocorrelation on the cells that remained, saying that those areas that had large clusters of low amenity scores were most likely ghost cities. So one of the methods that I'm interested in: when you make algorithmic models trying to predict a certain situation, I think it's really important to ground-truth the data and ask the people on the ground whether they believe the model to be true. So we went to Chengdu, Tianjin, and Shenyang. In Shenyang, we found a lot of developments that looked like this. They were probably developed five to 10 years ago. The government would argue that these will eventually be populated even though they're not developed currently. We also found a lot of stalled
construction sites, so those were around semi-vacant or underutilized housing. So a lot of situations that looked like this, next to housing developments that were occupied maybe by 50%. We also found some older Chinese housing from the communist era that has now been vacated, ready for development, and you can see that these are largely empty. And then we found whole ghost cities, which I showed you early in the talk; that was the drone video footage. This is a whole city outside Shenyang in which nobody lives. This is actually the science museum, or what should have been the science museum. And then, as I said, one of the things that I'm really interested in is being able to show that data to the community that we're working in. So we created this interface, which allows you to see for each cell what the amenity score is and what is causing this particular red square to be considered a ghost city. The idea here is that this can be used as a planning tool for local planners to increase the amenities in this area, to actually make it a viable area. And then we asked planners to look at our interface. So this is a senior planner from the Chinese Academy who talks a lot about how vacant developments are controversial in local politics, because the local planners aren't really allowed to make decisions about where these developments are placed. And so they feel like even though they know the community the best, they can't actually interface with the local government or the senior government to tell them not to work in those areas or open those areas up for development. This is a deputy director of the Yuhang district, at a higher level, in Shenyang, and he talked about how decisions are mostly based on theory without any data behind them. We talked to real estate developers, who said that the burst of the real estate bubble will carry irreversible impacts for the residents who buy these houses.
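The amenity-score step of the model described earlier in the talk can be sketched as a Hansen-style gravity calculation: each amenity contributes its weight divided by a power of its distance from the cell centroid. This is a minimal sketch under stated assumptions, not the lab's actual model. The decay exponent `beta`, the distance floor `min_dist`, the use of review counts as weights, and all coordinates are hypothetical, and the spatial autocorrelation step is not shown.

```python
import math


def amenity_score(cell, amenities, beta=2.0, min_dist=0.05):
    """Hansen-style accessibility: sum of weight / distance**beta.

    cell      -- (x, y) centroid of a residential grid cell, in km
    amenities -- iterable of (x, y, reviews); the review count stands in
                 for how much an amenity is actually used (an assumption)
    beta      -- distance-decay exponent (assumed value)
    min_dist  -- floor on distance to avoid division by zero (assumed)
    """
    cx, cy = cell
    total = 0.0
    for ax, ay, reviews in amenities:
        # Straight-line (Euclidean) distance from centroid to amenity.
        dist = max(math.hypot(ax - cx, ay - cy), min_dist)
        total += reviews / dist ** beta
    return total


# Hypothetical grid cells and amenities on a flat km grid.
cells = {"A": (0.0, 0.0), "B": (10.0, 10.0)}
amenities = [(0.0, 1.0, 40), (1.0, 0.0, 10), (2.0, 0.0, 8)]

scores = {name: amenity_score(xy, amenities) for name, xy in cells.items()}
lowest = min(scores, key=scores.get)
print(scores, lowest)
```

In this toy data, cell A sits next to well-reviewed amenities and cell B is far from all of them, so B receives a much smaller score. Following the model's premise that a thriving community has amenities nearby, cells like B are the ones a planner would want to inspect on the ground.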
So a lot of these homes that are empty are completely sold, because Chinese people will have four or five houses, and so they're starting to have more and more risky mortgages. So the idea is that these vacant areas might be completely bought out, but they hold the riskiest mortgages. And we talked to academics as well. This is somebody from Tsinghua University, and he said the geospatial mismatch between supply and demand is a big problem and that addressing the oversupply is something that's really looming ahead. So I think ultimately we hope that these maps are a guide to risk in the market and a guide for where improvements could be made to make these more viable products, but they also show the amount of financial risk that the government is holding. Again, in the lab we like to bring these to a larger public, and this was part of an exhibition at the Seoul Biennale, which just closed this past December. We allowed people to interact with the maps, and the drone footage above matched up with the ghost cities that we visited, so you can see that here. So I'd like to leave you with this idea that we can really build it, hack it and share it. And sharing, I just want to say, is sometimes sharing through interfaces but also sharing the data set itself, and if we can find tools to share that information, we can build these greater partnerships. The lab has a number of different projects where we really work on interfaces to share data, and one thing that we're just starting to work on is thinking about how to negotiate with private companies to access data for social good. So actually, I was just here again yesterday talking with MasterCard about how they can share data to help with workforce and labor force issues. They're really interested in how we can deal with issues of social inclusion by using their data set, and I think that they realize that they need to do that.
A lot of our work also includes telecom data and I think the biggest barrier to working with some of these vast data sets that we have is creating the partnership agreements but creating a policy as Matthew said of what is aggregated enough? What do we consider to be aggregated enough or what do we consider private enough to use these for social good? Because I think each right now each individual project gets negotiated and that determination of what is useful goes through a lengthy process and I think it's largely because we don't have policies around how this data can be used in the public sector. And I also wanna just say, I think that figuring that out would be useful not just in terms of thinking about how we can do this kind of data analytics but it'd be very useful for these companies like Facebook if they had some kind of guidelines in which how to act, I think it could help us use our data as a resource but also help them protect our data. Thank you. Thank you very much. We have a few minutes for questions. If you have a question, yes. Please come find a microphone. Hi Sarah, Mikhail from Mapbox. We've known each other and various other iterations of our lives. Definitely. Yeah, I really like the last points you're making and I'd love to hear more about the framework or the way in which the private sector can take part in developing the scope of more standardized ways to share data. You mentioned Facebook, I think while we've certainly heard a lot about places where Facebook has missteps with their handling of private data, the way they've managed other data sets like the movement data has actually been, I think is almost, there's a model there which, so I think there's an engagement with companies including Mapbox which collects telemetry data and we've had developed a very robust system to keep that anonymized and are very considerate about what we do with it but want to do more with it and make more public good. 
Yeah, facilitating that kind of conversation is really hard among different companies. Some external, whether it's here or at MIT bringing folks together to have that conversation because we all want to share but we also thinking about the business factors as well in the back of our mind. So yeah. Yeah, I mean, I have a lot of thoughts on this and in fact, I'm working on a project with Facebook right now which I can't share with you which is problematic in and of itself but hopefully I will get there. But we spent a lot of time, I'm doing a project with Facebook to help think about those areas that have the least access to internet and they're actually giving me their data to do that and the idea of using it is to help think about help governments in these areas think about how to get provision of internet use and I think it's a really powerful, obviously Facebook does not have let's say an altruistic goal in that they're looking for their new market but I think it also talks a lot about like the digital divide and how we can include public resources in it but this whole recent conversation about Facebook data has kind of closed the project down and Facebook has become very scared about how to work on it because they know that now we've got this map of potentially locations that we can really help guide governments into helping provide the infrastructure needs in those areas but they don't wanna talk about it. I'm sure I'm probably gonna get in trouble for speaking. My big mouth. 
And so I think having some kind of guideline on how aggregated a data set needs to be before it's released, say as a map, is one way that we can work towards creating relationships. But I think that also is problematic, because you never really have a truly aggregated data set, as you talked about. But I'm also thinking about creating certain kinds of standards, so this gets back to my interest in standards, standards under which we can release certain kinds of telecom data or certain kinds of social media data. I think a great example of that from the telecom community is that they release CDR data, so that's call detail record data, which is anonymized and can be used a lot for congestion flows. In many ways it's become a standard for how telcos release their data sets when they do, and I think, at least in Africa, it's allowed that data to be released more. And so that is one area that we can work on: standards on how to release a data set. The second problem, though, is that even though a telco might have, let's say, a CDR data set that they can release and a standard that they think is anonymized, each telco has different objectives, and negotiating those objectives with, let's say, a local government is hard. So I think we also need to develop standard templates for those negotiations, so that we can have an easier starting point. And the World Bank actually just announced two weeks ago that they're working with the global mobile phone operators to try to develop some of these standards. I think mostly they're interested in developing countries, but I think this is just as important in the US context as well. And so working with AT&T, Verizon and so forth, how do you get this data released at the CDR level? I think it's important, but a local government doesn't know the questions that they need to ask, so creating those templates is really important. We have time for another question. Yeah, one question.
Is aggregation enough, or are noising and differential privacy methods required? Because aggregation alone, with supplemental data from other sources, usually allows identification. Yeah, right, so the way that the CDR data is released is not just aggregation. It's looking at what percentage of people at a tower move from one tower to another tower. So it's very hard to get a position, although people have tried, actually, and sometimes people can find them. So is it enough by itself? Probably not, which is why it needs to be paired with, I think, standard agreements on how this data can be used. Although this is a very idealistic viewpoint, by the way. Yeah, and you talked about jumping from tower to tower. Most traditional telco data stopped at the tower, but 4G and beyond has sufficient range that the tower no longer locates an individual. On the other hand, the information that can be captured at the phone by an embedded app does an excellent job. That's true, you can find an individual pretty easily with telco data. Maybe we'll end this section on that. Some food for thought to make us anxious about what's coming next. Thank you very much. Thank you. Our next speaker, Dr. Michael Jerrett, is a member of the Geographical Sciences Committee. He's professor and chair of the Department of Environmental Health Sciences in the Fielding School of Public Health at the University of California, Los Angeles. There was another piece that you mentioned when we went around the room, that you were director of. Yeah, director of the Center for Occupational and Environmental Health. Yeah, he has a full plate. He's a specialist in GIS, geography and issues that relate to health and exposure. So his research has involved characterizing population exposures, particularly to air pollution but also to other variables of the built environment.
And he also assesses the health effects of environmental exposures. He's also been involved in studying activity levels, human behavior and obesity. He has a BS degree in environmental science from Trent University and both graduate degrees from the University of Toronto: an MA in political science, environmental science and a PhD in geography. Are you ready? Yes, thank you. Okay, good. Thanks, Carol, for the kind introduction. It's really a pleasure and an honor to be here to share some of my thoughts on how we can improve exposure assessment and better understand how the environment influences human health. As a geographer, I want to give you a mental map of the talk. So I'd like to start by talking about the Hägerstrand concept of time geographies and the lifelines of exposure. I then want to shift into how we're really living in a ubicomp world, and I'll explain what that is. And then I want to give you some applied examples with air pollution, built environments and physical activity, using three different types of sensors: those that are on cell phones themselves, those that are standalone sensors but use a connection to the cell phone, and then embedded sensor networks. And we've seen some examples of these already. And then I'll just give you a few key concepts and future directions. So when we look back at the concept of time geography, it's critical, I think, to understand that the physical time-space paths are the dominant determinants of human environmental exposure. And for understanding the exposure, being able to understand the time-space dimensions and the geography of that is absolutely critical. And we can think of exposure over one's life course, or what's sometimes called the exposome, as the summation of travel through these various hazard fields. Here's just a schematic that was developed by Shaw, where we have an individual going through time and space, and eventually we can characterize their potential activity space.
And it's really this area that we're very concerned about in terms of their exposures, because that's where they're going to spend the majority of their time. If we look at this nice Briggs and Gulliver plot, you can see that you could think of this as travels through different pollution surfaces or different hazard maps. And it could also be salutogenic exposures, such as green space, which are good for your health. But we also want to think about the physiology and the activity level at the moment of contact. What we know about environmental exposures is that they often have very strong microgradients, but it's also important to understand the human dimensions of that contact as the person goes through the exposure field. And this is from a scripted exposure study I did in Berkeley several years ago, where we had people drive down low and high air pollution paths. And what you can see here is that on the high air pollution path, the ratio is something like 543 times from the highest levels to the lowest. I think, actually, for a clean place like Berkeley, this is one of the highest levels of ultrafine particles ever observed in the world: over a million particles per cubic centimeter. You can't see them, they're microscopic, but they have been associated with a wide range of adverse health effects. Even on the low exposure route, our ratio is something like 135. So there are very dramatic shifts at very small microgeographies of exposure, but also important is the inhalation rate that the person is experiencing while they're in that hazard field. So somebody sitting in a bus or car is inhaling about 4.5 liters a minute; if they're pedaling really hard on their bicycle, it's 37 liters per minute. So you're looking at something that's about eight times the exposure, and then you multiply that by about 530 or 540 times, and that momentary inhalation is going to be incredibly high for some individuals, depending both on their location and on their physical activity.
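The arithmetic in that passage can be written out as a short sketch. It assumes the simplest possible model, in which inhaled dose is concentration times breathing rate times time. The breathing rates, the roughly 540-fold concentration ratio, and the one-million-particle peak are the figures quoted in the talk; the function name and the 15-minute duration are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: inhaled dose = concentration x breathing rate x time.
# The breathing rates, the ~540x concentration ratio and the peak concentration
# are the figures quoted in the talk. The function name and the 15-minute
# duration are hypothetical and are not taken from the study.

CM3_PER_LITER = 1000.0


def inhaled_particles(particles_per_cm3: float, liters_per_min: float, minutes: float) -> float:
    """Approximate number of particles inhaled over a period of time."""
    return particles_per_cm3 * CM3_PER_LITER * liters_per_min * minutes


SEATED_L_PER_MIN = 4.5      # passenger sitting in a bus or car
CYCLING_L_PER_MIN = 37.0    # cyclist pedaling hard
CONCENTRATION_RATIO = 540.0 # highest to lowest level on the high-pollution route

# How much more air the cyclist moves per minute than the seated passenger.
ventilation_ratio = CYCLING_L_PER_MIN / SEATED_L_PER_MIN

# Hard cycling at the peak versus sitting at the lowest level on the same route.
combined_ratio = ventilation_ratio * CONCENTRATION_RATIO

# Same place, same 15 minutes, at the quoted peak of one million particles per cm^3.
seated_dose = inhaled_particles(1_000_000, SEATED_L_PER_MIN, 15)
cycling_dose = inhaled_particles(1_000_000, CYCLING_L_PER_MIN, 15)

print(f"ventilation ratio: {ventilation_ratio:.1f}")
print(f"combined ratio:    {combined_ratio:.0f}")
print(f"seated dose:       {seated_dose:.3g} particles")
print(f"cycling dose:      {cycling_dose:.3g} particles")
```

Under these assumptions the breathing-rate ratio comes out a little above eight, matching the "about eight times" in the talk, and combining it with the concentration ratio gives a difference in momentary inhaled dose of several thousand-fold between the two extremes.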
So I think it's been recognized for a long time that time geographies are important, but they've remained largely a theoretical construct more than an empirical reality, or we've attempted to characterize them with simulations that are often built on weak and probably unrepresentative data. And I think for the first time we're really seeing that the new technologies are offering us a realistic possibility of getting at an understanding, on large numbers of subjects, of the time geographies and the microgeographies of personal exposure. And I want to emphasize the personal, because it's not just the broader populations we're interested in; when we look at epidemiological studies, we want that exposure to be as precise as possible. So enter the ubicomp world, which was a vision of Mark Weiser, a great computer scientist at the IBM labs in Stanford. Way back in 1991 he predicted that we'd be living in a ubicomp world, so it's a complete embedding of computational technology into our everyday lives, and we've seen that; we're all carrying one right here. And when we look at the important innovations, I think it's being driven not necessarily by the environmental health field or the geography field but by the healthcare sector, where telemedicine is projected to be a $10 billion a year enterprise in the next few years, and by other commercial applications that we've seen on Google and other mobile phone devices. So we now have seven billion cell phones in the world. This is about a four-month-old statistic, so it may be even more. So that's one for every woman, man, and child on the planet, and about 1.5 billion of those are smartphones, which have computing potential that's much greater than what I used to use when I was going to graduate school.
I don't want to date myself too much, but it's certainly a very powerful mobile computing platform that's available globally, and it gives us this potential now to understand what is affecting health in terms of exposures. I think we all use a lot of the sensors that are embedded in our phones, but when we look at what's there, they all have cameras and a global positioning system, but they also have barometers and sensors capable of measuring heart rate, saturated oxygen in the blood, ambient light, and proximity to each other. And this all goes into a fairly powerful central processing unit that has some storage, communication modules, and a display, which is what we're most likely to look at. But how can we use these sensors on the phone to get a more realistic assessment of personal or individualized exposures? I think it's useful, when we're thinking about this, to think about opportunistic versus participatory sensing. So there might be something where we just get an app loaded on a phone and track a person, and we'll see in a minute that we can get a very good assessment of their physical activity, the trip mode, their walking and biking, their proximity to others. If we take that another step further and ask for some participation, we might be able to understand things about their psychological health, their mood, their affect, and, for elderly people, their gait. And then we can get into having noise assessments, ultraviolet exposures, blood oxygen, heart rate. We could have these other sensors that are going to be connected, and they're using the cell phone as a communications device, but they're running on their own as standalone sensors. And importantly here, biological functions are becoming something that you can detect with high precision. So let's take a look at a few examples of that. But I guess one thing I also want to emphasize is that as we move down this scale, the burden goes up. So we're going this way, and down is higher burden. So we're not going to get somebody to carry a
pollution sensor for the whole year or for their whole lives. We can get them to carry a cell phone. I think this emphasizes the importance of linking up a lot of the information we can get from these ubiquitous technologies with other types of models to better understand the exposures. I know this is a very crowded graph, but these are the 25 most popular smartphones on the market as of about 2017, and we just looked at all the different types of sensors. They all have accelerometers, they all have GPS, and most of them have gyroscopes, so they can tell orientation to the ground. But you can see there's a lot of variability, and there's also large variability in the quality of the sensors that are embedded in the phone. So if we want to look at getting large-population information, this really complicates the assessment, because our apps are not necessarily going to operate the same way on every phone. So several years ago, with colleagues in Barcelona, Spain, and in Southern California, I set out to try to understand, well, what kind of information can we get off the phones, and what can we do with that information to improve exposure assessment? And it started with a fairly modest pilot study where we had people carrying a cell phone app that was created by my colleague Edmund Seto that measures physical activity and geographic position, and then we had them wear standard instruments, and then we had some fancier measurements of galvanic skin response, and we had them carry this for five days. And basically what we were able to do is take existing pollution exposure maps, and then use the cell phone to get the walking or biking, the so-called trip mode classification, and then use ad hoc studies to look at indoor-outdoor penetration rates of air pollution. And then we had hourly and daily specific pollution levels from government monitors, and that allowed us to derive a personal exposure to air pollution. But when we took the information off the cell phones for physical activity, we could
actually convert that into pollution inhalation, and it revealed some really interesting patterns, I think. So here's one day of our volunteers on a map, with the metabolic equivalents of activity, so the darker colors are showing higher levels of activity, arrayed on this nitrogen dioxide map. What you can see is that there's a lot of variability, again, in the pollution surface, but also a lot in their physical activity as they move through the city, going from very sedentary to very active. And when we looked at this, we wanted to say, well, what's the important environment for pollution inhalation, or what is one of the most important environments? We've always assumed it's the home address. So we assume that the people in our studies sit on their front step, they sit there for 24 hours a day, they never change their activity level, and they never move anywhere. And then we assess the association, and we spend billions and billions of dollars regulating air pollution based on that. But we found that about 6% of our volunteers' time was spent in transit, but that accounted for about 11% of their air pollution exposure. But because more than 50% of the people in Barcelona walk to work, at a time when there is a high exposure because it's during the rush hour, that little 6% of their time budget was accounting for more than 24% of the total inhaled exposure. So that's an actionable area that we could look at to improve public health. Now, we did a larger study funded by the National Institutes of Health in the US, and this is for PM2.5, so it's a more spatially homogeneous exposure, but we see basically the same pattern: it's 9% of the time in traveling, for a larger sample, and it's about 20% of the exposure. And then this other part is what I call a tapas bar effect, because people in Barcelona spend a lot of time in tapas bars and outside, and those also tend to be beside roadsides as well. So now, what does it mean in terms of the average exposure? If we just assigned it
to the human breathing machine sitting on the front step, it's about a 20% increase when we actually take their activity into account, and that's enough to really push the needle on the study if it's going in directions that we don't expect. The other thing that I hadn't realized when I first started working with these data is that they're huge, and they become huge very quickly. So this is just the GPS trace of our 174 subjects getting pinged every 10 seconds with two sensors, and we have more than 10 million, almost 11 million, observations over the course of a week. The National Institutes of Health asked me to come and do a talk for them, because they have a new cohort of a million people and they want to equip them with cell phones, and I said, well, you're dealing with 61 billion observations a week, and it's messy data, it's not clean data. So when we look at some of the lessons here, I think location and physical activity are critical components, because they do represent the time geography of exposure, and we can use them to get a better lifeline of an individual's exposure. And these can significantly improve the estimates when we move to fusing them with existing models. But the data are very big and messy, and there's a huge amount of work to deal with them. And on the sustainability of the applications, Edmund and I found out that we're really good at doing this kind of research, but we're not business people, so we didn't want to maintain something on the iPhone app site. So what we're doing now is we're working with commercialized data, so that we set up an application interface for users and they consent with us. And this is part of an ongoing study around UCLA where we're evaluating a bike sharing program and its impacts on physical activity and travel behavior. So this Moves application allows you to track the mode, and it can also show you a rough calculation of the route. So we start to take the data out, and this is from one person over the span of about eight months. You can see that it
becomes very dense; it's a very rich array of data. And what we're able to do is start to characterize that time-activity space. So here's the 95% ellipse of where this person spends most of their time, and this shows their activity levels where they are. So if you're really interested in designing environments that are going to promote physical activity or an active lifestyle, you can start to see where exactly the people are getting their physical activity. And this happens to be the route to work, and the person is a walker, but there's a hill there. But you can see that that kind of information, both for understanding inhalation exposures and for understanding how to promote active living, could be incredibly important. Now, we don't know how good this is compared to our standard research instruments, so we're doing a validation study. It's still underway. I expect that the data are going to be messier than what we get from our research instruments, but they're still going to be usable, and they're going to give us a lot of information. So let me shift now to an example of using external sensors to characterize environmental conditions directly. This area is moving so fast right now it's hard to keep up with it, but I'll show you a recent study that I published. It's based on a sensor that was developed by Rod Jones and his group at Cambridge University in England. It's pretty compact and pretty light, and when we script people through various environments, or we put them into free-living conditions, the blue is the Cambridge sensor and the red is a much more expensive $5,000 research instrument that's much heavier and bigger. You can see we're really detecting the patterns quite well. We're getting some bias, our numbers tend to be a little bit lower, but that's something you can deal with statistically. I think, really importantly, when we compare this, say, these are the nitrogen oxide exposures compared to an urban background area that's got a fair bit of traffic in Barcelona. When people are by the Mediterranean
Sea, or they enter into a park, their exposure goes down dramatically compared to that background. When they're in a low-traffic area, it goes down a lot, but then when they sit over a major truck route on a bridge, you can see that exposure really goes up; it's up to a factor of five greater. So the instruments are able to detect these micro-environmental exposures with a fair degree of precision, and that's really an important component of what we want to do when we're trying to understand exposures. They do have decent correlations, but there are bias and reliability issues, and really extensive post-processing is needed. And currently, I'd say, for very large studies these are not feasible, but for smaller studies with lots of resources you can get good information. So I mentioned biological monitoring, and this really feeds into telemedicine. All of these parameters can be measured now on cell phones or with instruments that link to cell phones. And I've been working with a group out of San Francisco called Propeller Health. My former postdoc Jason Su works for them, and they've developed an FDA-approved application which puts a global positioning system on asthma and rescue medication. So it links to the phone, and it's been shown to improve compliance with asthma medication usage, but it also gives us a geographic timestamp and point wherever the person makes use of this rescue medication. And we've been linking that to air pollution exposures, and this is in Louisville, Kentucky, with over 1,100 people tracked for several months. And this is when the ozone levels go up high, and ozone, because of the atmospheric chemistry, tends to be higher away from sources, so as you get into suburban areas. And you can see that when the ozone goes up, you're getting this instantaneous increase in the use of medication. In contrast, following almost a mirrored pattern, nitrogen dioxide tends to be higher in downtown areas where the traffic is very dense, and you can see that the asthma puffs are really
coming up when that NO2 goes up in the downtown area. So this biological monitoring offers a great potential to understand the biophysical response to environmental exposure throughout the lifeline, and I think it's going to lead to much greater capacity for us to causally link these exposures to health outcomes, because the time difference is so small. And let me go to a final couple of examples here that shift away from personal exposure into embedded or ubiquitous sensors, and here I'm going to echo what my colleagues have already talked about: citizen participation being a very important component. Citizens are often very interested in and attuned to the environmental exposures they face, and it gives them a lot of motivation to help, and I think for researchers and governments and non-governmental organizations it's really a huge resource that remains relatively untapped. So for this study we're going way down to the southeast corner of California, up against Mexico, in Imperial Valley. It has the highest asthma rate in the state of California and a lot of social deprivation. This is the launching of what I think was the largest and first community monitoring network in the world. They even brought out the band for us; it was kind of fun. And what is important here is they burn their agricultural residues, so you get these Beijing-like conditions where the levels go up 10, 15 times higher than the EPA standard over 24 hours. They'll only do it for a couple of hours, but the potential health impacts are very serious for respiratory health, and actually, when conditions like this occur, they have to close the roads because there's no visibility. This just shows the existing government monitors in gray, there were four of them, and then our array of 40 monitors. And we're also by the Salton Sea, which is losing a lot of its water, and as a result there are a lot of dust storms around the area as well. So you can see the spatial coverage that we're able to capture; compared to the
government monitors, it's a factor of 10 greater. We're measuring every minute; the government monitors are measuring every day. So the increase in the information on exposures here is enormous. And we relied a lot on citizen volunteers or people that were paid. You know, one person out of the three was paid and the others were friends who were helping out, and they did a terrific job helping us set up and maintain these monitors. And what it allowed us to do is to characterize very small micro-area variations in those exposures in relation to agricultural burning, so that we were able to start working with the schools and setting up a system so that we can help protect children when the exposures are going to be high near their schools. And, you know, this just gives you some indication of how inadequate the existing government monitoring is. If we use the 35 micrograms per cubic meter over one hour, over about 11 months we detected somewhere on the order of 1,426 episodes where fine particulate matter exceeded EPA's 24-hour standard. But we use this just as a cutoff to show that what we're detecting with our monitors is more than double what you would get from the government website or the government site. So there's a huge increase in the opportunity to warn the public about health risks and to understand the sources of those risks. And I think when we look to the future, we're now seeing the citizen science experiments explode. So there's something called the PurpleAir network, which has thousands of sites all over the world, and I'm very happy to say that the California government just passed legislation, I think partly based on the existence of the monitors that we had, to encourage and basically force the Air Resources Board to set up community monitoring systems in every major air district in California. So that bill is just being implemented now. So I think when we look at what we've learned from this, the temporal coverage could lead to much better
predictions of exposure for both epidemiological studies and public health protection. And if we think about combining this, if we had an ongoing system where we knew where the people were and we had the Propeller Health data, bringing all that together would give us a very powerful public health surveillance and protection system. Also, again, the data analysis here is extremely laborious. It's very messy data, and I'm sure my colleagues who've done the presentations know about the misery of dealing with all this. So these are slides that were shared with me by Aaron Hipp of the University of North Carolina, and he's doing some really interesting work which I thought I should highlight, because I think it has tremendous potential. There's a lot of video surveillance collection going on around the world, and this can be very good for tracking natural experiments. So does someone's behavior change after there's a reconstruction of a park or a change in biking or pedestrian infrastructure? And he's relying on something called the Archive of Many Outdoor Scenes, which has more than a billion video files. And this just shows an area in Australia where they were reconstructing a pedestrian boulevard, so that it started like this and then moved to this. And Aaron used what are known as Microsoft, or sorry, Amazon Turks, so people that you pay to actually count every person. And what he was able to show is that the pedestrian flows, this is before the renovation of the boulevard, had gone up dramatically afterward. So they were hoping that this kind of modification of the built environment would encourage more active living, and it did, and we have fairly definitive proof when we look at the pedestrian flows that he so carefully tracked. So where's all this going?
This is from a project that I was an advisor on for five years, funded by the EU, called CITI-SENSE, and it was all about how you could bring all this technology together to help citizens better understand the environmental health risks they face, to help governments better deal with those risks, and for citizens to become perhaps more environmentally attuned. So we're going to see a lot of embedded sensors on all kinds of individuals, bikes, and buses. We're going to have this volunteered geographic information that's going to come both opportunistically and through participatory sensing. It's going to be combined with GPS data into a cloud server. There's going to be data feeding in from other types of models, like those air pollution models I showed you, and ultimately we might have it so that a person wants to go for a walk or a jog, and they're going to get nearly real-time information on the types of pollution and exposures they might face along the route. So this is the sort of ultimate vision for bringing all of this information together. Now I'm going to echo what some of the previous presenters have said. Privacy issues, data ownership and protection: it's the Wild West out there. We really don't know how to deal with this. It hasn't been well handled, and I think that, you know, going forward, as we've seen with the Facebook crisis, this is the kind of thing that can shut everything down if it's not properly managed. So there needs to be a lot more attention to that. I think also we haven't really thought enough about how to foster participatory sensing in large populations so that our samples are not going to be completely biased. And that's something that, you know, whether we do more opportunistic sensing, so we have apps loaded on the phones and get that information, the information is being collected anyway, and, you know, we have already given up much of our privacy to private sector entities. And I think, as Sarah mentioned, you know, getting better cooperation with
them to share the data is critical. Analytically, the data are a nightmare. They're messy, there's a lot of missingness, and there are massive amounts of it, which makes the processing computationally intensive. We need to think a lot more about how to integrate existing models with the measurements, because we're not going to be able to do everything with individualized measurements for some time to come. And we have to turn our sort of inferential statistics on their head. Instead of thinking about how to get a representative sample of a larger population from a small sample, how do we get a relatively representative sample of the parameter we're interested in, in this case exposures, from a sample that's so gigantic and biased that we don't really have the methods to deal with it? And we're working really hard at UCLA on how to do that with advanced machine learning and Bayesian models. So I think, you know, in conclusion, location and physical activity are essential for linkage in estimating the time geographies of exposure, and high-quality information is now available from smartphones. The other sensors are showing promise, but there's a lot more need for evaluation and validation. The new analytical techniques are going to be essential for processing and understanding the huge data streams that are coming out of the sensors. And just to conclude on similar topics to the other speakers, you know, the privacy and the ethical issues and the data ownership are absolutely critical if we're going to go forward, because we don't want to have all the beneficial things that we could learn about public health and economic development being shut down because we've been sloppy on this front. So I want to thank all my funders and the people who shared slides, and most of all I want to thank you and the committee for having me here today. Thank you. We could take a question or two, and then we're going to take a short break. So does anyone have a question that needs to come up immediately? I
see "break" peeking out of people's eyeballs here. Okay, well, we'll take a break. We'll reconvene at three o'clock and move forward, and then you can save your questions for the discussion time. Then we'll get started for the afternoon session. Okay, good, good, good. Well, we're off to a great start, so I'm really looking forward to our next speaker's presentation and then our ensuing discussion with our... anyway, Kevin Pomfret is a corporate partner at the Williams Mullen law firm. He co-chairs the firm's unmanned systems practice group and the data protection and cybersecurity practice group. He's also the founder and executive director of the Center for Spatial Law and Policy, and the mission of that center is to educate lawyers, businesses, government agencies, policymakers, and regulators on the unique legal and policy issues associated with geospatial technology and associated data. So I think he's the right person for this moment in our meeting. He counsels businesses and government agencies on policy and legal issues that affect the collection, use, storage, and distribution of geospatial information, such as licensing, privacy and data protection, data quality and liability, and regulatory matters. He regularly speaks on these issues, and he has presented to committees of the United Nations and the U.S. House of Representatives. He began his career as a satellite imagery analyst, helping to develop imagery collection strategies and to identify requirements for future collection systems. He's a member of the U.S.
National Geospatial Advisory Committee, and academically he's a graduate of Bates College and Washington and Lee Law School. So thank you for coming. Thank you, and a note to myself: when you're speaking for 20 minutes, don't send a 15-minute biography. I apologize for that. One of the challenges for me as a lawyer, after sitting through some excellent presentations with a lot of visuals and maps and charts and things, is, you know, I try to interject pictures and things in. So this is my story. It's not the greatest, but it's a start. And I put this in there, and I apologize for the spacing, I didn't notice that, but just to show I've been involved in the geospatial community now for 30 years in a lot of different capacities, starting out as a satellite imagery analyst. I was the founder and I am the executive director of the Center for Spatial Law and Policy. I work with a lot of companies who are dealing with many of the issues that we've talked about here on the industry side, and I can articulate some of what those are as well, with the overlap that I see. And then I work with the United Nations Global Geospatial Information Management initiative on some of the issues that they're working on at the global level. And just two thoughts that I wanted to share before I jump in. One is I have sat through so many conversations such as we've had here, in the academic and the research community but also in industry and in government agencies. Everyone is struggling with the same set of issues. The nomenclature is a little bit different, and I'll articulate that a little bit in my talk, but the issues are the same, and that's one of the reasons I started the center and this area of law around geospatial law and policy, because I do think it is something that's going to be increasingly important. Okay, so one of the aspects: many of you have heard the concept of a geospatial ecosystem. For me that means government, industry, and what I consider citizens, or the crowd, or
research universities, however you want to describe it, are both data collectors and data users, often simultaneously. And we've created this system where everyone is collecting and sharing data, and that's fantastic, and it's presenting a lot of opportunities that some of the speakers here mentioned. But it also has this feeling that if the laws and policies on one interact, or if you try to regulate one aspect of it, you're likely going to have a ripple effect through all the others. So I ask you to think about that as I go through my talk, because when people are concerned about how industry uses data, how Facebook uses it, for instance, it's going to impact how the research community and how government agencies are going to use it as well, because it's geospatial information, and part of what we've learned here is it's very, very, very versatile, and so it's hard to restrict it for one purpose and not another. Also, just from a context standpoint, we talk about laws and policies. I have norms there at the bottom. I consider those to be sort of ethics, because a lot of communities talk about ethics, but we have to put this in the context of treaties and policies and contracts and agreements. I mean, there's a whole legal and policy framework that's developing, not only around geospatial information, in fact, but in many areas of technology, right? So I think it's important to think of this holistically, in terms of all the different ways that the work that you're trying to do here could be impacted. So what are some of the unique aspects of geospatial information from a legal and policy standpoint? We've talked about some of them already: privacy, in terms of how the data is used and how you can use it to identify someone; data quality, which as a lawyer I think about from a liability standpoint; intellectual property, who owns it and what you have the right to do with it. I think the defense and intelligence roots of many
geospatial technologies are important as well, and I'll talk about that a little bit, but people from those communities, first of all, tend to have a very strong say in the government, particularly around those issues, and they have a background and experience that kind of influences how they look at things. And then there's a growing set of sector regulations that are impacting how people are going to be able to use it. So I work a lot with drone companies. There's a body of law that's developing around drones; there's a body of law that's developing around satellites, right? So users, they don't care where the data is from, they just want the best data they can have, but there's a set of laws that you're going to need to follow and understand if you want to use that type of data. So just real briefly, something the center worked on several years ago with the UN-GGIM: we did a survey, I think about 50 nations responded, of 180 or so countries, and we looked at what impact these legal and policy issues were having on their ability to collect, use, and share geospatial information. And I think that the results surprised some folks within the geospatial community. What this chart here shows is that for privacy concerns, in terms of distribution, upwards of 80% said it had some or a great deal of impact. For a little over 40%, for example, it had a great deal of impact on their ability to distribute data. So for 40%, privacy concerns within that country were having a great impact on their ability to distribute data. We looked at liability issues. You'll note that these are a little bit less than they were for privacy. I should point out, again, I'm a lawyer; these weren't scientifically put together. I mean, it was very random, but we did get a pretty good sampling of people. Again, it was like 45, almost 50, I believe, that responded, and it sort of highlighted the concerns that were out there. And these are in many instances government agencies that
are collecting and using geospatial information. Their lawyers are not very much of a help; they don't understand what they're doing. They don't know who to turn to, but they know that there are laws and policies and regulations in place that they have to worry about. National security concerns, and then the last was licensing and data sharing. I want to point that out because we've talked a lot here about the ability to share data with Facebook, with researchers, and with government agencies. As a lawyer, I consider that to be a license, and I'll articulate why: a license can take many different forms, but the person who owns the data still has responsibility for it, and that raises some of the issues we've talked about here as to why it can be a challenge to share data. These issues impact all the stakeholders in the ecosystem: government agencies, industry, universities. Everyone is caught up and struggling to deal with this. I've been fortunate enough to give variations of this talk around the world, and governments around the world are all dealing with it. Now, the law is still one thing that's tied to a location. I can share data seamlessly internationally, but what I can do here in Washington, D.C. with the data I may not be able to do with the same data set even in Minnesota, or in Taiwan, or in China. So we need to keep that in mind. The legal and policy communities are still struggling to keep up with all these technological innovations. This is a pretty standard representation of how a geospatial product is created. As a lawyer I look at this, and maybe this is something that's wrong with me, but I think about who owns all those different layers and what rights you have to be able to use them and bring them together. In some instances they may be open data, but even if it's an open data set there is still a license associated with it. If it's proprietary data, if it's from another government
agency, there are restrictions on it. So all those layers, that's a perfect representation, but to me it highlights some of the legal issues that we face in this community. Intellectual property considerations: ownership, copyright. I tied this with scraping; we had some conversations about scraping a website. There's still an open issue as to whether you can go and collect data off a third party's website without getting their permission. Now, Sarah and I talked, and she calls up and lets the people know that she's doing it, which is the appropriate thing to do. There are a lot of people who do that without getting permission, and there are a lot of different laws that people are trying to invoke to claim that you can't do it, copyright being one. As I'm sure many of you know, the copyrightability of facts is very limited, but there are other restrictions in place that people try to use, whether it be the terms of use or terms of service; there are some computer fraud cases out there. So these are still being worked through, and that's something that I think is going to be really important to this community. What constitutes a derivative product? If you go back to this, if you add the different layer sets here, at what point does this become yours and not the person's who gave you the data? From an intellectual property standpoint that's going to be really important if you're going to try to commercialize this product, or if you want to be able to use it without being challenged by someone who claims they have commercial rights. And then one that I talk about when I talk to lawyers and people outside the geospatial community: metadata, and the importance of metadata to this community. It's much more important to this community than to any other community that I've been in, and there are probably some that I haven't talked to. It's so valuable, but when you look at agreements, when you
look at how data is shared, very rarely is metadata mentioned as something that is being contributed, or whether the representations and warranties, the promises, the liability provisions apply to the metadata. I find that really interesting because it is so important, but I think it's partly because the people who are drafting these agreements tend to be lawyers who aren't familiar with the geospatial community and the importance of metadata, or even what it means in this context. Some people in this room like to talk about open data as a way to deal with this, and I agree. There's a growing number of organizations that are trying to develop open data licenses, similar to open software licenses; they're trying to put their data out there and limit the restrictions associated with its use so that people can use it more freely. But many of these open data licenses still have differences in their terms, and these differences for a lawyer can be really significant. When it was smaller companies, certain research organizations, and NGOs that were using this data, they didn't tend to get bogged down in some of this legalese, because the mission they were doing was so good, or because the business issues were so important, that they weren't worried about the legal issues. But I promise you, because I've gone through this process, when big enterprises start getting involved and start trying to use open data sets, and you're trying to provide data as a service, they have their lawyers all over it, looking at it: what does open mean? What restrictions do I have? Is it share-alike? Can I commercialize it? Which jurisdiction's laws apply? All these things that lawyers are taught to look at. And now they're getting asked to get involved in this geospatial community because their business folks are doing it, and they're really struggling with it. I've seen it impact a number
of uses, and Sarah mentioned it as well: you get lawyers involved and you start to slow down the process a little bit. So I throw that out there. That's partly because the open data licenses are not designed for the geospatial community; they're designed by the open data community, which is important, but a lot of them are designed for transparency or for government use and not for some of the uses associated with the geospatial community. Geocoding is an example, as is how you define what constitutes a derivative product in a geospatial context. Some of you have heard me talk about this probably more often than you care to, but in my view creating open data is a policy issue, but using open data is a legal issue, and that's something that I think organizations are increasingly realizing. I was going to make a joke that Jeff Goldblum and the dinosaurs were the only things that kept going from the original Jurassic Park series, but then I realized that was a little bit repetitive, which is a joke on Jeff Goldblum. But anyway, I find this quote really interesting from a data standpoint, because I think, like life in the movie, data finds its way. In an organization you bring data in, someone uses it for a particular purpose, but someone else within the organization finds another application for it: if I combine it with this data set I can do something else, or if I can just collect this additional data set I can do something else. Which is fantastic, right? We're still at the cutting edge of these things and that's great. But organizations don't necessarily know whether the data has been collected for that particular use, whether it's suited for that particular use, how it's going to be used, or whether it has the quality control in place. All these things matter if you consider data an asset, which I do: the value needs to be protected and the risks need to be mitigated. That's
something that I don't think many organizations are thinking about now, particularly the legal departments; they're not familiar with what's going on. We'll talk about this on the privacy side, but I've been doing a lot of work on it. Many of you are probably familiar with the General Data Protection Regulation, the GDPR, that Europe has put into place. It starts May 25th and is requiring a lot of U.S. companies to figure out what data they have that contains personally identifiable information, because the way the law is written they are arguably caught by it. Many organizations are really struggling to figure out what data they have and where it's residing, so to think that someone knows how it's being used internally in these big organizations is a real challenge. The geospatial community, I say, was big data before big data was cool, but a lot of other organizations aren't, and they're trying to figure all that out. So geospatial information is versatile, but it may not be suitable for all uses. Accuracy, precision, completeness: all these different factors that this community knows very well are being pushed down into organizations that don't necessarily know them. And then when you start getting machine learning and artificial intelligence involved, all these decisions are being made not necessarily by people who understand this technology and how it's being used. As a lawyer, I think there's a risk and a liability associated with that. How do you mitigate that risk? How do you allocate those risks between the parties? You can do that through contracts, you can do that through laws and regulations, in some cases insurance, or the courts can decide. I've done some teaching in this area, and one thing I continually ask people to think about is who is in the best position to decide, between the parties, what happens if something goes wrong, if the data quality isn't right or if it's used for a purpose it's not suitable for. I would argue that those are the
parties that are involved, in many instances, rather than letting the courts or legislatures decide. So what can organizations do? In the U.S. the federal government has sovereign immunity in certain instances. I think the use of standards is a great idea. I used to be on the board of the Open Geospatial Consortium, and I think standards are a way, if you can tie back to a particular standard, to get a certain sense that this data is of a certain quality or that it's been collected in a certain way. Internal procedures, contracts, and insurance are all things that every other area of technology, or business unit, however you want to describe it, does intuitively. I think this community still struggles with them, and that's something that I think is going to have to change, for some of the reasons we've talked about here. So perceptions of privacy from a location standpoint are changing. This is a picture from 2005 in St. Peter's Square in connection with the death of the pope at that time; I think it was the election of the new pope. And they did what people did in 2005: they just sort of stood around and waited. Maybe they had a newspaper or a magazine, but they were just out there waiting. Here's the same perspective eight years later, and these people are doing what we do today: we take pictures. Now, it's a little bit of a different context, because people are taking pictures as someone was going by, but that's the time frame in which smartphones and all these applications started to develop, and people started sharing their location, tweeting where their friends are, taking pictures, and all these different things. The data is being collected, and the sensors that we talked about earlier are all made available. Fantastic. The legal and policy community has not caught up with this. We're still trying to figure out the Second Amendment, right? So this is going to take a while. And the challenge is that you're developing
new applications, you're going forward, and we're trying to catch up with this. And I would argue, and this is why I refer to a location privacy paradox, that people expect their privacy from a location standpoint to be more protected now, because they read about it in the newspaper. They're starting to hear all these different things and they start to worry about it more. So they expect their location to be protected, and the onus is on people who are collecting data and using these apps to start thinking about it. I refer to privacy as the bucket where we place things when we feel uncomfortable, because I see so many times people say that there's a privacy concern associated with this and a privacy concern associated with that. A lot of the time it's with new technologies, and I would argue that in many of those instances it's not a privacy concern; it just makes us feel uncomfortable. But the problem is that privacy is a term of art, and there are privacy lawyers who will jump on it, and the regulators jump on it. So geolocation information, geospatial information, is in the middle of that bucket right now. This can be discussed in a lot of ways, and this community will definitely be impacted by that. There was a report in 2014 that came out of the White House, from the President's Council of Advisors on Science and Technology, and for the first time I had seen reference to traditional geospatial technology as raising privacy concerns. They referred to the concept of born-analog data being converted into digital data and the privacy concerns associated with that. To me it was really eye-opening that people were starting to think in these terms. We haven't gotten that far yet since 2014 from a legal and regulatory standpoint, except with drones. Drones are at the state level; states are very
concerned about this, and they're starting to regulate the privacy concerns associated with what data is collected. So there are the trespass issues and there are the national airspace issues, but there are also the privacy issues about the data that's collected. It hasn't gone as far as some people have worried it's going to go, but there's still a patchwork of laws out there that you need to think about, and that's going to continue to grow because the federal government hasn't stepped in. But that's the first area where I'm starting to see traditional geospatial technologies being regulated from a privacy standpoint. Other examples of the evolving legal framework: I mentioned the EU General Data Protection Regulation. It mentions location information among the data that's personally identifiable information. That's all it says, location, without any definition. There have been some clarifying words afterwards with examples, but it's really not defined, and we lawyers hate that because we don't have a bright line that we can walk up to. So there are a lot of companies in the geospatial arena who are trying to figure out what that means, what data they have on European citizens, and how to handle it, because there isn't clear guidance as there would be if it was a website or other personally identifiable information. The U.S. Federal Trade Commission, which is the organization within the United States that is primarily responsible for consumer privacy at the federal level, has brought some cases against companies regarding geolocation information. The Stingray technology, for those of you who know it, is law enforcement using cell phone tower spoofing technology, pretending to be a tower so they can figure out where a phone is. A lot of those types of cases around law enforcement are actually being decided fairly soon by the Supreme Court, in terms of whether you have a reasonable expectation of privacy when you're in a public space and whether that
Fourth Amendment protection extends to location. In the past people always thought it has, but now the courts are starting to re-look at that. Ironically, the more technology there is, the more people have this expectation of privacy associated with it, which kind of flips the original case back from the 1970s. Part of the challenge is that the privacy laws and regulations are built on principles from back in the 1970s, when computers were first being commonly used: the Fair Information Practice Principles. Here's a list of principles that you need to apply, but applying them to geospatial information is very hard. It's much harder to define geolocation information than it is to define, say, health records. We give it out every day anyway, right? We go out and disclose our information in the public space to people we wouldn't give our bank card or our medical records to. So when you start to try to define that and how you use it, it gets really difficult. I would argue that there are cultural, gender, religious, and social components that are much harder to clarify, and differences in what people consider to be sensitive and how it's used. It's collected in many more ways, and the privacy challenges are much more varied. We've got everything from an insurance company using your location information to figure out how far you drive and whether you go to unhealthy places to eat, to a stalker using location information to track someone, to the government being able to use it. It's much more varied than many of the other traditional privacy concerns. Homeland and national security issues: as we discussed, a lot of this technology originally came out of the defense and intelligence communities. They have a particular viewpoint, and many times when they see new technologies they're worried about what concerns they're going to raise, from a homeland security standpoint, to forces overseas, to being able to monitor
deployments, all those things that are within their framework to worry about. That's fantastic, and in many governments around the world those viewpoints carry a great deal of weight. Even in the United States, if you look at the commercial remote sensing regulations, right now we're still seeing efforts to streamline that process, because people are concerned that the commercial industry is being hindered by national security concerns that may not be as necessary as they were even 10 or 15 years ago. So part of what the geospatial community can do is help think through what the risks are and how they should be balanced against the benefits. And these concerns cut both ways. This is the Strava incident: Strava posted information from people who were running around exercising and posting on the web, and a researcher went in and was able to identify some facilities, I believe in Africa, where there were secret military bases. It got the military all up in arms, with concerns about whether personnel should be able to use these types of fitness trackers and things like that. The Pentagon for a while banned cell phones, I think, for employees. There was this overreaction to the technology. Strava went back and changed the terms of use, and they started deleting certain types of information and not using it in certain ways. So there was this backlash about something that I would argue most people within the geospatial community would say, yeah, that makes sense. But once it got into the public domain, a lot of people got concerned about it from a national security standpoint. But there's a flip side. For instance, several years ago Facebook and Twitter cut off data access to a company called Geofeedia, which was using this information to highlight where people were tweeting about protests and violent acts, to try to figure out where that was taking place. All the things that we
talked about here, they were using in law enforcement or intelligence applications. There was an outcry; I think it was the ACLU that pointed out how it was being used. So Twitter and Facebook cut off access to it, kind of similar to what we're seeing with Cambridge Analytica, and within a month, month and a half, Geofeedia cut half its staff, because its business model was based on doing that. So there it was the national security use, how those agencies were using the data, that these companies wanted to stop. I think where the rubber meets the road on all these issues is in the licensing agreement, and that's because, as I said, data is not given to someone anymore, it's licensed. So the licensor is retaining ownership, and that organization is concerned about a lot of the issues that I've talked about, particularly big organizations: they've either spent money building it or they're worried about the risk associated with it. A license agreement in this community is often seen as a transfer of intellectual property rights: what can I do with it? But for a lawyer it's a legal agreement that allocates risk between the organizations, and those risks can vary. What happens on a breach? How long is the agreement for? Does the party have a responsibility to return the data after the agreement is done? I think that's important to note, because I frequently get asked to develop all these terms, but "I want it on one page, I want it simple," because my people won't... If it's not in the document, that doesn't mean those concerns go away. Someone's going to make that decision for you, whether it's the applicable law or a court. So making it shorter makes it easier to sign and move on to the next step, but it doesn't address all the risks that need to be considered. So that's just a
point. I think that is, from my standpoint, why license agreements are very important. And as I said earlier, it's not just a data license. We have a lot of different terms; data is licensed or shared in a lot of different ways. So with data sharing agreements, terms of use, terms of service: when you go to a website and there's data there, sometimes there'll be license language in there, "I give you the right to use this" or "you don't have the right to use that." As we look at data as a service increasingly as a business model, with people going to access data, there are licenses in there. Even software-as-a-service companies, traditionally in the software-as-a-service domain, are starting to license data to customers that may not be their data; they may license it from someone else, which raises a whole other set of issues. But that's increasingly how people are accessing their data, through the service. And even your cloud storage agreement: if you look at it, you're granting them rights in the data. Now, that right may be just to store it, to have a backup copy, to have someone access it to make sure that the service is being performed, but there is a license being granted in there. So part of the work that the center has done with the UN-GGIM is developing a compendium on the licensing of geospatial information. It is geared not to lawyers but particularly to developing parts of the world, so that if you're in an agency and you're trying to share data, you can at least understand what a geospatial information license does, what it's intended to do. It's not a new license, it's not a form license, but it gives someone the ability to look at it and say, okay, someone has given me a license, what does this mean? It's hopefully going to be able to articulate that. And then also, when they're being asked to develop licenses to open the data their organizations collect to other parties, that they will at
least understand what their license agreement is, again because there aren't many lawyers who are able to help them, particularly in the developing parts of the world. Where are we going? I think these issues are going to get more complicated, and I think that's because the technology is evolving. Big data is evolving; we've got the internet of things, autonomous vehicles, smart cities, all the wonderful things that have been discussed here. But we're going to require a greater degree of accuracy. Decision making is being pushed further into an organization, and in a lot of cases it's being done in a way where we don't even know what data is necessarily going in or how the algorithm is making the decision. So I think these issues are becoming more important, particularly because I think errors and damages, which is a legal term, will increase as well. The other dynamic is that we're getting new stakeholders involved, all these new communities. Some of the folks talked about trust, being able to share data with people that you trust. These people aren't from the traditional geography or geospatial communities. A lot of them are West Coast, hard-charging, and want to get things done, so they have a different mindset. You're going to need to understand that mindset, and they're going to need to understand yours, before you're going to feel comfortable sharing data, because they've got a different way of doing things. They have their own motivations and concerns, and frankly they're all going to be subject to their own regulations and legal frameworks as well. Here's an example of some of the new regulators that are developing in this area around geospatial information that could touch the work that people are doing: consumer protection around privacy, which we talked about, telecommunications, law enforcement, energy with smart grids, transportation, drones, autonomous vehicles. All these
agencies are developing regulations in response to the collection and use of geospatial information. It's all data that you want to have access to and want to be able to use, but it's going to make things more complicated, I think, in the fairly short term. Even though we are reactive, there's a lot of pressure, for instance, to get drones up in the sky, to get smart cities in place, and to get autonomous vehicles out, but there needs to be a regulatory and legal framework around that, and those frameworks are going to deal with some of the issues that you're dealing with. And that's it. That was a lot. Thank you. We could take a question or two here, if you need legal advice. Andrew? I have, well, it's simple to word but probably very hard to answer. So I think everyone knows about HIPAA, right, for health care records. Do you foresee something similar applying to location-based data, so that anything that captures location, no matter what it's actually regarding, would fall under something like a location privacy act that becomes a nationwide or global standard? So there have been several efforts in the U.S. and internationally. There was something, this isn't a treaty, what was it, a convention on geospatial information, that the International Bar Association tried to pass. Both of those would have done that very thing: basically they would have made location data HIPAA-like in your ability to use it and to share it. In neither instance was it really defined that well, and that's the challenge in all these things, because how close is it? Is it one foot, one meter, a zip code? I hope we don't get there. I worry that we may, because I think there's a reaction that takes place after certain things develop, and it's not within the geospatial community. A lot of privacy laws around location are not being based upon
what the geospatial community is doing, but more on social media companies and other companies. So I would say it's less than 50%, but I wouldn't rule it out, and I wouldn't be surprised if in certain parts of the world, Europe in particular, it gets pretty close. Yes? I think this is a great talk and these are super important things. What I was wondering about was, in terms of this sort of regulatory framework, there was Senate Bill 1253, which was the geospatial mapping act that came out in 2017, which would have restricted the provision of any mapped or geospatial data to the federal government to licensed engineers and surveyors only. That would put a lot of us out of business, basically, because it would probably then filter down to state-level provision, and it could be any kind of map. These are really genuine concerns. Do you think that this could end up in a regulatory framework which will really restrict our ability to utilize geospatial data, to do things like scraping websites and all of this? Could you see our open data and sharing world basically contracting a lot? It will contract some; I'm 99.9% sure of that. How much, I'm not sure. I think the problem is that the likelihood increases because of what people see. If I'm being asked to share my information because there's a dollar off at Starbucks, I'm not going to do that, and I'm not going to worry about it if that's taken away from me. But what if that same information can't be used to predict when the flood is going to come, or for environmental issues? The geospatial community has not done a very good job of articulating the role of geospatial information and is not participating in those forums where these laws are being discussed or these policies are being put into place. So because of that, yes, I do think there will be some restriction. I think privacy could be a really significant area. The intellectual property ones, I don't
know. On the scraping issue, so far the courts have decided in favor of allowing scraping, but all it takes is two justices on the Ninth Circuit to decide otherwise and then you've got a whole new problem on your hands. So part of what I've tried to do with the center, with some success, is to get the broader geospatial community to see that you're no longer operating in a legal and regulatory vacuum. It's coming, and it's coming more quickly now. I'd like to ask what you see as the role of geographers or the geospatial community in taking a more active role in this. We see the really big enterprises being the tail that wags the dog right now, and I wonder what the pathways might be by which academic geographers or geospatial practitioners can have more of a voice. I apologize, because I don't know all the different names, but I know that there are associations that geographers are part of, the AAG and others, and a lot of them have policy aspects. The bill that you mentioned caused a great deal of uproar, but it was a very narrow uproar. I've tried to raise similar concerns around some of the privacy issues and other things, and I think until now, anyway, the response has been: well, that's industry's responsibility, we can do what we want with the data, there's no reasonable expectation of privacy in public, this is open data. So I would use the existing organizations and structures at first, but I also think you should think of yourselves as part of this broader community. When you see things going on with Facebook, or when you see things that the government is being asked to do in terms of not being able to use data in a certain way, think about what impact it could have on you. I think this community tends to think in silos, and I think a lot of those barriers are being broken down, for the
reason I gave about the way we're all collecting and sharing data now in the process. Go ahead. Can you sketch, if there is a clear evolution, the path from the two things I thought I understood? One is that if you're a photographer you can take pictures of just about anything from public space. And then there was a case in the early 2000s where a fellow decided to document the California coastline from his helicopter in order to monitor erosion and to be on the lookout for construction that went well beyond what was permitted, and he got sued by Barbra Streisand, who didn't like pictures of her backyard being taken by anybody. I believe he was upheld, and I don't know whether anything that would constitute a balancing or an evaluation of the respective rights came out of that. Was that a start? One thing that came out of it is something known as the Streisand effect, which is that when you complain about something, people tend to go look at it; people went to see what her house looked like. So it actually created the reverse. There's a similar situation in Pennsylvania. When Google Earth came out, ironically a family named the Borings claimed that Google Earth was violating their privacy. It was Street View, and they drove up into their yard; they actually trespassed, took a picture, and posted it on the internet on Google Earth. The Borings sued, claiming it violated their privacy, and trespass, and all these other things. The court threw out everything but the trespass right away. The argument was that you diminished the value of our house by having a picture of it on the internet, and one of the reasons the court threw it out is that the local county had pictures of all the houses on its webpage, but the Borings hadn't really realized that or thought of that. So there was no diminishing of value, and Google had to pay a fine of a dollar for trespass. So I think, personally,
I think drones are going to change all this. I think the laws around drones are really causing lawmakers... and we formed an association for unmanned systems in Virginia, and we're fighting in the General Assembly on some of these issues. The challenge is that when you're on the ground, you don't see a satellite flying overhead, right? You see a manned aircraft that's collecting some pretty cool pictures, but you just assume that's a passenger aircraft. When you see a drone fly overhead, you don't know if that's your creepy neighbor, Amazon delivering a package, or, you know, the government spying on you, right? And so there tends to be this visceral reaction, particularly in certain parts of the country, and those people are calling up their state legislatures and they're saying, I don't want anyone taking a picture of me or my daughter. It's always a daughter at a swimming pool. So, I don't want anyone taking a picture of my daughter at a swimming pool, so you need to stop this. And that's slowly happening. In Florida right now there's a law that says that you can't use drone aircraft to take an image of someone on their personal property if they have taken steps to not be seen from the road, so if they put up a fence. Now, there are exceptions to that, and mapping happens to be one, and there are others as well. But that's on the books. They've created a reasonable expectation of privacy in being in your backyard, out in the common view. So it's being whittled away, but it is happening. In Virginia, and this was before we formed the association, law enforcement can't use a drone to go out and collect images for regulatory purposes without getting a warrant. Essentially they've created a reasonable expectation of privacy to not have a drone collect a picture of you. They can go get a satellite or a manned aircraft, but not a drone. So it's evolving, and drones are really,
in my mind, sort of changing that.
Well David you have... oh, we have one more question. See if you can get yourself to the microphone here, to this mark.
I was just wondering, talking about evolving legal frameworks, if you had a take on license plate readers, especially in relation to the recent Virginia Supreme Court ruling on that issue.
For me that's a really hard one. I tend to believe that you shouldn't have a reasonable expectation of privacy that your license plate information isn't being collected and used. I was actually at a talk where then-Governor McAuliffe said, you know, you don't own that license plate; that's the state licensing you, giving you a right to use that. That's not yours, so you shouldn't have a reasonable expectation of privacy in that. So I tend to favor that side, but I certainly understand the concerns. And frankly, for some of the reasons we discussed here, without a better way to deal with it and think about it, it just makes it harder, right? It's sort of black or white, and that's the challenge. We don't have anything in between right now because this is all so new. I think we'll get there, but I think we're in that phase where we're still sort of figuring it all out. Does that help?
Yeah, thank you.
Isn't there a danger of using the licensing argument? For example, most software is licensed, even stuff that you buy or download. If it does something you didn't want it to do, is the creator of the software protected by the fact that it's a licensed object?
It's a little bit of a different thing. License, unfortunately, has a couple of different contexts from the legal standpoint. The license in the license plate reader issue is simply the state giving you a right to use that license plate, right? And so what it's saying is you shouldn't have a reasonable expectation of privacy in that. The license agreement that you have with a software provider, I've drafted these, I'm sure there's also a
language in there that says that they're not responsible at all for anything that goes wrong, including if the computer blows up on your desk, right? Now, whether that's enforceable, whether you would have a claim or not, I mean, there are a lot of issues associated with it. But there's all that language in license agreements that disclaims all sorts of representations and warranties. And I'll point this out, since we've talked about Facebook and Cambridge Analytica quite a bit: I thought it was ironic that when testifying before the House of Commons, I guess, the people at Facebook admitted that they didn't read the Cambridge Analytica terms of use in terms of how they were going to use the data that they were collecting. Terms of use, terms of service, they seem so innocuous. I have people refer to them as boilerplate, and why do we need to look at that? But they're contracts, they're legal agreements, and that's what your right is in a lot of instances. As a consumer you get additional protections, you can't waive certain rights usually, but in a business environment that's what you're signing up for, and they tend not to be paid as much attention as I think they probably should, particularly in this context.
Okay, thank you. I'd like to thank Kevin Pomfret, and it's time for us to change gears. So now I'd like to invite our other speakers to come sit at the head table, and I'm going to ask Grant McKenzie to lead us off in a discussion. Let me just say a few words about Dr.
McKenzie, because we haven't introduced him yet. He's an assistant professor in the Department of Geographical Sciences at the University of Maryland, where he leads the place time analysis laboratory, and he's also affiliated with the Center for Geospatial Information Science. He works with big user-contributed data sets as well as authoritative data sets to pull out spatial, temporal, and thematic patterns. And he too has a geographically interesting background: he holds a PhD in geography from the University of California at Santa Barbara, a master of applied science degree from the University of Melbourne, and a bachelor's degree in geography from the University of British Columbia. So we're going to give him the podium for just a couple of minutes and then engage in a complete discussion here. Before we begin, I'd like to remind you that we are taping this session, so anything you say might take on a life of its own, and also that at this point the people who are listening in online, if they'd like to, can type questions in the chat box on your screen. We may not be able to answer all of those, but we'll at least receive them and do what we can. So, Grant.
Well, great, thank you. So I think it's fallen on me to try and summarize this and put it in some sort of context. I'll be very brief. I have about five minutes here just to go through some ideas and then turn it over to more of a discussion that we'll lead here. There we go. So just in summary, it's been quite interesting to me. One of the things we always talk about is geographical sciences, or geography, really being this umbrella discipline. I think we've all heard this before, and it became very clear to me today, in the number of topics and the range of topics that were brought up, that this is really true, right? We have this umbrella of geographical sciences where we all hold spatial science or regional science as our sort of common thread, and we vary the
topics that we talk about. I think what's fascinating about this is that, because we're in this breadth of disciplines or breadth of domains, we're increasingly expected to understand various aspects of these domains, a much broader range of the different components that we're looking at. This could be looking at technology and technology trends and how this then relates back to what we're probably more interested in, which is the specific theme or topic that is fascinating to us, and this could be things related to health or ethics or privacy or urban planning. So we need to have a better understanding of what all these different components are. I think that's something that's a little tricky for us to do, especially as we're now starting to realize that these sensors, social sensing, citizen science, whatever you want to call the latest term that we refer to this as, and bringing this together, is something that we can make sense of. I just wanted to focus on a couple of these. We've talked about a couple of different examples, but I think there are some issues and some opportunities that we're facing. One of the issues that I always talk about in the talks that I give is this example that came out of London, the Wi-Fi bin sniffing. You may have heard of this five or six years ago, where garbage bins basically had Wi-Fi sensors, and as you walked around the city the MAC address of your phone was being tracked, and they could determine where you were going and what you were doing. This became a major issue, and we've talked about sensors and their being able to track people and understand where they're going. But you can also, at some level, in the gray area of all this, see the relevance of having some kind of technology that does this, right? You could track certain people that were doing things that maybe weren't legal, for example. So there are issues that go along with it, and some of the bigger problems that we face with it as well. But then there's
also these bigger opportunities that we're faced with now. Geographical scientists, and I often say geographers, are bridge builders, right? We're not just the people that study the space itself, but we're the people that bring people together, space being our common domain that we focus on. So we have to be able to speak the language of the computational sciences, the humanities, the social sciences, looking at a range of environmental sciences and what have you. It's fascinating to me that there are very few other disciplines that are faced with this ability to speak these different languages, and we're starting to be able to do that. On some of the examples we talked about: I spent a lot of time in Christchurch, at the University of Canterbury. You may remember the series of earthquakes, or ongoing earthquakes, that are still happening there. Some of the opportunities we're facing when we think about sensors are that when you have devastation like this, you're also given an opportunity to roll out things like new sensors, right? Sensors into the environment, be that through cell phone coverage, through water main coverage, traffic cameras, all the things that go along with that. So with some of the issues and problems, the major catastrophes that we're facing, we're also presented with these opportunities to understand how a city operates in the smart city environment or the settings we see, and I think we as geographical scientists are well situated to discuss a lot of these problems as well. And then there's the issue, or opportunity, depending on how you want to look at it, of our attachment to mobile devices, right? Or obsession, or addiction, as that's been defined, to mobile technology. We're all carrying around, as we've heard before, cell phones that have an incredible number of sensors available to them. I do a lot of work looking at detecting place types, coffee shops, cafeterias, these sorts of things, purely based on noise sensors and altimeters, and trying to determine what that can be used for. One
of the things I always think is a really fascinating problem that we're looking at right now is that the cell phone really is the frontier of a new battle that's happening, right? A battle for our attention. A lot of companies are vying for our attention through the use of these technologies, and this has moved into the geo arena, right? Geofencing has had a large role to play in this. As I walk past the grocery store on the way home, it notifies me that I need to pick up milk, right? So we have benefits of that. But maybe as I walk past the vending machine, I don't need to know that Coca-Cola is being advertised based on where my location is. So location is increasingly being targeted to me with my wearable devices or my cell phone. We have this new sort of frontier that we're having to wrap our heads around technologically, and make sense of from a sensor perspective as well. I want to leave a couple of major questions here that I think are important in looking at this. What do we as bridge-building geographical scientists need to do to further facilitate this discussion? Do we need to look outside our fields? Do we need to look inside our field? Who do we need to speak with or bring together to further focus a lot of these questions and answers? One of the other questions I think is increasingly relevant to us as well is what existing theories can be tested with these new sources of data, but also what new theories can be designed, right? A lot of the technology, a lot of the social media data, a lot of the things we see out there are doing a great job of helping us answer some of the existing questions we had, sort of referencing the theories that we originally had. But where are we finding new theories? What's being produced from this new data we're finding, other than confirming what we've seen previously? What are the next steps? Where are the opportunities for researchers in this area, merging industry, government agencies, and researchers, to do this kind of
research? And then lastly, what are the limitations facing this broad umbrella discipline? What can we do to bring this together, to bring people together, and what are some of the hurdles that we need to overcome to get to the next step? So with that, thank you very much. I'll leave it there.
I'd like to open us up to discussion, which could be asking a question of a specific person at the head table, or it could be asking a general question, and we'll let them respond. I might kick it off by asking about the sensors themselves and asking, in general, what's coming next? We have cell phones, we have Fitbits, we have traffic cameras. If we look into the future, and it's hard to say because it's happening so fast, what might we be anticipating that's different in the next five or 10 or 15 years?
I think data derivatives, also of image recognition. We talked about driver's licenses, but I think video is going to be on everything, and then derivatives of that data. And you mentioned a couple in your talk, which is data from autonomous vehicles. That's going to be significant. The sensors in cars produce tons of data about the urban environment, and what we do with that data is going to be important.
I mean, it's partly there, as I pointed out, but we're going to see a lot more biophysical monitoring going on, and that's being used for things like aging in place. I was just on a PhD committee where the engineers are trying to work on algorithms from the cell phone so you can figure out if somebody's gait has gone off, if they're about to have a stroke. I was talking to Nancy at lunch: there's facial recognition that's being done on highways, because they're trying to figure out what happens right before someone has a traffic accident, whether they're texting or they're distracted. There's a lot going on. And mood recognition, both facial and voice, is developing very rapidly, so that's being used to predict potential onset of depression or suicidal behavior. And then
merging all this together. We didn't really focus too much on that today, but there's the remote sensing field and the array of satellites that are going up now, and how that is being used with traditional ground data and with some of the new sensing technologies. That's going to really come on stream. What we're seeing is sub-meter resolution with much higher temporal resolution, and then the products that are being derived out of some of the specialized satellites. They're going to give us the capacity to predict, for fairly small areas, things like air pollution and greenness that are important environmental exposures, but at much higher resolution than what we've seen in the past, with much more detail. Looking toward the cell phones, I showed that there was quite an array of different sensors. They're rapidly moving to the point where they're all going to have the capacity to embed all those sensors, so there's going to be a lot less heterogeneity between the phones, in all likelihood, and the quality is going to go up, and that will drive development of apps that use many more of those sensors in combination. And then context-specific recognition is important, so that the sensors are able to understand the context they're in. That's going to lead, I think, to a merging of this with potential virtual reality glasses and other devices. So you might be walking by and something on your sunglasses is going to say, hey, there's a sale at this store, maybe you're interested. Or maybe you've just done a Google search on your phone looking for a restaurant, and you're going to pass by one and it's going to flash the menu. So that kind of connectedness is definitely evolving very quickly.
So this is Marilyn Brown speaking. We haven't talked very much about the data that can be obtained from equipment that's now installed in many of our homes: the information from the thermostats and our computers, IP-identified devices that will tell people what we're using and when, whether we
have expensive medical devices in the house, whether we're home or not. It sort of reminds me of when I used to live in a suburb and go out as a kid, and my mom would always say, before you go out, turn that light on, and then no one will know that you're not there, right? So we're going to have to figure out how to override all of that, you know, deceive the receptors or something, to figure out how to not clearly convey what it is we're doing in every moment when we're home or in our business. All of that data too, we've not really talked much about that.
I might bring up two more points. Actually, I think these are more sort of data practices than maybe the actual technology. One, I think the technology of blockchain is actually really interesting. There's an interesting use case for this sort of shared data, going beyond the whole Bitcoin, cryptocurrency side of blockchain. As long as we're not using a proof-of-work kind of concept, but a more shared, less computationally intensive concept, it's actually a pretty good structure for sharing this data among a lot of distributed users and distributed applications. This is more speculative, I'm not sure if this is going to come to pass, but I just want to put that out as a possibility. And then also, building on your point, I think there's this other data practice of resistance. This is not a set of technologies that's going to come and be imposed on us, though it might feel like that. When technology comes out, it's a social process, and part of that social process is people making whatever efforts to resist the technology, hide from the technology, confuse the technology in various ways. I think that's going to be a really interesting step moving forward as well, and it actually has big implications for using it as a research tool as well.
Can I just quickly add to that? Sorry. I think, sort of continuing on from this, a direction we're seeing, not just in geography or
geographical sciences, is the fake side of it, right? Location spoofing is obviously a big part of this as well. You talked about pleaserobme.com, or whatever the website was, where you could determine when people were home or not. But we've also seen this with the amount of data we're contributing in terms of voice data, video data, and location information: the ability to then turn around and fake somebody, right? To be able to mimic their voice. We've seen the videos of Barack Obama saying things that he never actually said. Where that is going from a location perspective, I think, is going to be quite interesting too: having it shown where people are when we absolutely know that they're not at that location. Are they able to spoof where someone's been, where they've gone? All their devices say that they were in the house, but we know they weren't in the house. So where is that acknowledgement of where someone's going, where they've been, and what we're actually able to prove with the data? I think that's going to be an extra question as well.
One point about that, and sort of the flip side of that, is there's a concept in the GDPR in Europe, the right to be forgotten, where you can ask that information about you be removed. In a location context that gets really interesting, right? If you have the ability to have your cell phone provider or someone say you weren't there even though you were, I mean, you're changing the truth, right? But that's out there. So you could see a situation, for instance, if there was a bank robbery or something, where the only person who was smart enough to call and have their location changed was the person who robbed the bank, right? So there are these concepts that are developing, not around location, that become really interesting when you try to think about them from a location standpoint.
Thanks. We're wide open now for all kinds of questions.
So this is something that I've been toying
with for quite some time, starting when I read about the Senseable City Lab. Dr. Williams might know about it, obviously. They did the partnership with Yelp, right? They scraped the ratings on Yelp in Boston and developed an algorithm, and they passed it on to the Department of Health Sanitation in Boston, and they were able to do predictive check-ins on health code violations of restaurants, right? Good use case. The Yelp users technically self-disclosed their preference in this situation, right? They said, you know, one star, bad deal, and that gave a sentiment analysis on the text of the review. Something that's been bugging me, and I did my graduate research in social media, and it's probably why I don't use social media applications anymore, is this concept of utilizing the active and passive digital footprint, with the disclosure, whether you are self-disclosing or you are just being disclosed at the same time, and what the ethical consequences of such things are. And whether you can make the ask of the user: instead of shifting the onus of the responsibility for data control onto the licensing, maybe between, like, Facebook and Cambridge Analytica, you can ask the user how they want their data to be disclosed and how they want it to be utilized. So it's a topic I've been mulling around in my head, right? I used to use Facebook, but I never gave Facebook permission to use Facebook tracking pixels and watch me peruse the internet and see what my shopping patterns were. So it's kind of muddled and weird. But to Kevin's point, I think people have locked it inside the bucket of saying it's hard, but maybe we can take it out of the bucket and look at it and examine potentially why it's hard and what to do with that, geospatially if you'd like. But that's all.
So I think it is very hard to give users that under the current way privacy policies and terms of use are developed. It's hard to both describe all the different ways that you might use the information and put it in a phrase that
is in terms that a reader can understand who doesn't have the background in technology that the folks here do, and make it short, because that's what privacy folks complain about. I mean, people complain that the Apple privacy terms are, you know, 42 pages or whatever it is. So there's this real tension in terms of how you go about doing that and give users the control and the right to do that. Now, the GDPR is trying to do that. It'll be interesting to see how that develops, because I think that's probably a big step forward in that area. But it is hard. And the question was asked, what can this community do? I think part of what this community can do is maybe break the privacy concerns around geolocation into several buckets and then prioritize those buckets: which ones are the real privacy concerns, and which ones are just ones that make you feel uncomfortable. Because some of them we're just going to have to get used to. I mean, the world has changed. But some of them are real concerns, and I think the geospatial community is particularly well suited, maybe interacting with others, to help figure out what those real concerns are and then ways maybe to mitigate them but still allow the data to be used in important ways.
So I have a question. I think, Grant, you kind of voiced it a little at the end. We've talked about big data and social data and streaming data and sensor-embedded data, and a lot of the techniques you all talked about were probably traditional analysis techniques, right, in some way. I'm curious, what do you see as the open research questions for new methodologies and new theories that might need to be developed, given potentially new sources or types of data? Or also, as I asked earlier, you're actually asking the sensor questions and it's responding to you. You couldn't do that for trees or land slope, but you can with a person. So I'm curious what your thoughts are on new research opportunities
for people, now that these data are available or could be obtained?
That's a tough question. From my perspective, one of the things that I find most fascinating, and we've been having this discussion for years, well before I started my graduate degree, is place- and space-based research, right? What is place and what is space, what is the multi-dimensionality of places, and how do we represent this abstract concept of place in the kind of research that we do? I think we're starting to get to that point. Historically, place has been, you know, space instilled with meaning by the people that go there or visit the place. But we're starting to be able to say, well, we can actually capture a lot of that meaning, right? The contextual information surrounding the place, the time of day you go to a place, what you eat, the photos you post, what the temperature is in that place, starts to encompass this idea of what a place actually is. And so we can start to ask these broader questions in terms of what are the similarities between the different places we see, right? I don't know exactly what the theoretical questions we're asking there are, but we're starting to be able to approach these concepts, which have historically been asked as more theoretical questions, from a more empirical standpoint. I think that's going to be a big push continuing forward as we move to sort of place-based GIS systems, if you want to call them that.
I've been thinking about this since you asked that after my talk. The thing that it seems like we might be able to do now, and I'm not quite sure about the logistics of it, is engage in something sort of akin to a clinical trial, sort of do A/B testing. In thinking about how people are perceiving a particular location, ask people in the same sort of area a different set of questions, or impose some sort of different kind of
stimuli when they're based in a location, and run sort of a real-world kind of experiment. Now, that sounds really odd to say, because that's not how we've done this sort of stuff in the past. I think there are lots of issues and lots of things, but in terms of what might be a novel sort of methodology that might answer new kinds of questions and build new theories, I think that would be one way forward.
Yeah, I kind of took a very technical perspective on your question. I started out in my career doing remote sensing when we had 30-meter pixels, and we spent a lot of time trying to identify the noise in the pixels and figure out this kind of mixed pixel: what does it mean? Is it an urban place or is it a deciduous forest? Right now, we know what every pixel is, and that's overwhelming, as we heard earlier from your talk, right? And so I think a lot of the research, on this very technological level, is about how to re-aggregate the data: how to group it into objects, how to create structures from which we can derive new information. Because when we know every pixel, it doesn't make an object anymore, right? It's part of an object. So how do we build it back into different kinds of objects, whether it's the GPS data or it's image recognition data? That's why I think a lot of the research that's going to be coming out is really about this re-aggregation of data, which is why aggregation of data as a study is also very important for the kinds of things that I was talking about with data standards, right? What are the standards for certain kinds of objects? This is already happening with the autonomous vehicles, right? They're sensing an object that looks like a human, and they're sensing an object that looks like a bike, right? And then these are going to have data standards that go with them. So I think this is a huge area of research
that, you know... One of the reasons I think about it is with your data set. With the stuff that we did in Digital Matatus, when we made the app, from the outset we made it so that it snapped into objects, so that as they were collecting it I didn't have to deal with this huge mess of points; rather, I was dealing with objects from the outset. So that would be one.
Yeah, so just to round it out, I think it's being able to sample out of the data in a way that is going to reveal the meaningful patterns that we're interested in, whether it's human activity, behavior, exposures, or moods, without having to deal with all of the data all of the time. I think that's a really big question. People have, especially in time activity, very regular patterns for the most part. They may deviate when they go on holidays, but most people are getting up and going to work. Do we need to sample every single day, every 10 seconds, to know that? Probably not, right? So how do we start to extract that useful information? And then I think an ancillary question is how you determine the quality and what's acceptable, and whether you can get to that point through sort of digesting it with a Bayesian algorithm, so that you're continuously probing the data to see what's credible and what's not. I don't think we're there yet, but I think that's ultimately the goal. And then there's taking into account the multiple sensor data sources that are available. I saw a really interesting talk about trying to determine nighttime and daytime populations all over the globe by somebody at Oak Ridge National Laboratory, and they're using everything from menus off Yelp to transportation data. So there's this massive amount of assimilation going on from numerous different sensors and information sources so that they can come up with some reasonable estimate. And that's an important thing to be able to
estimate, because if there's a disaster or something like that, you want to know how many people are likely to be on site and where they are at certain hours of the day. It also has huge implications for transportation planning, for environmental exposures, you name it. So being able to take that multiple array of huge sensor data, and other data that's coming out of the web, and bring it all together in a meaningful way is another big step I think we have to take when we start looking at the sensor data.
And going back to what we can get in terms of theory and new understandings of human behavior: some of what I think we might be able to address is what the behavioral economists have been moving toward, which is to understand why it is that actors don't behave rationally. A lot of the explanation for the shortfall in rational behavior is that they don't have information. In fact, there's a new term that's been created called rational inattention: it's rational not to bother to try to find out the exact costs and benefits because it's so expensive to do so. So people buy what's convenient, and they copy their neighbors, and they make suboptimal choices. This has been really important in my field, which has been about how people make choices for energy-consuming equipment, because people keep buying the same old crap even though a lot of improvements have been made, and they're not aware of them. And so we could now imagine a market in which you could make a major shift toward greater market efficiency and see how much of the gap in rationality can be explained by providing information, because it's now so cheap to do so. One of the areas that we're looking at in Atlanta is the very significant energy burden that's being paid by low-income households, whose energy bills can be 10 or 15 percent of their household income. They may get, you know, monthly balanced billing, where they get the same bill every month, and they have no idea what
it is or how they've used it. Or now they can get billing like you do with your cell phone, where two weeks in it says you've got, you know, 10 days left, or you'll be over your last year's use in the month of May. So now there's real-time feedback. I think there's a good potential research agenda to determine to what extent people will really bother with this new information, and how much attention they will pay. Someone, or maybe a couple of you, mentioned the attention economy. I had actually not heard that term before. Who talked about that? Yeah. I think that's a really fascinating area to explore, the attention economy: how much will you be willing to pay for more information? How much of your attention will you direct to acquiring information, even though it may be very easy to get, when you still have to pay attention because it may be a bit complicated? I forgot to throw that out. That was not a very good question. What did you have in mind, Matthew, when you mentioned the attention economy?
So nice not to be put on the spot. No, I think it's actually a great question, because in some ways it builds on both your points. On this aggregation: we have so much data, and thinking about how we aggregate it up, either categorically or spatially, is a really good question. But there's also this thing from the individual user's perspective, the consumer's perspective you're talking about: will the new information actually be put to use, or does it have to be aggregated in some way that's easier to actually consume? It's almost like the terms of service: you can get all the detail, but if you're not trained to use it, you're just going to ignore it. And maybe, just tying this back to some of the other things we've been talking about, there's the kind of attention that, you
know, we have as we move spatially: what kinds of things are we paying attention to? Not that we really should be trying to optimize spatial advertising or anything like that, but what are the kinds of things that we notice as we move through a particular urban space or natural space? Maybe we can tie in some of these kinds of questions as well. But I think we'll just end there.
There was a review of a book that I read which talks about how much information is being thrown at us, to the extent that our creativity is being compromised, because there's a link between creativity and having time when you have nothing to do but think.
So I'm Lee Schwartz from the Department of State, and thanks very much for all the presentations. I want to direct this question towards understanding communities a little bit. When we tried to have a sort of decade-long effort to mainstream human geography in the defense and intelligence community, a lot of that was because of the failures to understand communities in places like Afghanistan and Iraq. We're talking a lot about understanding space and understanding location, and I heard sentiment analysis mentioned a couple of times. In the world I operate in, we're in data-poor environments, and I'm wondering, in your fields of research, if you could talk a little bit about using sensors to understand more how people feel tied to a place and their community values, whether it's proxies for religion or the identification of certain structures. So getting behind just the way people are located to understanding how communities are organized, which is really a major part of human geography, and, from the academic perspective, talking about behavioral geography and the like. So maybe a few thoughts on that, moving away from space and more towards communities as a way to think about human geography. Thanks. And not anthropology.
As the urban planner,
I'll attempt to take this question. I think in some ways the work that you saw on the ghost cities was attempting to understand what makes a vibrant community: that we need places to eat; healthcare was one of the amenities that we looked at; banks; schools. Having a certain access to these different levels of basic services is what makes a thriving space. One of the things that we thought about during that project was, well, what's important to somebody in China versus what's important to us? That was part of asking questions of the urban planners. So one of our amenities is KTV, which is basically a karaoke bar, and malls were another one of the amenities. Malls are very important in Chinese culture. Well, they're also important to our culture. But schools were also a big issue for them. They'll drive across the city to make sure that their kids are going to proper schools, and really one of the number one reasons that some of these areas were ghost cities was that they didn't have access to proper schools. But what I think is important about what you say is: how do you do that in a data-poor environment? When you don't have information, how can you start to set up these different parameters? This is something I think a lot about in the work that I've been doing in Africa, and I would actually argue that the data is there, as I said, but it's about accessing it in ways that let you understand, let's say, a different set of amenities. They're data poor because we don't have the data, and that is because governments are not doing the surveys that they do in other areas, or because we don't have Yelp. I use Yelp in this case; in China it was the Chinese version of Yelp. But there are other kinds of data sets that do exist. But then I
would also further argue that missing data tells us a lot. Where data isn't is very informative about the types of access to services that a certain kind of community has and the way that they operate. If they don't have access to the internet, then they're communicating in much different ways than they would if they did have access to the internet, or they're using other kinds of social networks. So knowing that there isn't access to those things is important to understanding community structure as well.
Yeah, I guess being able to understand the spatial dimensions of where people are at different times is incredibly important, because that's going to tell you a lot about their activities, what they're doing individually or as groups. And most of the places in the world do have a lot of cell phone penetration. They may not be smartphones, but you're going to get a lot of information from that. I think that the capacity to actually survey people while they're in the field, while they're in different environments, through these ecological momentary assessments, is another powerful tool for trying to understand where they feel connected and what their mood and affect are while they're in different environments. That's going to tell you a lot about the community structure. And then the cell phones all have the capacity to determine proximity to each other, so we've used that in studies of parents and children, trying to figure out what their joint activity levels are when they go to parks or something like that. So you can start to see how these social networks are forming in terms of regular contacts, through the proximity of people on their cell phones. Those are just some initial thoughts. And then combining that with very high-resolution remote sensing imagery is going to give you a lot of information on these points of aggregation, points where they are coming together, where they're frequenting regularly.
Being able to classify that with such high precision now is, I think, another major advance. So it's another case where we're going to merge the sensor data with some other observed data.
Yeah, I'll also add to that. I think there are some really interesting opportunities to look at the variance we start to see within our own communities and extrapolate that to variance outside of our community. We did some work early on looking at what we call semantic signatures, or temporal patterns, for different place types like coffee shops, bars, and police stations: the times of day, the habitual behavior people have towards these different place types, through hours of the day, days of the week, and seasons. Then you start to study those from Los Angeles to Chicago, or Chicago to London or to Sydney, or even some other parts of the world, where you ask what regional invariance you find in these temporal patterns. Chicago and Los Angeles and New York tend to be similar in a lot of different ways, but there are some things, like theme parks, where they tend to be very different in their temporal patterns of how people actually visit. What we found, looking at about 700 different categories of places (gastropubs, parks, you name it), with millions of check-in behaviors, or what you see on Google in the popular times, is that about half of them are regionally invariant. People go to drug stores in North America at the same time; there's no variance you find between Los Angeles and New York. But the times of day you go for dinner vary considerably between Los Angeles and New York. So it's a very small difference you see within the United States, for example, but as you start to have better contextual information about people in different cultures and different cities, you can extrapolate that to different patterns in
different parts of the world as well. And that's just looking at time as one component, but there are all these additional components: emotional components, spatial components obviously, and the thematic components you get from the natural language that's used to actually build these signature models, which can be used as proxies in different places, I think, as well.
Thanks. Great presentations, and this is not my area of expertise, so it's been particularly interesting. A lot of my work involves cities, and a lot of it is climate change related, but one of the things that I was struck by, more broadly defined, is this issue of the process of urbanization. If you look into the future, we know that most of the population growth over the next several decades is going to be in small and medium-sized cities in low- and middle-income countries. I was particularly struck by your work, Sarah, and others', about this issue of bringing the informal into the formal, or making things known. Obviously the whole issue of data and the capacity to organize it is very transformative in that sense. So I have a quick comment and then maybe a larger one. I think you highlighted this a little bit, folks at the board, but can you collectively speculate as to what is going to be the implication of transferring the informal to the formal, or of that boundary? Even Yelp: we have departments of health giving A's and B's and C's on health inspections, and then Yelp is sort of a parallel. So I guess that's a broad question: the connection between the formal and the informal, and informality as a process in urbanization. But then, sort of squeezing out and going
back a little bit to geography: geographers have spent different chunks of the past century trying to understand what urbanization is and what cities are, as have urban planners. And I guess, to push you guys a little bit, can you speculate: as the full flowering of this approach, and the data and the capacity to manage it, emerges, how do you think our understanding of cities might change? What new questions might come up? What might the geographer's analytical understanding of cities be 20 years from now, or 10 years from now? What do you see that's emerging? So there are two parts: the formal and informal, and then the understanding of the city.
I'll try to take on the formal and informal. Obviously, whenever you visualize an informal system like we did in Nairobi, you wonder whether you could have adverse effects. Could the police come and shut down the system now that it's visualized? Or does it change how people have basic access to these kinds of systems, which the government is not providing? These are market-based systems. The government is not providing public transport, and these are providing an essential service every day, so what is the effect of that? It's something we definitely thought about, and that's part of the reason that we included the government in the process from the beginning. I definitely think there are certain areas where visualizing this type of informality could be problematic. In the case of transport, I think there are a number of benefits. One is that the Matatu system does run as one organized system. They have a head of the system, his name is Kima Tai. They have an organization, and they plan routes together as a private system that is formalized. By actually visualizing it in this way, we are creating a way for the government to have a conversation with them. Previous to that point, the conversation about the Matatu with
the government was: it's too many people to talk to. There are hundreds of operators and hundreds of drivers and hundreds of representatives, and we can't manage this system, we can't regulate this system, we can't work with them to do safety things. So one thing that's happened from this is that it's been seen as one system, which very much needs safety procedures. They're very dangerous. Problems with this system are like the number one cause of death in Nairobi. So that's one thing: the ability to have that conversation with the Matatus about how they provide safety and other things has now been established. And not just with the government; I would say more with the NGO community, who did not see it as one system that they could fund or regulate or think about or have safety procedures for. It's probably more the NGOs that we're talking to in that case. The other thing that I would say, in the case of many developing cities that have this kind of informal transit, is that the big push right now is for BRT to come in and formalize the system. Visualizing that there is an existing system that needs to be worked with is really important, because you can't replace it. A BRT will be helpful in certain areas, but you still need the feeder system, and they need to work together. In every city that we've worked in, there's a discussion about having this more formal system and then getting rid of the other one, and what we're trying to do by visualizing the system is have a conversation that says this system exists and they have to work together.
Well, I think one of the big challenges when you think about understanding the city is that oftentimes, when we observe a relationship between place and, say, health or other attributes like physical activity, you always have to be concerned about what the economists would call Tiebout sorting, or self-selection bias. This was revealed
even, you know, 35 years ago by Michael Dear, when he did all that groundbreaking research on schizophrenia and the city. What was happening there wasn't that the city was making people schizophrenic; it was that that's where all the mental health facilities were, so the people who needed the treatment were sorting themselves so they were close to the facilities. The same could be said today if you look at salutogenic exposures like green space: people who have the income can afford to buy into those places, and they may already have a healthy environment. So there's a big movement afoot in public health and urban planning to look at so-called quasi-experimental designs, natural experiments. The big problem with those experiments is that we don't have a lot of baseline or ongoing information about people's activity levels before and after you install, whether it's a bus rapid transit line or a bikeway or a park, any type of new facility that could be health promoting. And you could say the same about detrimental health conditions, where poor people get sorted into neighborhoods with low rents because there's too much pollution. So I think when we look at the embeddedness of the sensors, and what that's going to do for really increasing our ability to understand human movement and the environmental conditions that are there before and after, we're probably going to have a much better grasp of how to conduct these quasi-experimental evaluations of the natural experiments that are happening all the time. That's going to increase our confidence and our knowledge base on what really works in terms of urban planning for achieving a healthier, more active, and vibrant population and economy. So I think that's going to be a huge benefit. And then our ability to take action on those results is going to increase immensely too, because we're going to see that the micro scale of the data that we have is so much more refined
that when we do want to intervene, and we do want to take measures to either protect pedestrian safety or reduce air pollution exposures or reduce heat increases due to climate change, we're going to have that information at a scale where we're really going to be able to pinpoint and target the areas where it's going to be most effective when we do intervene.
My experience with the informal to the formal is that there are often stakeholders who benefit from the informal, and so when they figure out what's going on, sometimes they push back on it, because they're not sure they're going to continue to have those benefits. So that's one aspect of it. The other is, and I think Sarah mentioned it, that there are benefits once it becomes formal: it's easier for government agencies to regulate or impose taxes or develop legislation, whatever's needed. In a lot of cases that's good, but it also tends to increase costs and does away with some of the reasons that it wasn't formal and the benefits associated with that. So there is a good and a bad, and maybe the good outweighs the bad when you think about doing that. I've seen that in lots of different industries, with different technologies that are out collecting data and trying to make it more formal.
Actually, to follow up on this as well, in terms of understanding cities: we talked a little bit about this over the break. I think this actually gets a bit to the way of identifying communities. More and more, people have talked about relational networks, and sociologists have been talking about the network of community, rather than geographers, who tend to think of it as a more continuous space. Now we have both. We have the data to see this and really understand a city as a relation rather than a space. In there, the spaces, the nodes, are really
important, but there are also the connections, the relations between these nodes. I think this is some of the stuff we're trying to do. We're trying to study gentrification through a social network analysis, not between people but between locations in the city. If people who only went from here to here start going to another location, that's an indication that there's some new thing there. Maybe it's a new shopping mall; maybe it's some sort of gentrification process going on. But you can use that to understand how a city might be changing. Again, we're still trying to figure out how this works, but since it's more real-time data than we see with the census, we can detect those kinds of changes. And again, for community detection, some of this can be done with fairly simple data, in terms of call data records. I'm thinking of some of the work that the people I work with in Estonia do, looking at differences between Russian speakers and Estonian speakers. In that context they're very different communities. Sometimes they're living right in the same spaces, but when we look at where they're going in the evenings or on the weekends, you can see that they're very different communities. They're physically right next to each other, but when we start looking at their leisure time, they're moving in very different ways.
Sorry about that. This is very, very enlightening. I want to take a trip to the dark side now, though. Sarah, one of your slides said big data cannot be an agent of change unless it's applied for the public good. Well, that's only one side of the coin. It could also be an agent of change if it's applied to the detriment of the public, which it could be, right? I'm thinking about London, and I'm thinking about frogs right now. That may seem strange to you, but they say if you put a frog in the
water and slowly turn up the heat, it will allow you to boil it. The United Kingdom now has about six million CCTVs for a population of about 65 million. That's about one closed-circuit television monitor for every 10 people. Originally these were put in, of course, to deter crime. I think if you went back to the UK in the early 2000s or the late 1990s and said, we're going to put in one TV monitor for every 10 of you and we're going to monitor that, there might have been an uproar. But that's become accepted now. With Facebook, yeah, there were 68% of people who didn't trust them, but look at the numbers of people who actually dropped Facebook after the Cambridge Analytica thing: it wasn't as though they lost 68% of their user base. In other words, we could over time put into place these technologies, these sensors and things like that, and they wouldn't necessarily, maybe in every country, maybe even in this country, be used for the public good. So what I'm going to ask all you guys is: what is the dark side? What is the darkest thing you could see happening in your particular area with the widespread use of this type of sensing, and what are the things that we would need to guard against it? I can think, in each of these areas, of things that could go really off the rails, but I throw it out to the committee here.
Yeah, I think when we see what's happening in China, and I'm not sure what it's called, but it's basically like a social capital index. Sesame, yeah. So basically they're installing even more video cameras. They have a lot more people than England, but the rate of installation is astounding. I'm only going on media reports, but you're talking about thousands of new cameras appearing every week, and they're actually trying to come up with an index that's going to track you when you jaywalk or have socially unacceptable behavior in public. That, I think, is a very scary prospect, because having those types of everyday behaviors tracked in a
way that's going to allow the state to penalize you, whether you're applying for college or for loans, or maybe even have you considered for mental health evaluations, that's such a gross incursion on our rights as individuals that I think that's a very dark prospect. It's Orwellian in a sense. I think in England they have been able, in some instances, to show that it's helped them during terrorist attacks, and it's helped them to track when they put in new bicycle lanes, so there are beneficial uses to that. But I think this probably goes back to a lot of the legal protections that are really not in place right now. So that's a big factor. Another factor in my field that's a particular concern is being able to track people's mood and their mental health and even their physiology. There are a lot of private things that make people very differentially susceptible to those factors, that they may not want revealed, and that they might not feel comfortable with other people knowing. That information, again, can be used against them in so many ways, for employment, for insurance, for medical care access, even for mortgages. That is a particularly dark specter: that some of this health information, particularly the kind that could be gleaned from the sensors that are coming in the next few years in telemedicine, could be abused. We have in place HIPAA and ethical review boards when we go to use these data at hospitals and universities, but I don't think those protections are in place for data that's going to be out there in the public. So that again raises a real specter of abuse.
I'll follow up a little bit on this, and this is also based on news reports of what's going on with some of the social networking; I think it's referred to almost as a credit score within China. And the thing, at least
the way that it's proposed (I don't know the extent to which this has actually been operating in China, but the idea is out there, so if it doesn't happen in China it might happen someplace else), is that essentially your score, which might allow you access to a visa to go travel someplace, or higher-speed internet, or other amenities that you might want, is based not only on what you're doing, where you might be going, and what you might be saying, but also on what people in your social network are doing. Again, I don't have firsthand experience or direct knowledge of this, but that idea, where you're leveraging someone's social network to essentially keep people in check, and you tell your friend, your crazy friend, don't post that conspiracy theory, because that's going to reflect badly on me: that's a really insidious power, because it's no longer the state doing something directly to an individual. It's basically weaponizing one's social network to keep someone in line. The extent to which it actually ever comes to be, I don't know, but when I think dark side, I think of stuff like that.
So I have a couple of points. One: when I was getting up to speak, I was thinking that I always feel bad, as the lawyer, that I have to do the gloom and dark and all the bad things that could happen, so it's kind of nice to have the question. Another thought I have is that I've heard and read about all the things being discussed in China, and I don't have any firsthand knowledge either of how far it's going to go and what limitations they'll put in place from a technical or legal or regulatory or operational standpoint. I do know that one of the reasons, I think, that the CCTV cameras have become so accepted in the UK is that they do have a pretty vibrant data protection law around the data that's collected, and people feel fairly comfortable that
it's being used in a certain way and in no other way. Now, I'm sure there are leaks and things along those lines where it happens, but in general there's this sense that data is being used for the reason that it's being collected, and they see the benefits, which they don't see, maybe, with some of the social media and other companies. On the point about the social networks and how China may use them: again, I don't have any firsthand knowledge, but I was a Soviet analyst during the Cold War, and I know social networks were a way that people were punished and didn't get jobs, because of who they associated with. It wasn't because of who they connected with on Facebook, but because of where they went to church or who they hung out with. So I would argue that that's been around for a long time, and this is just a way to deal with that. I'm not saying it's right; I'm just saying that's where we are. My greatest concern is that geospatial information is not used to benefit people during an emergency or famine or in some other way because the laws and policies that we put into place make it too restrictive to use. That, to me, is frankly the greatest threat, at least in the United States, because there are so many benefits, and if we don't think about it the right way, we're going to go too far off course and we're not going to be able to do some of the things that this technology allows.
I just want to add: I think data has always had this possibility to be used for good and evil. I have this talk called Data for Good or Evil; I could have given that talk. Even the Million Dollar Blocks data that I put up at the beginning of my talk, you could turn it around and say, oh, those are all where the criminals are, we're not going to give any mortgages there. So I also think one thing that's important for us to do is think about how we frame analytics. As an
academic, of course, I think about this a lot: how do you frame what you're talking about in order for people to hopefully use it in more positive ways? But obviously, you asked what are all the negative ways, and I think almost every single data set could be used for negative purposes. We could probably put each one on the wall and come up with a possible malfeasance. So I guess it's exactly what you said: it's how you put protections around those data sets and the way that they're used that's important. I was thinking as you all were talking, I don't know if you know the TV show Black Mirror. If you really want an investigation of all the bad ways that technology could be used in society, that's a really good one. But actually I think some of them have come true. There's an episode in Black Mirror where you get access to certain kinds of services based on how likable you are on your social media network, and those things are happening in China. They'll look at your social network and see whether people like you or not, and provide you access to certain kinds of car services. So you're talking about what your social network is in terms of coming after you for police investigations or those kinds of things, but it can apply to lots of different kinds of services that we access, and can create a whole new marginalized society because you're not liked on your social media.
Yeah, I was actually going to mention Black Mirror as well. They do a much better job of being imaginative with these kinds of doomsday scenarios, I think. But I also wanted to bring up what I think is quite fascinating from a social perspective as well, which is that it's virtually impossible for a 14- or 15-year-old kid not to have a social media account. So we can regulate it as much as we want, and there can be a hundred more Cambridge Analytica issues; that
14- or 15-year-old kid is not going to get off Snapchat or Facebook or whatever the case may be. And increasingly these are all location-specific technologies. We actually have Snapchat maps and all those things associated with that. What's interesting to think about is that we could determine that there were Russians in Ukraine or Crimea based on the social media feeds. They were stopped from sharing where they were, but you could not stop a 20-year-old guy from sharing his Instagram post. So there was that location issue. But if you think about it, three to four, maybe five presidents from now will have had a social media account from birth, be it a dark profile or whatever Mark is doing. Whether they have an account or not, there will have been a presence for everything that they've done online. So think of the regulatory implications of that. Who has access to the data? It's not government that's storing that data. It's private companies, private industries that we're having a lot of issues with right now, and they have access to data that could potentially be used in one of two ways. So that's the kind of stuff that scares me.
Okay, can I actually just add one final one? One other fear I have is just opportunity cost. The money and the effort and the energy that's spent on working with these sorts of data, especially in times of austerity, especially for government, is money that's not spent on other things that have been the traditional way. We can have a conversation about which is better, but often I worry that this new, shiny data sensor kind of model is pushing out a lot that's actually been proven good, and the money's just not there for things that have been working fine.
So I want to add one last evil thing, I mean, and really I'm not talking about this: I think homophily is also a problem. This has been in the media a lot, but the
fact that we will only be provided with one perspective, based on our social networks and the kinds of news we read and the other people. We get one perspective, and I think that is really dangerous, and we're already seeing the danger of that.

And last, we have no questions back here? Oh, yes.

My name is Bruce Crawford. I'm interested in what happens when this sort of Internet of Things data gets integrated into official statistics, whether it's the United Nations doing it or somebody else like that. I'm under the impression that some of these Internet of Things devices are not exactly precision instruments. I guess the same thing applies when you go to citizen science ideas: you've got these people with their homemade sensors, their Raspberry Pi projects or whatever, and then you have to take a second step and validate or adjust that information to comply with what an appropriate scientific sensor would read before you put it out as an official statistic or something like that. So I guess I'm wondering, is that step even necessary? Should it be something that's necessary? What kind of biases might be out there, in addition to the self-selection bias of people who say, 'I'm going to start building these kinds of sensors and distribute them to all my friends'? Any thoughts on that?

I guess, as in the example I gave earlier today in the Imperial Valley, we've run into instances where our warning system disagrees with the government's warning system, and really you're trying to give people more information, not a higher level of confusion. So we spent a lot of time trying to think about what to do with the data when our lower-precision sensors are giving a different message. Now, if you've got 500 micrograms per cubic meter of PM2.5 in the air, so you're like Beijing on a bad day, you probably don't need the most sensitive sensors in the world. Or during the very large wildfire event we had last December, the nearest
government monitor was maybe 11 or 12 miles from my house, but there were 10 PurpleAir sensors that were much closer, and I could see when the plume was moving. So there are going to be some instances where the degraded information is still going to supply a useful benefit to the public, but I think we have to be careful about the messaging so that we're not putting out two competing messages: one with more highly resolved spatial and temporal data that's got more error in it, and another from the more conventional government monitors. So that's a big challenge. And on your other question about validation and calibration: yes, if we are going to begin using these for some type of official decision making, we do have to have some understanding of how they relate to higher-quality instruments, and what that degradation of information is going to mean for the types of messages we're going to give out and the types of uncertainty we're going to try to convey around those messages. So those are just a few initial thoughts.

Another comment: I don't have any firsthand knowledge about this, but I do know that the UN-GGIM, which I mentioned earlier, actually comes out of the statistical office of the UN, and part of the work they're trying to do is to align the work that's being done within the geospatial agencies at the national level with the statistical agencies. I think some of the issues that you've described in terms of nomenclature and accuracy and precision, and how you can align those, are things that they are actively looking at. So I encourage you to go to the UN-GGIM website to check that out, because I know they're having meetings and publishing papers and things on that topic.

So, a couple of comments, then a question. Just to add to that one: the UN SDGs are another one, where right now there are national statistics and they're trying to get more
granular data through crowdsourcing and other sources, you know, where the school is actually built, what the reading level is, to then float up into, say, national reading levels and so on. So that's another one to look at. First I'll start on my comment and question about dark things. I'm surprised no one's mentioned Minority Report, right? It's another really good example of a dark future. But I think what that was demonstrating, and I'm curious what your thoughts are, was actually more about utilizing all this data to do simulation and modeling, which was touched on a little bit here. There have been a lot of examples where this has been amazing: disaster and flood response, where we're using stream sensors and crowdsourced data to predict flash flooding and inundation, putting that out into Waze, and telling people to route around what might be a dangerous bridge. The USGS is doing that now with earthquake data, and I think even coastal erosion data and other things, and then using that to influence their predictive models. So I'm curious, in urban planning, Sarah, I don't know if you coined the term or others did, about social topography, about how you get these community sheds and where people tend to go at different times of day. And then people want to influence that: where do we pre-position police, where do you pre-position the free T-shirts that they give out at parades and things like that, where do we close off bridges if we need to for safety or evacuation? So I'm curious, where does it go from here? You have traditional modeling and simulation through traditional sensors, you have more real-time data, and now you have the social data that can influence that and help you really refine it. And let me throw one more buzzword in there, machine learning, where you can constantly feed back on those and apply kind of regression algorithms to optimize. What role do they have? Do these fit together? Or do they
stay separate in their own little boxes of research and application?

Okay, well, did you want to? Do you want me to? Yeah. So, I think the beauty of the machine learning methods is that they're always based on cross-validation that's outside of your model, so they're learning from how well they can predict. So when the question is purely prediction, whether it's a flood area or a wildfire burn that's going to put a plume up, they can be very useful and very powerful. And if the information is not informing the prediction that much, it will be penalized and not included in the final equation, if your cross-validation algorithms are reasonably conservative. So I think it's a very powerful way to integrate different types of data from numerous types of sensors of different levels of veracity and accuracy, so that you are going to improve your predictions, because basically it's not going to go into the model unless it's able to predict outside of the data that's being used in the calibration equation. So I think that there's a huge role for them to play, but they're not necessarily going to allow us a greater level of understanding. If your intent is to understand urban processes, purely predicting where a flood's going to go or something like that is not going to inform you. But for these pure prediction problems there's enormous potential, and we're already using them in wildfires, where we throw in what should be gold-standard information and we're always surprised that other variables will pop up and add to that prediction that we didn't think were going to be that important. So I think for those kinds of applications machine learning is very promising.

Yeah, I think one other thing to remember, and I think this is really important when we're thinking about modern modeling, is how do we think about the missing data sets. One thing that happens often as we start to apply machine learning is that the data might
not be as comprehensive as we think it is, or it's missing a piece. So I'm always thinking, when I look at the results, that what isn't there rather than what is there is a really important kind of cross-validation technique. But I think what's also important, and it goes back to the work that I did in China, is that asking people who live there about your modeling results is absolutely essential. We can't miss the cross-validation of human validation, I guess I would say, and that would allow us to reiterate our model. I think without it we could have a doomsday scenario, much in the same way that we had in the '60s in the U.S., where we said, okay, efficiency says the highway should go here, and then we didn't actually ask people on the ground, what are the important community factors? So I would say that as we move into these more algorithmic models to make decisions about places, we do need to create a loop in which we ask humans to help us edit the model together. They don't have to be data scientists, but they have to be able to critique the work.

I think also, for me, a lot of this depends on the application. What you're talking about in terms of crowd movements, or where people are going to be at certain times of day, that's more of an objective measure in some ways in the data. Once it starts moving into what I'd call much more socially constructed data, you really have to start worrying about what kinds of biases there were in how that data was built, however it was originally constructed, especially when you're talking about behavioral or criminal activities and things like that. Are there larger structural reasons that might have constructed the data in certain ways that we want to be aware of when we're
moving into that? Because again, it comes back to the old thing that data is made; it's not just dropped from the sky. And that's the real sort of care I would take with that sort of approach.

Not doom and gloom, although I do some risk and hazards work. But, you know, Glenn McDonald, I think, just stepped out, or no. This morning we were talking about historical geography, and I think, Michael, of your comment there about the data sets and the storage of the data. I guess I wonder, for you guys to speculate, on the value of this data and its accumulation. So one is a question of archiving and how it's stored and how it's maintained, and how do you think that will alter or enhance capacity in the future to do historical geographic work, to understand what particular moments are? Because we think about the texture of any moment, and so much of that information historically has been lost. How can we recreate the world of the 1940s? How is this data going to help us recreate the world of the 2010s in the future? I thought it'd be fun for you guys to think about.

I'll just answer it quick. I think it's a great question, because it's the reason that the legislation about the right to be forgotten has been put out there, because we don't have this. And for me, the thing I'm always struck by is that every time I buy a new computer I copy everything over from my old hard drive, though I guess I store a lot of it in the cloud now. Since I got my first computer back in the '90s I've kept everything, and yeah, there's a really rich tapestry that one can build from that. I think this goes into some of the approaches that people are taking in the humanities, well, digital humanities more, in terms of the kind of tools that they're
starting to use. It's not necessarily super sophisticated or quantitative, but it is looking for keyword frequencies and looking for certain references. I think, yeah, it opens up a whole new realm of kinds of questions and how we can explore this stuff. So yeah, 50 years from now it'll be great.

Yeah, I guess some of it goes back to this question of being able to pull out those representative patterns, so that we're not going to archive all 60 billion data points from the Precision Medicine Initiative every week. But what do we need to be able to say, over some longer period of time, that this is representative of where the people were and what they were doing? And how do we sample out of that huge array? A lot of it's going to rest on understanding the variability in these patterns and what the regularities are, so that we are able to represent the longer term with shorter samples, because I don't think there's any social benefit to storing it all. And of course that does technically preclude your right to be forgotten, because you're not going to be forgotten if the data are there. So I think that's a big part of the challenge, trying to come up with these statistical methods that are going to allow us to say, well, what is representative, and how do we do that? We don't have them yet.

Okay, well, before Bill Selecki or someone else jumps in, that's another really big question and it runs beyond our time frame. I'd like to ask if any of you would have any kind of last thought or last word. Of particular interest to us is what you see as a gap in practice or a gap in knowledge or a gap in research, if anything just jumps to mind here. Or even without a gap, I'll give you a chance to have a parting comment, and then we'll wrap this up.

I'll just quickly wave the flag for something that we've
been talking about for a while, which is making parallels between what we find in astronomy with observatories and geographic information observatories, or information observatories, where we take all the sensor data that we have access to, we curate this data, and we start to build these systems where we can actually examine it from either a visual or some sort of information-analysis perspective. And continue that analogy to astronomical observatories, where we look at all the various patterns; we look at what we were able to do with the Hubble Telescope and investigate the various patterns there. Can we build these information observatories, urban observatories, whatever you want to call them, at a much broader scale and bring these data sets together? Because right now we often tend to look at a small set of these different data sets, one group of sensors or CCTVs or whatever the case may be, and try to extract some meaning from that, and we're still missing that holistic view that I think we should approach as geographic scientists. That's my feeling.

Well, yeah, I think a lot of the sensor technology can be used to make our cities a lot smarter than they are. In many ways we're still running on 1950s infrastructure, as in a bus that runs every 20 minutes, or the traffic light that turns green at this time of the day every three minutes and then turns, sophisticatedly, to every two minutes at another time of the day. There's no real-time feedback happening. The same goes for the electrical grid that Marilyn talked about. So it seems to me a big gap is in how to translate a lot of what's going on with the sensors into actionable items, so that city planners and other government authorities can then say, well, yeah, this is the priority, this really works right now, and this could result in massive energy savings or more adaptive capacity. So I think that's a gap, the sort of translational gap between what's going on at the cutting edge of sensors and what actually
works to make our cities smarter right now.

So yeah, I want to just go back to what you were saying, which was, I think Kevin was saying, that the real disaster is that we aren't able to use some of these data sets for public good when we need them, because we don't have the regulation and the policy in place to be able to use that data. I think it's a huge issue. We avoid it; we don't want to deal with the privacy issue; it's uncomfortable. We only deal with the privacy issues when there's a lawsuit. We're not proactive about how this data can be actively used. What are we doing with Uber? It's running on our infrastructure, and they have this amazing amount of data to make our cities smarter, but we don't have access to it. So that leads me to my next thing, which is something I think is really important to research: what is the role of these different private companies in the public forum? Whether it's an Uber or a Facebook, who is acting as a public entity in a way, much the same way we think about the Matassus, what is their role? They're acting as a kind of public good in a way, but they don't have to act in that way, and I think it's something that we need to be considering more and more. Like the conversation around the Facebook data: we all knew they were using our data, right? It's not surprising. But I think what we're more surprised about is the role of Facebook in that public forum that it's created, and Zuckerberg obviously doesn't care about the public forum. I think that's something that we need to research more as smart cities infrastructure becomes more and more privatized and those infrastructures work in our communities.

Great, thanks. I saw a lot of heads nodding at that. We're out of time. Thank you so much, thank you to all of our speakers and to everyone who came, and to all those people who are with us virtually. Thanks.