What I'm talking about today is basically a follow-up to a talk I gave a few years ago at DEF CON 18 about looking at information that's freely available out there on the net, doing some trending and analysis of it, and trying to make something useful out of it.
So a little bit about my background. I'm currently the director of technology at the Center for Law Enforcement Technology Training and Research, which is a nonprofit research center that got spun out of work I used to do when I was a professor at the University of Central Florida. I was there for about 10 years, teaching computer engineering in the engineering program. I developed the computer security curriculum there and did embedded systems, among other things. Eventually I moved away from teaching and more into research, and we ended up spinning that research out into an independent nonprofit center. I'm also CTO for Hoverfly Technologies, and prior to this I worked as a research associate at the Institute for Security Technology Studies at Dartmouth College.
So over the course of the last 20 years, some of the things that I've worked on are up here on this list, and it took me quite a while to catch on to what the common theme between all of them is. I'm kind of slow to pick up on these things at times. Eventually, as I started putting it together, I realized that all of this stuff, from the information sharing that I'm working on now to hardware sensor networks to intrusion detection systems, really relies on some of the basic concepts of sensor data collection, and in particular sensor fusion. Everything listed up there is based on taking some sort of sensor and using it to try to get some measure of reality, but the sensor always has some limitation.
Sometimes it's a significant one, sometimes it's not so bad, but every sensor we use to look at reality, including ourselves when we view things, has some sort of limitation. It gives one particular view, and that influences the data we're seeing, so we have to work to get more meaning out of the data that we have. One of the techniques that I find most versatile is sensor fusion, where we take multiple sensors, multiple ways of looking at the same thing, and put them together with the hope that we can take the limitations of one observation and cancel them out with a different observation that has a different set of limitations. At least that's the hope: that we can put two halfway decent things together and get something that's more than the sum of its parts.
So before I get more into my stuff, I always feel on this particular subject that I have to give an acknowledgement to the guy who inspired some of these thoughts in my head. It was actually at DEF CON, way back at DEF CON 13, that Broward Horne gave a talk on meme mining for fun and profit. All great ideas come out of a problem, and I guess a lot of bad ideas come out of trying to solve a problem too, but his was a really good idea. His problem was that he would start learning some new technology or tool, or at least new to him, and by the time he felt he had mastered it, it was on the way out: either the job market was saturated with people doing it, or it had fallen by the wayside and nobody cared about it. And he was always struggling to figure out: what should I spend my time studying, what should I learn to get ahead?
And he ended up thinking about this as everything having a sort of saturation curve, where a trend starts happening and there's a little bit of chatter about it, then it starts taking off and everybody hears about it when it's big and growing, and then it gets boring and old. He wanted to identify these things earlier on, and he went and did it. This is a slide pulled out of his old presentation. What he would do is look at news sources, forums and blogs for keywords, pull those out, and see what was trending, with the idea that this early chatter is a precursor to something taking off. In this particular case, the red line shows how many times the word palladium showed up in news reports and forums, and the blue is the price of palladium. You can see that there's clearly a lot of chatter about it before the price spiked, and then the chatter actually dropped off before the price came back down. So it's apparently a really good indicator for predicting what's going to happen there.
So anyway, that thought inspired me. When I was teaching I'd have students come to me wanting to know what skills they needed to get a good job, and I tried to apply what Broward had done in a similar way, by monitoring and observing trends. This is mostly single-variable observation with some correlation. It started off looking at Craigslist data, just because Craigslist is nicely available. It's well organized by geographic location, and within certain sections, like the job postings, there are categories for different types of jobs.
And I know Craigslist isn't necessarily the best place to look for jobs, but it had some interesting properties in that it's a lot of small companies posting on there that are maybe trying new things. A lot of entrepreneurial companies, startups, things like that are posting, not so much the big ones. So that actually tends to skew it a bit more towards being a leading indicator, something that comes out a bit ahead of the curve. Some of the things I ended up looking at, just because I found correlations there, were jobs, items for sale and adult services. And I'm not saying I looked for adult services on Craigslist. It's just that my research took me there.
So the things I saw look like this. This is just an example, showing job postings by date. The dips you see there are a weekly trend, and these are some different cities. It goes dead on the weekends, there's a spike on a Monday, a spike on a Friday, you see this kind of pattern. And it's okay, fine, kind of boring, sort of interesting, not unexpected. But certain things started standing out when you look at this data. In this particular case, one of the things that jumped out at me was that Austin never had a spike on a Friday; it always dropped off. It's kind of hard to see, but it's the orange line in there. It never has a second spike in it. I thought that was kind of interesting.
The other thing, and this is what came out of the adult services, was that there was a correlation between adult services being offered and bicycles being for sale, or actually a lot of items being for sale. And this led to a couple of interesting discussions. One of my favorite moments at DEF CON was when somebody stood up in the audience and said, Hey, I think I can help you out.
I'm from Austin and my sister is a prostitute. So that then led into a discussion of things you can sell one time, like a bicycle, and something you can sell over and over and over again.
So okay, that's what I had done before. There's some interesting stuff there, but I wanted to dig a bit deeper into the data, look for more relationships and correlations, and hopefully be able to pull in other sources and do some fusion on this. So I started looking for things like cycles in the job postings or correlations in them, because at the time I was working on this, keep in mind, I was really trying to help out students who were graduating and looking for jobs, trying to help them find out what skills they needed, what would really help them get ahead. There were definitely correlations in there. There are things in the cycle you'd see, but nothing unexpected, nothing really interesting that jumped out in related skills. You could say that if a job was going to have one particular tool set or skill set listed, there are other ones that are likely to be listed with it as well. Again, nothing really jumped out at me as being unexpected.
But eventually a couple of interesting things showed up, one that I think is just kind of funny. It was how often the words drug test or drug screen showed up in a job advertisement, correlated with the different skills in it. And apparently, if you don't think you're going to pass a drug test, don't bother learning SAP, because it's not going to do you any good. On the other hand, if you want to develop iOS applications, go knock yourself out. I guess the logic here is probably how corporate or uncorporate the environment is.
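As a rough illustration of the drug-test comparison just described, here is a minimal Python sketch. It is not the code used for the talk: the sample postings, the skill list, and the function name `drug_test_rate_by_skill` are all hypothetical, and real Craigslist text would need more careful cleaning than this.

```python
import re
from collections import Counter

# Hypothetical sample postings, made up for illustration only.
POSTINGS = [
    "SAP consultant needed. Must pass drug test and background check.",
    "SAP analyst, full time. Pre-employment drug screen required.",
    "iOS developer for small startup, flexible hours.",
    "iOS and Android developer wanted, equity available.",
    "Android developer. Drug test required before start date.",
]

# Assumed skill keywords to compare.
SKILLS = ["SAP", "iOS", "Android"]

# Matches "drug test" or "drug screen" in any letter case.
DRUG_PATTERN = re.compile(r"drug\s+(test|screen)", re.IGNORECASE)


def drug_test_rate_by_skill(postings, skills):
    """Return {skill: share of postings naming the skill that also mention a drug test}."""
    total = Counter()
    flagged = Counter()
    for text in postings:
        has_drug = bool(DRUG_PATTERN.search(text))
        for skill in skills:
            # Whole-word match so that "SAP" does not hit words like "asap".
            if re.search(rf"\b{re.escape(skill)}\b", text, re.IGNORECASE):
                total[skill] += 1
                flagged[skill] += has_drug  # True counts as 1
    return {s: flagged[s] / total[s] for s in skills if total[s]}


rates = drug_test_rate_by_skill(POSTINGS, SKILLS)
for skill, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{skill}: {rate:.0%} of postings mention a drug test")
```

The same function could be pointed at any other phrase of interest by swapping the pattern, which is a design choice of this sketch rather than anything stated in the talk.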
Another thing was looking at jobs that had benefits, like retirement, health and medical. The interesting one, the best one, was COBOL. But I think it was a bit of an outlier because there were just so few jobs offered with COBOL. And I guess to get any old grizzled COBOL programmer to come work for you, you've got to give them a lot of benefits. With things like Python and Android and HTML, when you're looking for somebody to develop your web page, you're not going to give them much in benefits, I suppose.
So, as I was looking into this, and this is much more recent, earlier this year, I came across this article. This is actually out of the Journal of Psychology, where some psychologist, Dorothy Gambrill, was doing something similar and went through and looked at the missed connections part of Craigslist. If you haven't ever been there, this is where people say, oh, I saw you as I was walking across the parking lot and tried to catch your eye. And then they post this up on the internet hoping that person will find it and somehow make a connection with them. These are organized by state, showing where people had the most missed connections. And there are some things I just find funny, like Walmart's got a lock on the South. In Oklahoma, it's the state fair, of course. It makes perfect sense. And in Nevada it's casinos. And the one that just jumped out at me like crazy, that I just had to put up there, was Indiana. It's at home. I don't know what they're doing in Indiana, but I'm pretty sure they're doing it wrong.
So I was talking with a friend of mine about this stuff, Dave Grubleski, and his eyes lit up and he started telling me about this thing that he had done. In his neighborhood, this is back in Orlando, Florida, they'd had a rash of crime recently.
And they didn't really know they had a rash of crime until all the neighbors got together and started talking with each other, and found out that everybody knew of a different incident that had happened. So he did some searching and found that there was some open source data that the sheriff's office and police department would post about their CAD, their dispatch calls. And he started writing this little tool to take that, do some geolocating on it and tweet it out. You can subscribe to it and get tweets, really hyper-local things for your neighborhood about what's going on there.
One thing that's funny: I just pulled this up earlier today, and I was noticing things. This is in the Orlando area. The first tweet on there, and I'm amazed that the sheriff's office is putting this out, basically says there's a designated patrol area available, which means there's an area where nobody is currently patrolling. And this is down in a real tourist trap part of Orlando. So that could be useful information to somebody, to know there are no cops there right now. Then there are a few accidents, and then I guess the people at the bottom, down on Poppy Avenue, would be happy to know there's a fugitive from justice running around in their area.
So this led us to look into more sources for data, because what they offered where we were wasn't very useful or organized. We started looking in places that subscribe more to the open gov system, which is a movement to have more transparent government data. Some cities publish huge amounts of data about what's going on in their city with the fire department and police department. There's live, interesting data in Seattle, Boston, Chicago, and a number of others.
These are three that I spent a bit of time looking at. There's information about incidents that are going on, like police and fire. In Chicago, you can actually track where the snow plows are in the city. You can track where garbage trucks are in real time, from the city, which I just find really fascinating. There's information about where bicycle racks, public toilets, landmarks, and even cameras are. The city has all of its camera locations posted, which I thought was particularly interesting, because you can go on here and make a map of what is an observable location throughout the city and what is not, which again could be useful information for somebody.
The Seattle one's great. They've got their visualization tools built right into this thing. This is a map showing police incidents over a period of time in part of Seattle. I pulled up this area, and you'll notice that most of it is in that same yellow-orange except for this one big glowing red blob out there, over in Georgetown. I don't know if anybody's from Seattle here, but I'm wondering what the heck's going on over in Georgetown. You can look a little bit closer, and right next to it is the Boeing propulsion engineering labs, which makes me feel really good.
So coming back to an area I know a bit more about, back in Orlando, we pulled out traffic tickets. They don't publish information about who got the ticket or exactly what the ticket was for, but you can see when a traffic stop occurred. We pulled data that covered three roads in the area, right out by the University of Central Florida. All three run east-west, and they're kind of the three major roads there: one runs right into the university, one's a bit north, one's a bit south. They all have about the same amount of traffic and a very similar traffic pattern. In this chart, each of the groupings is a week-long period, all five weekdays, repeated over six weeks. One of the things I found really interesting was that the chance of a traffic ticket occurring on one of these roads at different times of the day always followed the same sort of order, particularly between Highway 50 and University Boulevard: the Highway 50 traffic stops always preceded the University Boulevard traffic stops. And when you go out there and look at the traffic, the traffic pattern's not really any different. So you start thinking about why you always see one before the other. I don't have hard evidence to back this up, but our belief is that you're seeing the influence of the patrol pattern of the police in the city. So through the information they're putting out, you're actually able to sort of start tracking them. There was a great talk I went to yesterday, I guess it was, with Brendan O'Connor, about tracking people by seeing the information their devices are spitting out on wireless networks. It's a similar concept: they're putting out a lot of information here, and if you look at it the right way and take the right pieces of data and put them together, you can pull out a lot more information about what they're doing and what's going on.
So by this time I've changed what I was interested in doing, probably because I quit teaching and left the university, so I don't have students anymore and I'm not that interested in helping people find jobs. Now I find it interesting to look at these government entities, the police and other things that are going on, also because I've worked with law enforcement a lot. It's interesting to see how on one hand they're very protective of their data, but at the same time they're putting out a lot of information, and I'm not sure they quite realize how much they're putting out there. Frankly, I think it's actually kind of a good thing. I like being able to have more information and being able to look back on them. Like I say, why should the NSA have all the fun spying on people?
So what's next with this? There's so much more I'd like to talk about, but with these 20-minute talks you have to be kind of fast. What I'm really interested in is expanding the model that we've been using to analyze this data. We built things that are very purpose-driven: the first set of analysis we did was very structured around seeking out the jobs, and then we got sidetracked by the crime and went off in that direction. I want to bring this back together, try to build a more robust model for analyzing this data, and throw some data mining at it. So far a lot of what we've done has been what I'd call hypothesis-based, where I make a prediction about some correlation I think I should see in there, then go looking to see whether it exists in the data or not. I'm sure there are a lot of relations in there that I wouldn't expect or wouldn't find otherwise. So I want to throw a bit of data mining at it, that sort of blind approach, either AI or brute force, to finding relations throughout the data.
So I think I'm about out of time right now, and I'm getting a nod from the back, so I'll wrap it up there. If there are any questions, I'd be happy to take a couple till they cut me off. Thank you.