How about now? I think this will work. What we're going to talk about here is online corporate intelligence. This is something that's been in the news lately. Can I see a show of hands? How many people have heard of the Policy Analysis Market? Just a few of you. What the Policy Analysis Market is, it's actually a DARPA program. What they intended on doing for the sake of collecting intelligence was to have a website where people could basically place futures on things like assassinations of foreign heads of state. Yeah, seriously. Read the paper. It's in there. It did get dropped, right? But basically it was a futures market. It was basically the Pentagon having a bookie service where you could place a bet on when the next terrorist attack was going to be and what it would be, who's the next assassination, that kind of thing. It's kind of interesting. Congress got this information two months ago. They just got excited about it about three days ago.
But the idea here is that they were planning on using this as an intelligence source. And if you think about it, I think there's some credence to this. I think it would work. Unfortunately they can't do it now because they showed their hand and everybody knows what it is now. But there's another group called the Iowa Electronic Markets that's been doing very similar things, where basically people are making book on elections and they're taking out bids on who's going to win whatever election. And it's accurate. They're able to predict elections within 2%. Now you ask yourself, why is this happening? Why can you get such good intelligence this way? The reason is, if you're doing just a generalized poll, you want your group to be as broad as possible, right? If you're doing something like this, you're asking the specialists what's going to happen. And that was the government's idea as well.
One other reason why I think this kind of thing could work: there's another online service, I can't think of the name right now, but basically you can go online and you can bet on football games or how many hurricanes we're going to get in a season, that kind of stuff. One of the bets they had on there came up probably about five weeks into the last season of Survivor, which is a show that I follow very closely. Basically people were starting to bet on who the two final survivors would be. And it was Matt and Jenna. And anyone who followed Survivor knows that Matt and Jenna were like, there's no way they're going to win. But there was an incredible number of people that were betting on Matt and Jenna to win and be the finalists. Well, the people who were running this bookmaking service said, gee, this is really odd. So they started to investigate it a little bit and found out it was CBS employees who actually knew who the two finalists were going to be. So don't underestimate the power of collecting intelligence this way. I think it's actually pretty clever that the military figured that one out.
Here's what we're going to cover. We're going to talk about how online intelligence gathering is different from traditional intelligence gathering. We're going to talk about the difference between intelligence and espionage. We'll talk just briefly about corporate dashboards and their importance. I'll give you lots of tips from the field. I've been writing spiders and bots since about '97, so what is that, like six years. And about two years ago I realized that what I'm really doing is corporate intelligence. So I can give you a few ideas of how I do some things.
The other thing I think you should all consider, especially if you're software developers, is that there's an opportunity here for you. Those of you who are still software developers after the dot-com bust should prepare yourself for the next bust.
I spend an increasing amount of my time as a developer not developing but managing projects that are going on in India or Romania, around the world. And that's a growing trend. So if you're a US developer, you've got to take advantage of being close to your market. If you're not doing that, and if you're just somebody who writes shell scripts or does backends for e-commerce services, you're very apt to be outsourced to somebody who's charging $5 an hour and is every bit as competent as you are. I don't have a problem with that. I'm in favor of global economies and I have no problem with somebody in Vietnam getting paid to program. But it's something we all need to be aware of, that the economy is changing here. So there's an opportunity here, because this is one area where it's very important that you're close to your market.
Who am I? There's my email address. I've got a little consulting company, there's a URL. I write web bots and spiders for corporate clients. I've been kind of a DEF CON regular since DEF CON 5. DEF CON 5 I actually covered for Computerworld magazine. Last year I was here and I did a session on introduction to writing spiders and web agents, and the last time I looked, that entire session is online in streaming audio or streaming video if you go to the DEF CON archives. I also write, I speak, and I do a little bit of teaching at some colleges around Minneapolis.
Okay, intelligence is information. In a business sense you want to know what you can learn about your competition. You want to learn what people know about you. You want to find out if people are stealing from you. But the most important things you want to know are those things that you can predict. Things that haven't happened yet. And this is key here. What you need to do in order to do that is collect a library of information and then look for anomalies. You want to look for trends, you want to look for changes. That's the most important thing.
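As a rough sketch of what "collect a library of information and then look for anomalies" could look like in code (this is not from the talk; the function name, the sample numbers, and the three-standard-deviation threshold are all assumptions made for illustration), you can compare the newest observation against the history you've already collected:

```python
from statistics import mean, stdev


def is_anomaly(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` sample standard
    deviations away from the mean of `history`.

    `threshold` is an arbitrary illustrative default, not a recommendation.
    """
    if len(history) < 2:
        # Not enough history to estimate spread, so flag nothing yet.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is worth a look.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical data: job postings seen on a competitor's site each week.
    weekly_job_postings = [4, 5, 3, 4, 6, 5, 4]
    print(is_anomaly(weekly_job_postings, 5))
    print(is_anomaly(weekly_job_postings, 19))
```

A simple z-score like this only catches sudden jumps. Slow trends would need something like a moving average or a regression over the stored history.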
I'd like to make a few definitions. Intelligence is not necessarily espionage, launching trojans, that kind of stuff. It's not necessarily taking covert actions either, which is tampering with the situation to affect the outcome. I think it's kind of important, if you're going to do this kind of stuff, that you somehow set some boundaries on yourself. The other thing is that you don't have to do espionage or covert actions to get into trouble. Who was it? Sean Gorman, I think his name was. He's a PhD candidate at George Mason University. He started out doing a paper where he wanted to map the Internet and see economically how areas were served or not served by the Internet, and do his thesis on that. What he ended up doing was mapping the entire fiber optic network in the United States, which became a very controversial thing. Not only did CEOs want to get a hold of it, but so did the federal government, because it showed a lot of vulnerabilities and it became a matter, they say, of national security. There's a lot of information you can get out there just by using traditional online methods.
Okay, what are some traditional sources for collecting corporate intelligence? Well, you can go to conferences, you can go to your competition's booth, you can talk to people at the bar, that kind of stuff. You can hire your competition's employees, which is a very popular thing to do. Usually any market will have two or three major competitors within it. Off the top of my head, I'm thinking Nike and Adidas are both in Portland, that kind of stuff. You can look up patent records, but that's really after the fact, because that means it's at least two years after it was filed. You can use secret shoppers, you can have people go out and look for price comparisons, that kind of stuff. You can study help wanted ads, you can read trade publications, and you can talk to vendors, who are a good source.
The disadvantages of the traditional methods are that the information you're getting is mostly after the fact. These are things that already happened. It requires direct contact with your source. They're mostly one-time activities that have to be repeated, and it's difficult to repeat these things, especially if you're doing things like stealing employees from your competition. It's very difficult to do anonymously and it can be quite expensive.
The advantages of online corporate intelligence are that it can be done from a distance with some degree of stealth. You can automate it and it can be done relatively anonymously. Those three things right there are also the three things that concern security people. Those are three things that also apply directly to hacking. The other thing about online intelligence is you can reduce the latency between when something happens and when you know about it, so you can make an informed decision. It can be interactive, we'll talk a little bit about that. It's easier to create relevance between pieces of information because typically everything goes into a database, so we can do queries and that type of thing.
It's important to note that gathering intelligence means learning some new habits. Generally speaking, what you do with the Internet is determined by the agent you're using. If you're using a browser, you're looking at websites, right? Mail clients, you're reading mail. News readers, you're doing news. I suppose you could argue you can telnet into any of those, but people don't. Generally, if you want to have a competitive advantage, you have to have software written, which typically means web bots or spiders.
Online corporate intelligence is most effective when it's automated. The data can be parsed and stored in a database. You store data over a period of time so you can look at trends. You create context by combining data from a variety of sources.
You use statistical analysis to make recommendations, and you should also give your end user some ability to make configurations, because you never really know what you're looking for until you've got information.
Okay, how do you create relevance for information? You do it primarily by cross-referencing multiple sources. You want to gather information periodically so you can look for trends. You want to show relationships between data, and you also want to automate it.
Here are some simple sources. These are the first places I would look, in other words, to gather intelligence. Corporate websites. Job postings, if they're current and up to date. You can study people's hiring habits, and by doing that over the course of a year you can kind of figure out what this company is going to be doing, especially if they're hiring people that you wouldn't expect them to be hiring. Product pricing: you can look at websites to find out pricing. It's not a real important thing if you're selling a commodity. If you're selling something that's kind of intangible, like hotel rooms or something like that, it's very important to do that kind of stuff. You could also do a clipping service type of thing by looking at news. Government websites are great places to look for intelligence, primarily court records. It's amazing what you can get off the Internet by looking at court records. People's Social Security numbers, it's all out there, so handle with care. SEC filings are good if you want to look at the economics behind a particular company. Again, patent records and census data.
A few more. Online auctions are a great thing to look at because they can tell you what the true market value for something is. Whois servers can be valuable. News servers I find to be valuable primarily as places to get links to websites that are hard to find. HTTP headers: the data is not real important by itself, but combined with other stuff it can be good. And definitely mail.
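To make the "gather information periodically so you can look for trends" idea concrete, here is a minimal sketch that stores dated observations from several sources in one table and reports how each source's value moved between its first and latest reading. It is only an illustration: the talk mentions MySQL, while this uses Python's built-in SQLite so it runs anywhere, and the table layout, function names, and sample values are all hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one table of dated observations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE observations (
               source      TEXT NOT NULL,  -- where the value came from
               item        TEXT NOT NULL,  -- what was measured
               observed_on TEXT NOT NULL,  -- ISO date, so text order = date order
               value       REAL NOT NULL
           )"""
    )
    return conn


def record(conn, source, item, observed_on, value):
    """Store one observation collected by a bot run."""
    conn.execute(
        "INSERT INTO observations VALUES (?, ?, ?, ?)",
        (source, item, observed_on, value),
    )


def change_over_time(conn, item):
    """Return (source, first_value, latest_value, change) for each source."""
    rows = conn.execute(
        "SELECT source, value FROM observations "
        "WHERE item = ? ORDER BY source, observed_on",
        (item,),
    ).fetchall()
    by_source = {}
    for source, value in rows:
        by_source.setdefault(source, []).append(value)
    return [(s, v[0], v[-1], v[-1] - v[0]) for s, v in sorted(by_source.items())]


if __name__ == "__main__":
    conn = open_store()
    # Hypothetical room rates gathered on two separate runs.
    record(conn, "hotel-a", "room_rate", "2003-07-01", 120.0)
    record(conn, "hotel-a", "room_rate", "2003-07-08", 135.0)
    record(conn, "hotel-b", "room_rate", "2003-07-01", 99.0)
    record(conn, "hotel-b", "room_rate", "2003-07-08", 95.0)
    for row in change_over_time(conn, "room_rate"):
        print(row)
```

Keeping every source in the same table is what makes cross-referencing cheap: the same query shape works whether the item is a room rate, a count of job postings, or an auction closing price.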
Okay, here's some quick technology for how I like to set this kind of stuff up. If you're interested in the nuts and bolts of how to write a spider or how to write a web bot, again, I would recommend going back to the archives for DEF CON 10 and reviewing last year's presentation. This is basically what I like to do. I like to use a standard web host. It can be one of those $10, $15 a month types. First, what you do is you identify your sources. You write your web bots, spiders, and parsers, and you put it all up on that web host. Something else that I've found very useful is dial-in servers. If you want to host the stuff yourself, there are great things you can get by dialing into mainframes. Sometimes there's a small subscription fee, but there's really good stuff there. Incidentally, if you ever want to do that, the tool that I use and like a lot is called COM7.
Okay, so now we've got a paying client that needs some way of viewing this information. What I like to do is present the information in some kind of a protected website. I'll show another example of that later. There are other ways of doing this. You can also email them information. I find this to be the best way of doing it. The other thing you need, and this is kind of key, is some kind of a scheduler, which is a cron job or some kind of scheduled task that goes out and contacts a web service. This is all on the same server, by the way. It contacts a web service that says, okay, go out and spider now.
Some quick examples of what you can do with this kind of stuff. Corporate dashboards are ways of letting the client who's buying the service get the big picture of what's going on. So there's going to be lots of different data being presented. It's going to be pre-filtered. You're going to run statistics and you're going to hopefully show some kind of trends. Also, as a developer, it creates some branding opportunities for me.
So that's why I like doing this. Here's an example of one, a potential one. This would be for somebody who maybe has a hotel. And this would be a calendar for the month of July. We can see we've got competitors listed here. And we've got their room rates listed the other way. Down here we've got the lowest price, highest price, average price. Here's what the client's price would be. And this is their position within that group. This is a really important kind of information. Right now hotels spend tons of money on having people call their competitors and say, what's your room rate for August 16th? It's not a real effective way of doing things. If you can automate this process, it's great, because then you can have a history of it. We can go up here, we can see if the trends are up or down. We could drill into this information, look at trends. This could be really powerful stuff if you also had occupancy rates. Or if you had extraneous information like, you know, conventions in town, or if there's been highway construction, all that kind of stuff. Or if you could tie this into the profits and losses in their SEC filings too, it might be good. Here again, the key thing is that you can find out where you are positioned among the people that you consider to be your competition. And that's key.
Here's a mail example. You guys get these things. I get about three of these a week probably, where someone's trying to spoof my eBay account, where they tell me that there's been a change and I need to enter my ID and password. You guys have seen these, right? If I were eBay, I would have a whole bunch of mail servers sitting out there that basically collect this kind of stuff. I love this one. Dear eBay member, your account has been accidentally, has been chosen accidentally. Since there are a lot of cases of cheating, we'd like you to visit your account. If you're not going to do that, your account will be removed away from our system site.
That sounds pretty official, doesn't it? I love it.
Here's an idea. Policing the Internet. People who break into houses, break into cars and steal stuff, they want to get rid of the stuff fast. They're not concerned about getting a market price for it. They just want to basically fence the stuff and get cash. So no surprise, a lot of people are selling stuff on online auctions, right? There's a lot of stolen merchandise being sold up there and it's a problem. Here's a solution. You create an online interface that law enforcement can use, and they can basically say when something was stolen and here are the things that were stolen. You go out on eBay or any place like that and you look for groupings of that merchandise that relate to one particular seller, and maybe you can even add the city that things were stolen from. It seems like it would be a pretty effective way to cut down on that kind of stuff, because people who do that kind of stuff aren't that bright. They're going to try to sell it all at once anyway, so it's pretty easy to track them down.
Here's another example. You can find out what people are reading. I used to do this kind of stuff all the time. I used to work in an advanced concepts group for a major medical manufacturer in Minneapolis, and one of my jobs was to evaluate emerging technologies. The other thing I would do is try to find out some things about the competition. I'd always find out what books they were reading, and the way I would do it is with the Amazon purchase circles. Are you guys familiar with these? These are great. You can find out what people are buying at various government institutions, educational institutions, or corporations. There are probably thousands of them on there. I know Amazon got a lot of heat for doing this. I thought they weren't doing it for a while, but now apparently they're doing it again. Just for example, what are the employees of Apple Computer reading? Let's find out.
Mac OS X in a Nutshell. It makes sense. Think about it. They've got thousands of employees who are accustomed to OS 9. It's a major change for the company. You expect that kind of stuff, just like you expect Mac OS X Hacks. These are probably written by Apple people anyway, I would guess. I don't know. This is the one that amazes me. The Missing Manual. That's an excellent book, by the way. Now, what's this? This is different. Pattern Recognition. The fourth most popular book purchased from a network at Apple was about pattern recognition. That's interesting. That's the one I would look at. I'd want to know more about that, because obviously they're going to do something. Harry Potter and the Order of the Phoenix. This is how you can tell that this information is real, right? This is what I love. What Should I Do with My Life? Honestly, if you're going to take a job at Apple, by all means I would recommend seeing what books they're reading, because you can find out a lot about corporate culture as well.
Interactive intelligence. It's not enough just to collect data sometimes and analyze it. Sometimes they actually want to take action on the information that you're collecting. This would be, for example, sniping agents, which are quite popular. This is basically software that places bids for you on eBay or wherever. The idea is that you can buy things at a lower price because you don't bid things up in the process. You can write intelligent shopping software that not only will do the sniping action, but where you can also enter things that say, I'm looking for this, this, and this. The software would go off, and it would know what good prices are because it's been watching the market for a while, so it could automatically place an order for you or place a bid for you when certain criteria are met. This is extremely powerful stuff.
You can use it on stocks, you can use it on online auctions, and you can use it on a lot of other things I can't tell you about.
A couple of things about online sources that are important. It's really important to respect the bandwidth of your sources. It's also important to be as stealthy as you can. Here are some tips. Treat the bandwidth with respect. The thing you want to realize there is that most people who download information from the net will just download the HTML. If you don't also download all the images, you're going to leave really weird-looking entries in their server logs, so you want to make sure that you also download the images. You want to introduce some randomness. If you've got something that's supposed to start at 8:03 every day, don't start it at 8:03 every day. Add a little bit of variability there. Randomize the time periods between the times your web bots run. Basically, you want to make it look like a person is doing this. You want to randomize the sequence of page downloads if you've got a number of them that you're doing. Rotate IP addresses if possible, and use a link proxy.
I'll show you what a link proxy is right here. In the earlier example, you've already seen this, right? This is what the client might see. This is the dashboard. Well, a lot of times you might want to have a link on here that somehow will link them back to the source. It's a useful thing for a lot of reasons. Trouble with that is, in this person's log, they're going to see the address and the link and everything of your dashboard, right? Because all that information is in your HTTP referer server variable. So you don't want to do that. What you want to do is link to what I refer to as a link proxy first, and the link proxy then will go out and grab that information and display it for you. And what the link proxy does, well, it's got two reasons.
The first one is it removes the referer variable, and the second thing is, in this web log, you aren't going to see the IP address of this server. You're going to see the IP address of this server, right? So here's how you do it. If you have a link that looks something like this, an href of www.someonlineresource.com/members/link.html, you'd replace it with a link to your link proxy, which I'm calling somesafeplace.com/linkproxy, and then you pass this URL as a variable to your link proxy.
This is what the code looks like. I tend to do all my programming in PHP, MySQL, and curl. This is a little bit of curl right here. Basically what we tell it is, this is the URL that we want to download. We don't want you to return the header information. You can add username and password stuff if you want. This line is really important, because here we're basically clearing out the referer, okay? And if you want to, you can tell it that you're not some crazy web bot, you're really Mozilla, and then you execute that. The other thing that's important: if you want your links to work on what you just downloaded, then you need to add a base HTML tag in the head of the page. All right?
That's the end of my presentation. If you have questions, this would be a good time for them. Otherwise, I'm going to hang out around the pool for a while, and I'll take questions out there as well. Right? Yes. Yeah. Basically, if they post, they don't want you to do it. I don't know what legal standing that has. Before you start using a web source and reselling the information, at the very least you want to check copyright, and you also want to check out the terms of use.
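The slide code for the link proxy isn't captured in this transcript, and the speaker's version was PHP with curl. As a rough stand-in, here is a minimal Python sketch of the same idea using only the standard library. Everything named here (`BROWSER_UA`, `build_request`, `add_base_tag`, `fetch_via_proxy`) is hypothetical, and the user-agent string is just a placeholder. The sketch covers the steps described above: fetch the target URL, send no referer (`urllib` sends none unless you add one), identify as a browser, and add a base tag so relative links on the downloaded page still resolve.

```python
import re
import urllib.request
from html import escape
from urllib.parse import urlsplit

# Placeholder browser-like identity; substitute whatever you want to present.
BROWSER_UA = "Mozilla/5.0 (compatible; example-link-proxy)"


def build_request(target_url):
    """Build a request that carries a browser User-Agent and no Referer."""
    if urlsplit(target_url).scheme not in ("http", "https"):
        raise ValueError("only http and https URLs are proxied")
    # Only the User-Agent is set; the dashboard's address is never forwarded.
    return urllib.request.Request(target_url, headers={"User-Agent": BROWSER_UA})


def add_base_tag(page_html, target_url):
    """Insert <base href=...> so relative links resolve against the source."""
    base = '<base href="%s">' % escape(target_url, quote=True)
    # Put the tag right after the opening <head ...> tag, if there is one.
    new_html, count = re.subn(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + base,
        page_html,
        count=1,
        flags=re.IGNORECASE,
    )
    # No <head> found: fall back to putting the tag at the very top.
    return new_html if count else base + page_html


def fetch_via_proxy(target_url, timeout=10.0):
    """Download the page server-side and return HTML ready to display."""
    with urllib.request.urlopen(build_request(target_url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
    return add_base_tag(body, target_url)
```

In a real deployment `fetch_via_proxy` would sit behind a small web handler at something like the `/linkproxy` path from the talk, taking the target URL as a query parameter. A handler like that should restrict which hosts it will fetch, since an open proxy can be abused by anyone who finds it.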