So I guess we're a little, we're definitely early. We're definitely on the clock. Welcome to speaker workshops, and now it is the 2:10 talk. I just have a little bit to say about our presenter, Sam Erb. So, Sam Erb, but just real quick, Counselor 9 and I... Also, I think you're the very first black badge holder I know personally. Wait, is that a radio? That's radioactive, right? Is that the radioactive one, or thou shall not speak of it? Are we going to light something on fire? Without much ado, Sam Erb.

Thank you. Long-time listener, first-time speaker, so let's get going.

You're going to connect to the wrong domain name. For whatever reason, for the past few years I've been very interested in domain names, and this is a look at some of the different ways that you can be tricked, or your computer can be tricked, or you could be phished, into connecting to the wrong domain name.

A bit about myself: I'm a software engineer. I work at a cool company called Akamai. We're hiring people like you; I have to say that. And I have a DEF CON 23 and DEF CON 24 Uber black badge with Counselor 9. Yeah, that's me at closing ceremonies last year.

A disclaimer: the opinions expressed here are my own, and no one was phished in the creation of this presentation, which will become a bit more obvious later on.

I'm going to start out from least severe and build up from there. So, just to get going, we have to start out with typosquatting. Humans aren't perfect.
They're going to mistype the domain name they type in, and a malicious person could register these names and trick somebody into entering their bank credentials, for instance.

This example here is looking at all the one-keyboard-letter-off variants of americanexpress.com. If you actually look up the registration status of each of these, like the example qmericanexpress.com, all but two of them are actively registered as of April, I believe. I did most of the research here in April 2017, so that's roughly when all the status is from. The reason I chose American Express here is that it's long enough that no one's company name is going to be qmericanexpress.com, whereas with a shorter website you might get some legitimate websites. I could argue that none of these are actually legitimate.

Building up from there, we have to look at bitsquatting. This is one where your computer is actually going to get the domain name wrong by flipping a bit. In the example I gave here, I flipped a bit in google-analytics.com and ended up with eoogle-analytics.com. This is a very low-probability event, but as most phones and personal electronic devices don't have error-correcting memory, it happens at a non-zero rate across the internet.

In order to give a demo for this talk, I wanted to register one of these domains, so I searched for the one-bit variants of google-analytics.com. Part of the reason for looking at google-analytics.com is that you want a website that people are going to visit a lot but not mistype. Google Analytics falls into that category, as do most CDNs. When I searched for domains, this was the only one available, so I went ahead and registered it, put up a server on it, and got a TLS certificate for it, and within 24 hours
I saw what I would call two hits, one HTTP and one HTTPS. These were one Android browser and one Android application actually connecting to my server. If you go and look at the source website these came from, the source website says google-analytics.com, but something within their phone caused a bit to flip in such a way that they went to the wrong website.

Now, one other thing. With a very popular website, from what I've seen, this is about what to expect: you'll get a few random hits per day. Once I got these hits, I actually took this down. But when I turned on this website, I immediately was getting spammed by one very particular domain name. So I went to it, and it was a Swedish diving company's website that had somehow managed to mess up their Google Analytics tag to point to my website. So every person who was visiting their website was also connecting to mine. That's a picture of the code at the bottom there. It's a little bit hard to read from a distance, but it's pointing to eoogle-analytics.com.

So that's just an example of how to do this. If you're looking for more information on this, there's actually a really great white paper that Cisco put out, and I believe it's on defcon.org if you search for it. Or sorry, no, that's something later; I'm getting these confused. There was a talk at DEF CON 19 on this, I believe that's what it was. And being familiar with bitsquatting, I wanted to look at... or no, so yeah, there was a Cisco white paper.
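To make the bitsquatting search concrete, here is a minimal Python sketch of how one-bit variants of a domain name could be enumerated. It is not the speaker's actual tooling; the function name `bitsquat_variants` and the choice to skip the dots are assumptions made for illustration. Each character is XORed with each of the eight single-bit masks, and only results that are still valid hostname characters are kept. For instance, `g` (0x67) and `e` (0x65) differ by exactly one bit.

```python
import string

# Characters allowed in a hostname label: letters, digits, hyphen.
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_variants(domain):
    """Return all valid names that differ from `domain` by one flipped bit.

    Hypothetical helper for illustration; assumes a lowercase ASCII domain.
    """
    variants = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue  # this sketch leaves the label separators alone
        for bit in range(8):
            # Flip one bit; DNS is case-insensitive, so fold to lowercase.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped in VALID_CHARS and flipped != ch:
                variants.add(domain[:i] + flipped + domain[i + 1:])
    # Labels may not start or end with a hyphen, so drop those candidates.
    return sorted(
        v for v in variants
        if not any(lbl.startswith("-") or lbl.endswith("-") for lbl in v.split("."))
    )


if __name__ == "__main__":
    for name in bitsquat_variants("google-analytics.com"):
        print(name)
```

Checking which of the printed candidates are unregistered would be a separate step (for example a WHOIS or registrar lookup), which this sketch deliberately leaves out.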
Yeah, that was this. So I wanted to look at some other variants of bitsquatting, and one thing that I noticed is that there have been some recently registered top-level domain names, or TLDs, that actually bitsquat on other TLDs. In this case, the registrar requested .got, which bitsquats .gov. I reported that one in April of 2016 to the registrar, and they claim to have implemented a fix. It's not actually testable right now because you can't actually register those domains, but that's a very tractable problem because there's a limited number of .gov websites.

However, a recent one, which is .bom, actually bitsquats all of .com, which is a much more intractable problem, as there are a large number of .com domain names. A funny thing about .bom is that it was actually objected to by Verisign because it looks too similar to .com. That objection was overruled, but it also bitsquats .com, which is very unfortunate. I did not report that one, as I'm not sure what value there would be in reporting it. This is something that was mentioned in the Cisco white paper, but these are more recent TLDs than that paper.

So again, bitsquatting, like I said before, only really targets random end users. You can't really target a specific user with it, but you could target a user base, such as a bank's. If you could steal only a few credentials per day, that could still be a lot of money. Or if you bitsquat a .gov website, that could get you a few inbound emails to an intelligence agency per day, which could be significant.

So, going to the next slide, I'm getting a bit more serious. A lot of the reason why I gave this talk is that I wanted to look at IDN homoglyphs. Homoglyphs are two characters that look the same but have different meanings. An IDN is an internationalized domain name, which is something that, if you're an English-language speaker, you might not come across very often. Depending on your browser, you'll either see the
source Punycode, which starts with xn--, or the rendered Unicode equivalent of that. Punycode is a way to encode a larger character set into a smaller one. In the format that's used, you have the xn-- identifier (this is all standardized), followed by essentially the English alphabet letters, or standard letters, followed by a dash, followed by an encoded suffix. There's a standardized algorithm for computing what comes after that dash. There's an RFC on it and everything. I'm not going to give a great explanation of that, so I won't. With that, a state machine will determine which letter gets placed at what location in the URL that gets rendered.

I gave two examples here. These are both actually registered domain names: google.com, where both the o's are Cyrillic o's (I believe they're Cyrillic o's; it might be Latin, sorry), and time.com, where every letter is actually from the Cyrillic alphabet. I'll show you how I found those later.

So I wanted to conduct a survey of existing homoglyph IDN domain names against popular .com domain names. I wanted to look for people who are impersonating .com domain names. I chose to focus on .com, as that's where most of the popular English-language websites are, and I only speak English, so I'm very limited there.
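As a rough illustration of the xn-- encoding described above, the following Python sketch round-trips a look-alike hostname through Python's built-in `idna` codec (which implements the older IDNA 2003 rules) and names each code point. The helper names `to_ascii`, `to_unicode`, and `describe` are made up for this example, and the sample hostname is constructed here rather than taken from the speaker's list.

```python
import unicodedata


def to_ascii(hostname):
    """Encode a Unicode hostname into its ASCII (xn--) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """Decode an ASCII (xn--) hostname back into its Unicode form."""
    return hostname.encode("ascii").decode("idna")


def describe(hostname):
    """List each character with its official Unicode name (dots skipped)."""
    return [(ch, unicodedata.name(ch)) for ch in hostname if ch != "."]


if __name__ == "__main__":
    # "google.com" with both o's replaced by CYRILLIC SMALL LETTER O (U+043E).
    lookalike = "g\u043e\u043egle.com"
    encoded = to_ascii(lookalike)
    print(encoded)              # the xn-- form a cautious browser would show
    print(to_unicode(encoded))  # the rendered form that looks like google.com
    for ch, name in describe(lookalike):
        print(repr(ch), name)
```

The two printed hostnames refer to the same registration; which one the user sees is entirely up to the browser's display policy.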
So there are really three options if you want to gather a large number of domain names. One, you could get the zone files for .com. That requires filling out a form, which really isn't that fun, and some zone files are actually impossible to acquire, such as North Korea's, which actually leaked, I believe, last year. Two, you could go to certificate transparency, which is a log of certificates. Or three, you could go to a third party. There are third parties you could pay a few hundred dollars and get a list of domain names, but that isn't that fun either.

So I chose to go with certificate transparency, and for this I pulled down the Google Pilot log. This contains about a hundred million certificates; that's roughly 96.5 million domain names. This is also completely searchable if you go to crt.sh. Certificate transparency, if you haven't heard of it before, is an initiative to log every publicly trusted certificate on the internet. It's largely a Google initiative, but there are logs hosted by other companies. I chose the Google Pilot log simply because it's the largest right now.

There's another reason that I wanted to use certificate transparency: if a certificate was issued for a domain name, that means the domain is far more likely to have actually been used. Especially with the bitsquatting and typosquatting domain names, a large number of these are registered but never actually utilized, which I found interesting.

In order to use the CT log, I built a pipeline. I pulled down the log. I pulled out the common name and subject alternative name fields from that. I then filtered down to just the Punycode domain names. Then there's a great Python library called Unidecode, where somebody actually went through and said what each Unicode character looks like in ASCII, or
yeah, what its ASCII equivalent looks like, and went through character by character and did that, which is a great amount of work. It's very impressive. So that maps Unicode characters to what looks like their English equivalent. I then cross-referenced that list with the Alexa top 1 million domain names, and at the end of the day I got 1,900 CT certificates that impersonated Alexa top 1 million domain names.

I then also went ahead and modified the Chromium unit test to check whether each one would be rendered as Punycode, that is, whether the end user would see xn--. In Chrome and Firefox right now, that check is standardized across all languages, so this would be applicable to all end users. In other browsers, I believe Internet Explorer, Microsoft will take your local language into account to determine whether or not to render the Punycode variant.

I've posted that list online. There are a few false positives in there: valid websites that simply happen to look like an English equivalent website. But I chose to include those, as I didn't want to actually filter all of these.

What is much more interesting is actually showing the results. These are all real domain names. I would encourage you not to visit any of them. There are some really bad ones in that list, unfortunately, and I spent a decent amount of time reporting phishing websites to the Google phishing page.

These are just some of the worst ones that I saw, grouped by what character they used. There's a Latin small letter K, and the L with a stroke looks pretty bad, and then the dotless i was one of my personal favorites. I didn't know that was actually a thing before this. That's actually part of the reason why I took this approach: I'm not a Unicode expert, if there even is such a thing.
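The pipeline described above (keep the xn-- names, decode them, map each character to its ASCII look-alike, cross-reference against a popularity list) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `LOOKALIKES` table is a hand-written stand-in for a full character-mapping library, `find_impersonations` is a hypothetical name, and the input is a plain list of names rather than a real certificate transparency log.

```python
# Hand-written stand-in for a full Unicode-to-ASCII look-alike table.
# Deliberately tiny and incomplete; a real survey would need far more entries.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
}


def ascii_lookalike(name):
    """Replace each known confusable character with its ASCII look-alike."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name)


def find_impersonations(cert_names, popular):
    """Return (punycode_name, impersonated_name) pairs.

    cert_names: hostnames taken from certificates (assumed input format).
    popular:    set of popular ASCII domain names to compare against.
    """
    hits = []
    for name in cert_names:
        name = name.lower()
        if "xn--" not in name:
            continue  # only internationalized names are of interest here
        try:
            decoded = name.encode("ascii").decode("idna")
        except UnicodeError:
            continue  # skip names that are not valid IDNA
        skeleton = ascii_lookalike(decoded)
        if skeleton != decoded and skeleton in popular:
            hits.append((name, skeleton))
    return hits


if __name__ == "__main__":
    spoof = "g\u043e\u043egle.com".encode("idna").decode("ascii")
    print(find_impersonations([spoof, "example.com"], {"google.com"}))
```

As in the talk, a match only says that a name looks like a popular one; legitimate sites can land on the list too, so the output is a starting point for review rather than a verdict.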
I really tried to go about finding the worst abusers without necessarily having knowledge of Unicode in the first place or knowing which letters to look for. And, as expected, the most popular websites, towards the top of the list, were much more likely to be impersonated.

One of the more interesting things that I saw were names that actually bypassed the Chromium check. In the formatting here, the last entry in each row is whether or not it passed the Chromium check. In this case, they'd actually be rendered as, you know, time.com, because they used only Cyrillic characters to represent the English equivalent.

Here's a breakdown of the Unicode blocks that I observed while going through this. Latin and Cyrillic were as expected, but there was a long tail, and I really want to call out one at the very end, which is the Canadian Aboriginal Syllabics Unicode block. That one struck me as odd, as I wasn't familiar with what it was. So I looked at that Unicode block and the characters contained in it, made a bunch of sample domain names, and plugged them into the Firefox and Chrome browsers, and they weren't converted to their Punycode equivalent. None of these were actually registered. I simply tested whether or not they bypassed the Firefox and Chromium checks, and then went ahead and reported security bugs.

The end result was that Chromium will no longer allow the mixture of Canadian Aboriginal Syllabics characters and English characters, which it did before, and they're working on a much more comprehensive anti-phishing fix, which I believe is still a work in progress. Within Firefox this check is a bit more interesting, because Canadian Aboriginal Syllabics fell under this thing called aspirational Unicode blocks, which were allowed to be mixed with English-language characters as part of the standard Firefox profile.
In order to resolve this issue, they simply increased the level of that check to disallow mixing English characters with this aspirational character block. I believe they also filed a request (I'm not exactly sure what agency tracks this, but some external request) to update the rules regarding this aspirational block, as some of the characters do look like English characters.

As was news to me and to the Firefox and Chromium engineers who were working on the fixes for these, you can't actually register any of these domain names. This is where, believe it or not, policy comes to the rescue. When you try to register any of these domains, you're stopped. The reason is that Verisign has a policy that you can't mix Canadian Aboriginal Syllabics characters with certain other characters, which include English-language characters, and they have this for every Unicode block, which I was completely unfamiliar with. One of the more interesting things about this: when I tried to register these domain names, because this is a failure at such a late stage in the registration process, most registrars actually took my money, and in one case I had to fight to get it back. I wasn't the biggest fan of that.

So we've walked through three different examples of ways that you, or your computer, could connect to the wrong domain name. Now we're just going to look at personal mitigations. I'm kind of preaching to the choir here. The reason to use a password manager is simply that your computer won't be confused by any of this. However, if you own or are responsible for any popular domain name, it's probably worth looking into whether somebody is impersonating you. You could file an ICANN complaint and issue a takedown request, especially if your trademark is being violated. And if you work in IT or are at all curious, something
that's worth checking is what happens when you plug these into your email clients. What renders? That's probably going to depend on the email client.

So, as I had a certificate transparency log, I went ahead and just had some fun with it. I graphed the key types over time. RSA 2K keys dominate, followed by ECDSA P-256 keys and then RSA 4K keys. There was also this fun long tail, which proves that when you give users options, you think you can always trust them to be consistent. Some of my favorite ones in there: somebody actually issued an RSA 500 key, and there's an RSA 2600 key as well. And at the very end there, there's a DSA 512 key, which is really unfortunate.

We actually have an intern this year within my company who's looking more into this, and if you're at all interested, I'll publish a link on my Twitter feed with much more detail on what's contained in the certificate transparency log when you look at the log as a whole.

So, yeah, questions?

Questions, anyone? If you have any questions, feel free to come to the mic.

We had a question. I might have missed something in the beginning, but I think you showed examples of domains that were registered with a mixture of English and non-English characters.

Yes.

And then later you said that's not allowed. Is that just something that changed, or...

So it depends on which
block you're looking at. The reason why Canadian Aboriginal Syllabics appeared in this list: I went back later and looked at that domain name, and it was two Canadian Aboriginal characters, not mixed with any English characters. When it was passed through the Python package, the characters looked like b's, so it looked like bb.com, the English-language equivalent. That's why it appeared on this list, but you can't actually register the mixture.

Yeah, okay, thanks.

So, regarding the don't-click-links suggestion...

Yes.

Do you have any recommendations for any other mitigation? For example, say there's a 40-character hex thingy at the end of a URL. I don't think most users are going to type in everything: "Oh, oops, I mistyped the 23rd letter" or something. They're just going to copy and paste it. So is there another way?

It's a good question. You know, it's really a question of trust, I feel. Do you trust the recipient? Can you somehow verify that the link is valid in other ways? Or just copy and paste the end of it and type in the start of it. It's an intractable problem at the end of the day, right?

Okay. Thanks.

Once again, Sam Erb, everyone. Sam Erb.
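The distinction drawn in the Q&A, between a label that mixes scripts and one that stays entirely within a single non-Latin script, can be sketched with a rough Python heuristic. Python's standard library does not expose the Unicode Script property, so this example assumes the first word of each character's Unicode name is a good enough proxy; that is an approximation, not how Chromium or Firefox actually implement their checks, and the function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts in one label.

    Uses the first word of the Unicode character name (e.g. LATIN,
    CYRILLIC, CANADIAN) as a stand-in for the real Script property.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared by all scripts
        scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def is_mixed_script(hostname):
    """True if any single label combines characters from several scripts."""
    return any(len(scripts_in_label(lbl)) > 1 for lbl in hostname.split("."))


if __name__ == "__main__":
    samples = [
        "google.com",                      # all Latin
        "g\u043e\u043egle.com",            # Latin mixed with Cyrillic o's
        "\u0442\u0456\u043c\u0435.com",    # label made only of Cyrillic letters
        "\u15afank.com",                   # Canadian Syllabics mixed with Latin
    ]
    for host in samples:
        print(host.encode("idna").decode("ascii"), is_mixed_script(host))
```

Note that the all-Cyrillic sample is not flagged, which mirrors the point made in the talk: a mixed-script rule alone does not catch names built entirely from one look-alike script.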