Maybe we can have a vote afterwards to see if it's actually worse than you think. And for this audience, I have included a footnote that may be particularly applicable.

So, what have we been doing at Indiana University? Since 2014, we have been trying to build a comprehensive data set. We looked at Akamai's top million websites, and we went to PlanetLab and hit them every day. And then we hit PhishTank every hour, and for each phishing site, we tried to connect with HTTPS. And then we looked at all the banks as defined by the FDIC. We did that twice. For the FDIC, we worked with Richard Sullivan, who is a wonderful, wonderful colleague. And so I want to report to you some of the things that we've seen. And I have got to say, this is not the cheeriest presentation you'll see today.

So, oh, this is terrible to look at. Well, okay, so the little blue one, that's SHA-1, that's the good news. The disappearing grayish one is MD5, so that's disappearing. Yay. But the other thing that you can't see because the resolution is pretty bad (the slides will be available, I'll put them up for download) is that SHA-256 goes up and then decreases. So there's this wonderful orange peak that gives us all reason for optimism, and then it just goes away. And I'm not sure what's happening there.

So "last seen in 2013" is something I've seen put out in numerous places. We're looking at the top million websites and phishing sites. We are not doing an all-IPv4 scan. You know, we don't have the kind of data that the big organizations have. This is kind of two grad students in a closet. But our last observation was in 2015. And because you can't see that in the previous graph, I've done various levels of focus here. So that's what a data point of one looks like here: 2015, which really surprised me when that popped up.

Now, what about version? Version 3 is so nice. It allows, you know, key usage constraints as a standard.
It has some wonderful extensions. And it does seem to be mostly adopted, but not until 2015. We saw a significant number of version 1 certificates until well into 2015. So that surprised me quite a bit.

And how about phishing with TLS? We've seen increased TLS phishing, but it's still very small. And TLS abuse dominates TLS issuance. That is to say, as far as I can tell, the big competition for botnets in terms of phishing hosting is cloud providers. The differences are these. Different certificate authorities dominate phishing; it's not the big three. There is the difference between date of issuance and date seen. So for example, we would not expect to see a small city provider of bus tickets in the top million websites. It would show up on PhishTank, so there was a big lag between when it was issued and when it was seen. There is the lack of extensions, and there were about 20 other features. With that, you can distinguish them fairly well even with very simple approaches like random forest, which is just a bunch of trees and an averaged probability.

So I want to show you phishing trends from two or three different points of view. We hit PhishTank every hour from the 12... you know, this goes through September. And these are pretty small numbers. So that's good and bad. The good part is that if you have TLS, you're probably not a phishing site. The bad part is that social engineers aren't bothering. Right? So there are two ways to look at this particular number. And if you want to look at phishing trends as a percentage, this is the percentage of websites validated as phishing on PhishTank that responded to a request for a certificate, either through hitting the port or through just a normal web request. So we were pretty fast, and, I don't know if this is good or bad, I've always been a little happy about this.
We never had a problem with them being taken down before we could get to them. So phishing with TLS remains a tiny, tiny percentage, though overall it is increasing. This does not include cloud providers, by the way. I pull cloud-provided TLS phishing out, because I think that's a different kind of challenge. And here it is as a percentage. You can see that the big drop-off in July is a result of a massive increase in validated phishing sites reported.

And then we looked at banks. What's a bank? We looked at the FDIC list of federally insured depository institutions. There still are mutual savings and loans. There are different kinds of institutions. Every year they have to fill out a form, and that form includes their domain name. Now, often the people who fill out that form put in their email address, or they put in the email address of the company that handles their email, like help at gmail.com or something. So as for the quality of the data: it may be the definitive list, but it's not definitive.

So we looked at TLS phishing. Phishing itself is very small compared to overall payment fraud. So there's this joke that... oh no, I can't. Hal Varian tells this joke about a bank robber who is asked, well, why do you rob banks? That's where the money is. But that's not actually where the money is. You get more money if you hijack the correct tractor trailer than you do if you rob a bank. And so similarly, there are many victims, many vulnerable victims, particularly older people who have a lot of wealth but not much ability to earn future funds. So it's an important issue, but if you're looking at the numbers, the numbers overall are small.

So certificate sharing, looking only at banks, is egregious. It's not that hard to get your own certificate. Here are three examples: 51 different banks. It is bad, but it's better than average. So it's bad, but it's not terrible.

And what happens when there is an emergency? Thank you, this is a cool icon.
So these are people who went and saw that Heartbleed was a problem. They got that passwords were important and that they needed to change their certificate. So they took their certificate and got their CA to sign it again. They kept the same keys. So, you know, they meant so well. They really tried. You can see there's a significant percentage of these people, like 10% right after Heartbleed, that just didn't change their keys. So you can see people patched, and they understand patching. They're like, all right, we need to install a patch, we've got that done. But if you look at the early replacement of non-expired certificates, only this summer did we see more than 90% of the certificates that were active during Heartbleed replaced. And that is predominantly because their lifetimes expired. People did not replace certificates at anything like the level you would expect.

For those changed in response to Heartbleed, I'm going to give more details. Most of them were upgrades, meaning they increased key lengths or used an improved hash algorithm. Some of them were downgrades. And these are lifetime decreased and lifetime increased. So the big thing we saw when people replaced certificates for Heartbleed was this: "I am not going to do this again for a while."

So for the changes in the certificate algorithm, what we mostly looked at was key size change. There were some downgrades. And if you want to look at the sheer number of changes, the upper line is Heartbleed, and there is a bump, but it's not as large a bump as you'd want. If you look at the left axis, it's a tiny number. This is for updated non-expired certificates that were renewed at least 45 days before they expired. Because, you know, you don't renew your certificate the same day. It's not like, oh, oh, I better get a new certificate. Hopefully you plan and you have an infrastructure.
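
[Editor's sketch, not from the talk.] The three checks described above (same keys kept on reissue, renewal at least 45 days before expiry, and upgrade versus downgrade in key size or hash) could be expressed roughly as follows. The `CertRecord` fields, the function names, and the sample values are all hypothetical; the talk does not describe how the study actually stored or compared certificates.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical summary of one observed certificate."""
    domain: str
    key_fingerprint: str  # assumed: some digest of the public key
    key_bits: int
    sig_hash: str         # "md5", "sha1", or "sha256"
    not_before: date
    not_after: date


# Assumed ordering of hash strength, weakest first.
HASH_RANK = {"md5": 0, "sha1": 1, "sha256": 2}


def reused_key(old: CertRecord, new: CertRecord) -> bool:
    """True if the replacement certificate kept the same public key."""
    return old.key_fingerprint == new.key_fingerprint


def is_early_replacement(old: CertRecord, new: CertRecord, min_days: int = 45) -> bool:
    """True if the new certificate was issued at least min_days before the old one expired."""
    return old.not_after - new.not_before >= timedelta(days=min_days)


def classify_change(old: CertRecord, new: CertRecord) -> str:
    """Label a replacement by key size and signature hash changes."""
    deltas = [
        new.key_bits - old.key_bits,
        HASH_RANK[new.sig_hash] - HASH_RANK[old.sig_hash],
    ]
    went_up = any(d > 0 for d in deltas)
    went_down = any(d < 0 for d in deltas)
    if went_up and went_down:
        return "mixed"
    if went_up:
        return "upgrade"
    if went_down:
        return "downgrade"
    return "unchanged"


# Made-up example: reissued early, same key, weaker hash.
old = CertRecord("example.com", "aa11", 2048, "sha256", date(2013, 6, 1), date(2015, 6, 1))
new = CertRecord("example.com", "aa11", 2048, "sha1", date(2014, 4, 10), date(2016, 4, 10))

print(reused_key(old, new), is_early_replacement(old, new), classify_change(old, new))
```

With these made-up records, the replacement counts as early, keeps the old key, and is labeled a downgrade, which is the combination the talk singles out as the sad case.
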
So the little sad gray hats on these bars are truly sad gray hats. Those are people who went from SHA-1 to MD5, like, we're bad and we could be worse. The little red hats are people who went from SHA-256 to SHA-1. Why would you do that? I mean, why would people do that? It's not that many. But this is the top million websites. These are websites that human beings use. I don't know.

So if you want to be truly sad... oh, I'm talking too fast. What other domains do we see? We are also looking at certificates in the Internet of Things. And so we looked at Siemens Smart Things and Sen.se Mother, which, and this is not the opinion of the U.S. government, Indiana University, Real World Crypto, or any other associated entity, is the creepiest thing I have ever seen. I mean, if you do not believe me, you should Google this.

So here is the Smart Things Siemens certificate. They did version 3, which I think we should respect. So that's a nice thing I have to say about this. The signature algorithm is SHA-1, it is valid from January 2015 to 2025, and the public key is over 1,000 bits, right? So I was very surprised to see this, and I don't understand why they did it. But compare that to the app requirements: Google Play requires a lifetime of 25 years. It does not require revocation information. So, you know, Happy New Year, this is one year; this is 25 years of New Years. It is 25 Twilight Zone marathons. 25 years ago, Silence of the Lambs was the best movie. The iPhone has only been around for 10 years, and it was 23 years ago that the word spam was first used on Usenet. So this is a 25-year lifetime. All I have to say is, okay, Google.

And now we're talking about Sen.se Mother. Their tagline is "Mother knows everything." Speaking as an actual mother, I would like to say I do not know everything and I do not want to know everything. All right.
So the basic infrastructure is that you have this little thing that looks like a penguin, and I don't think that's a statement on the body shape of mothers. I'd like to say that. And all sensor data that is received by the Mother from what they call cookies is sent to the cloud. And every Mother is connected to every cookie. So if you had one of these in your kitchen, and your upstairs neighbor had one in their bathroom, and you put one of their USB hubs on, say, your kid's toothbrush, it would go through their device. I think the way they justify this is that they do have a very good certificate. They do a very careful TLS handshake. It was a nice piece of work to watch the session key and the authenticated server go from the home to the cloud. And then they open a completely new WebSocket with no encryption at all and send all the data through the new one. Like, why? So if you have any questions about whether or not your neighbor's kid brushes their teeth, you'll have that.

So, I talk too fast. I left too much time for questions. I'm sorry. These are my testable predictions that I am putting on film. I think we are going to see a standard, traditional web server still using SHA-1 in 2020, given how hard it was to get rid of MD5. That it will, by definition, remain until 2025 with Siemens, unless... is there anybody here from Siemens? And I think the last observation is not going to be until 2030, on phones. That wildcards are going to continue until they're made completely unusable and non-interoperable.

And I have a bunch of related publications on this, and my co-author is in the audience, and my students all have jobs. Yay. So this is me, postdoctoral fellow. I have two students looking for internships, and I would just like to close by talking about IU. I know that many of you flew over Indiana to come here.
If you look at CS rankings, which we did not hack and are not affiliated with, they ranked us, strictly in terms of our publications, in the top 10. And I am putting together a workshop on PKI and IoT. One of my goals in submitting this talk is to recruit people who have excellence in crypto and applying crypto and bring them together (it will be in the Seattle area) with people who are in NIST and DHS, and also people who are in the VC community, to try to come up... I mean, we have the infrastructure. We don't have the interfaces; like the previous speaker was saying, the developers don't know what to do. And we don't have the incentive structure. So you need all of those things. We built an infrastructure, and some organizations have interfaces, but we don't have the incentives down at all, even within many companies. I mean, not just globally. Obviously, they did not do code checking for the hideously named Mother hub. I've been looking at that thing all semester. Give me a break. And if we don't have those things, the Internet of Things is going to continue to be atrocious.

So here is my now not-so-secret agenda. Please do email me, or give me your card if you still use paper, if you're interested in participating in this workshop, which we're planning for around August. So thank you.

So we do have time for a few questions. I see Adam at the mic. Adam.

Hi. So one of the assumptions that I think was in your talk was that phishing sites should not be able to get valid certificates, but at least we at Chrome, and probably other browsers, don't actually believe that's true. We don't think CAs are very good places to determine phishing or not, and they can't act fast enough. So I'd be quite all right if a phishing site had a certificate. We depend on Safe Browsing and other technologies like that to clamp down on them.
So if the goal of the public key infrastructure is to authenticate a remote web server to the person engaging with that server, if it is indeed in any way an identity infrastructure, the fact that the PKI is almost orthogonal to phishing means that it has not met the goals of meaningful authentication and identification. And that's a really harsh thing to say.

So if the phishing site has a cert for PayPal.com, then I agree entirely. That's a failure. But if it has, I don't know, something other than PayPal.com, but is pretending to be PayPal.com, we are not looking for the PKI to determine that that other thing is sufficiently similar to PayPal.com that it shouldn't exist. We have other technologies to do that that act faster.

I don't think it should be the sole role. But if you look at all your user education, it says: look for the lock in the bar, look for this lock. This is green. This means it's safe. And we interviewed about 400 people and found out what they think it means. Mostly they don't know, but after "don't know," there's a fairly close split between the insanely optimistic answers, like it protects from AV, or it has a good privacy policy. And this is what you present to the user. And if it is meaningless and orthogonal to the dominant user threat, then I think that is inadequate.

That's correct. But we're going to kill the green lock. That's the way we're moving.

Okay.

Hi, I'm intrigued by the people who downgraded their certificates. My own guess is that the only reason I can think of for people doing that is that they switched certificate providers. Did you explore that at all? Did they?

There was very little switching of certificate providers.
I mean, we've seen a high level of consistency over time, and switching certificate providers does tend to be an indicator, that is to say, a feature that is weighted and valuable in identifying a malicious certificate. But honestly, I think that the incentives were to get the fastest, cheapest thing. We have a real incentive problem in the infrastructure. And yes, you can remove a CA, like with the SHA-1 fraud recently where there were a bunch of backdated certificates. But it's like having a law enforcement system where you only have the death penalty, right? You can get away with a lot before we have to throw you out of the room. One of the things I want to do with this workshop, and interviews and more qualitative work, is to try to figure out, and this is not the, you know, strict scientific question: what are you thinking? Right. And so in general, we've looked at what people think about certificates themselves. The ones that look for them put incredibly high levels of trust in these things. If that green bar is there, it is approved by the government. It's safe from antivirus. It has a good privacy policy. People believe all of these things. So building a user-centered system is difficult. Building a user-centered system around system administrators, many of whom have two years of training or were just the kid who took care of the webpage in high school, is another order of difficulty.

Thanks very much. I'm going to have to intervene there. I'm very sorry to cut off the questions. I see there's a lot of interest in the room, but in the interest of time, we need to move along. So let's thank Gene one more time. I'm afraid you're out of time. Thank you.

Sorry, all the data is available on PREDICT or IMPACT, which is a DHS project. And you can ask Ross Stapleton-Gray and he will send it to you.