This is lecture 21 of Computer Science 162. The next two lectures are on security. Our goals for today are to build a conceptual understanding of how we make systems secure, and then to go through four key security properties: authentication, data integrity, confidentiality, and non-repudiation. Then we'll pull it all together and look at cryptographic mechanisms.

OK, so let's first start with a basic question. What is computer security today? Put simply, it's computing in the presence of an adversary. The adversary is the defining characteristic of the security field. It's what makes security important. If you look at something like reliability, robustness, and fault tolerance, that's all dealing with mischance. That's mother nature. Things randomly go wrong. Or mother nature drops a hurricane on your data center. Security, however, is when we're dealing with an adversary that's very knowledgeable, and they're specifically trying to cause your system harm. They're trying to break in and steal your money or steal your account credentials or something else. So really, it's about surviving malice and not just mischance. It's not these random events that occur, like a drive failing, but rather an attacker that is specifically trying to exploit a weakness in your system. So wherever we have an adversary, there's going to be a security problem. And pretty much everywhere you're going to have an adversary, even behind your corporate firewall: that's the insider attack. But most likely where you'll see it is someone breaking into an internet server, or internet service, I should say.

Now, we can talk about the difference between protection and security. Protection gives us a set of mechanisms that we can use to control the access that programs, processes, or users have to resources.
And we've seen a bunch of these already in the class. We looked at page tables as a way of controlling the access that processes have to memory. We looked at round-robin schedulers as a way of controlling the access that programs have to the CPU. And today we'll look at data encryption as a way of controlling access to data. Now, given protection, we can then have security. We use these protection mechanisms and some policy to prevent the misuse of resources. We need a policy because that's how you define misuse: someone, possibly an adversary, using something other than the way it should be used. So for example, we might want to prevent the exposure of sensitive information. Sometimes there are legal policies that require this. For example, there's HIPAA, which requires businesses to protect your health information. There are also regulations covering disclosure of your financial information, your Social Security number, or other personally identifiable information. But the policy might also be to prevent someone from having any access at all to data, or from being able to modify it or delete it.

Now, you have to look at the environment in which a system operates, because even if we have the most well-constructed environment, one in which we're willing to spend many, many billions of dollars to protect information, that's not going to help if social engineering causes someone to reveal their password. I'm sure everyone here is familiar with Edward Snowden and what he's done in terms of releasing potentially tens of thousands of documents from the National Security Agency. And you might wonder, how did a sysadmin gain access to all of these documents, let alone download all of them, without automated systems or other sorts of systems at the NSA detecting this? Social engineering. He convinced other sysadmins to give him their credentials and gained access to the system under their credentials.
So rather than it looking like one person was downloading the documents, it was spread across many people. Simple social engineering, which is kind of interesting, because the NSA's response to all of this is, well, we're just going to lay off 80% of our sysadmins and impose a two-person rule: it'll take two people to gain access to any piece of information. But Snowden already demonstrated that he can convince someone else to give him access to their credentials, not just the information. So this is really the biggest challenge that you face in designing a system, because if you don't design it well, people will just circumvent it. It's amazing the number of people who, when the company says you have to have encryption of your hard drive, just write their password down on a sticky note and put it right on the computer. I've actually seen that.

OK, so what are some of the requirements that we have for security? The first one is authentication. You want to ensure that a user is who they claim to be. Second is data integrity. You want to make sure that data can't be changed on its way from a source to a destination, or while it's being stored on some stable storage. You put it on a USB key or on a server hard drive, and you want to make sure that data is not tampered with. Or if it is tampered with, you want to be able to detect that. Third is confidentiality. This is ensuring that data is only read by someone who's authorized to access it. We want to make sure unauthorized users can't see the data. And then finally, non-repudiation. This one's a little bit more complicated. Basically, what it means is that a sender or a client can't claim later on that they didn't send or write some data. And similarly, a receiver can't claim, oh, I didn't receive that data. We're going to go through each one of these through the rest of the lecture. So all of this depends on the field of cryptography.
And that's all about securing communications. Cryptography is how we communicate in the presence of adversaries. And it's not a new problem. For thousands of years, people have tried to communicate and keep their communications private. If you want a nice, interesting read about all of this, Simon Singh's book, The Code Book, is a very readable history of cryptosystems and how people have used cryptography over the millennia. But the basic goal here is one of confidentiality. You want to encode the information so that even if an adversary gets their hands on it, they can't extract the original information. But you want to make it easy for your friends, the people you want to have access to that information, to be able to read it. So with cryptography, we can use public communication channels. We can use the internet. We can use the Wi-Fi in a cafe without having to worry about an eavesdropper being able to gain access to the information that we're sending.

The basic idea is we have a key. If you have the key, it's very easy to decode the data. If you don't have the key, it's computationally infeasible for you to gain access to the data. As we'll see, what is computationally infeasible changes over time, because computers are constantly getting faster. So we want to design our cryptosystems such that we can provide a given time bound on our data's protection. Maybe you won't be able to decrypt the data without the key today, but maybe in 50 years you can. If it's something that is not that sensitive, 50 years may be more than enough. If it's nuclear weapons secrets, 50 years might not be enough, so you might want to use a different cryptosystem. But the important thing is you have to protect that key, because with that key it's trivial to decrypt the data. You also have to pick a key that's hard to guess. If the key is easy to guess, all bets are off.
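To make the "hard to guess" point concrete, here is a minimal Python sketch. It assumes a 256-bit key is wanted, and the function name `generate_key` is a hypothetical helper, not something from the lecture. The idea is simply to draw key bytes from the operating system's cryptographic random source through the standard `secrets` module, rather than from a general-purpose generator such as `random`.

```python
import secrets

def generate_key(num_bytes: int = 32) -> bytes:
    """Return num_bytes of key material from the OS cryptographic RNG.

    32 bytes = 256 bits; the size is an assumption for this sketch.
    """
    return secrets.token_bytes(num_bytes)

key = generate_key()
print(f"{len(key) * 8}-bit key: {key.hex()}")
```

A key built this way has no structure for an attacker to exploit, which is the property a password or a date-derived key usually lacks.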
So most recently, a bunch of crypto vendors said you should switch away from the default cryptographic random number generator that they had in their systems, because there was a thought it might have been weakened or potentially vulnerable to compromise. And that random number generator is what's used to generate the seeds, which are used to generate a key.

OK, so how does all of this work? Well, with symmetric key cryptography, we have the same key that we're going to use both for encryption and for decryption. We take our plaintext. We encrypt it with this secret key. Now we have ciphertext, which we can send across the internet without worrying that someone who intercepts it will be able to read it. And then at the receiver, we decrypt it with the same shared secret key, and they're able to extract the message back. So we're able to achieve confidentiality with an approach like this. It is, however, vulnerable to tampering. If there's a man in the middle right here, they can take that message and change the bits in the message. Or they can take the same encrypted message and replay it multiple times. So we're going to have to look at how we prevent replay and also how we can prevent this message from being tampered with.

A very simple example of a symmetric key system is to just use XOR. I take my key, I take the text, and I XOR each block of text with the key. This is very easy to implement, and it's one of the early cryptosystems that people used, but it's very vulnerable, because you can do frequency analysis on letters or words and very easily identify what the key is as a result. But there's an unbreakable alternative. So what is this? You can't actually see the numbers, but this is a little Cold War code book. It's a teeny tiny book, and each one of these little dots on here is a number.
And when the secret agent is encoding the message, they use each number once to XOR. And when they're done with all the numbers in the book, they throw it away. This kind of code is completely unbreakable, assuming your numbers are truly random. And that's, again, part of the challenge. But because you're using a different key for every message, you can't do frequency analysis on this approach. So this was a technique that was used not just for secret agents, but also for some forms of military communications.

All right, so more sophisticated versions of this, when we have computers doing it, are block ciphers. These typically work on some block size, like 64 or 128 bits. To encrypt a stream, we simply encrypt these blocks one at a time, or the blocks can actually be linked together. That's called cipher block chaining. But the basic idea is we take our plaintext and break it up into blocks, those blocks go in, the key is also passed in, and out come blocks of ciphertext. We can transmit those blocks of ciphertext over to our recipient. The recipient then does the block cipher decryption operation, passing in the same secret key, and out come the blocks of plaintext. Very simple.

Of all the algorithms out there, the most common one is the Data Encryption Standard, DES. It was developed by IBM in the 1970s and standardized by the then National Bureau of Standards, which is now known as NIST, the National Institute of Standards and Technology. It uses a 56-bit key. That was reduced from a 64-bit key at the NSA's request. Conspiracy theorists can put on their tinfoil hats and wonder why. But it still is a very strong algorithm, other than brute forcing. Now the challenge here is custom hardware. If you build a custom ASIC-driven system, you can crack a key in under 24 hours. So it's not that secure.
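To put the 24-hour figure in perspective, here is a short back-of-the-envelope calculation in Python. It only uses the 56-bit key size quoted above; the software rate of one million keys per second is a made-up comparison point, and the names `KEY_BITS`, `rate_for_full_search`, and `days_to_search` are hypothetical.

```python
KEY_BITS = 56
SECONDS_PER_DAY = 24 * 60 * 60

# Total number of possible DES keys.
keyspace = 2 ** KEY_BITS

# Keys per second needed to try every key within 24 hours.
rate_for_full_search = keyspace / SECONDS_PER_DAY

def days_to_search(keys_per_second: float, fraction: float = 1.0) -> float:
    """Days needed to try the given fraction of the key space at a fixed rate."""
    return keyspace * fraction / keys_per_second / SECONDS_PER_DAY

print(f"DES key space: {keyspace:,} keys")
print(f"Rate to exhaust it in one day: {rate_for_full_search:.2e} keys/s")

# Assumed rate for a single software implementation (illustrative only).
assumed_software_rate = 1_000_000
years = days_to_search(assumed_software_rate, fraction=0.5) / 365
print(f"Average-case search at 1e6 keys/s: about {years:,.0f} years")
```

The full search in a day needs on the order of 8 x 10^11 key trials per second, which is why the attack depends on purpose-built parallel hardware rather than a general-purpose machine.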
But it's actually still used today in financial institutions, and what they use is a form of DES called triple DES. You basically apply DES three times, each time with a different key. So instead of a 56-bit key, you now have a 168-bit key. And that's what's used for communication, say, between a branch office and a central bank. And Fedwire, where they move trillions of dollars around, is all protected by triple DES. Yes?

Yeah, so the question is, how would you break this by brute force? You'd have to apply DES three times, each time with a different key. So brute force would require you to explore a 2^168 key space instead of a 2^56 key space. You take your one message and you decode it, and then you take the output of that and try it with a different key, and then again with another key, because you're trying to determine key one, key two, and key three simultaneously.

Ah, very good question. The question is, if I find a key, how do I know it's the correct key? Well, you have to know something about the plaintext. If I can do an injected plaintext attack, where I give you something that I know you're going to turn around and transmit, then I can check, when I decrypt it, whether I see the plaintext that I inserted. Or if I know something about the structure of the message, I can look and see if the message I get out has that structure. If I get out garbage, I know that I probably did not have a successful decryption. An example of injected plaintext would be during World War II. We did that. To see if we had truly broken the Japanese cryptosystems, we would transmit false messages in the clear saying we're going to attack a particular base. And then we'd look to see on their encrypted traffic, did we see that same word appear?
And if we did, we knew we had broken their cryptosystem correctly. Yeah.

The question is, would it be 24 cubed or much greater than that? It's a 2^168 key space that you're exploring, instead of the 2^56 that you explore for DES. So it's going to be much, much larger.

So the question is, could we have 168-bit DES? Well, because of the way the DES algorithm works, it would be hard to extend it to larger numbers of bits. And a lot of it's implemented in hardware, so you'd have to change the hardware. It's much easier to just take the same message and run it through DES three times with different keys than it is to go and redesign all of the hardware out there that's implementing DES. At a bank, a lot of this lives in an IBM secure co-processor card that sits in their server.

Unclear. It's not clear whether triple DES is less secure than a 168-bit DES would be. This is one of the fundamental problems with all of these cryptosystems: there are no formal proofs of how secure these systems are. There's a lot of analysis, and we have intuitions about the level of difficulty, but there is no proof that tells you how hard it is, short of brute force, and brute force means exploring, on average, half of the key space.

Now, because of the weaknesses that were recognized in DES, there was a search for a replacement, starting in 1997, and it was actually a worldwide competition for a new algorithm. The result was that a Belgian algorithm, Rijndael, was chosen as the Advanced Encryption Standard, AES, and standardized back in 2001. AES supports multiple key sizes, and modern processors, like modern Intel processors, actually have hardware instructions for some of the primitives of AES to make it much, much faster. And getting back to the question earlier, how fundamentally strong are these?
No one knows, since there are no formal proofs of how hard it is. We know how hard brute force is, but other than that, there may be weaknesses in these algorithms which are exploitable. So again, you can put on your tinfoil hat and look at the changes that the various three-letter agencies have made in a standard before NIST finalized it. That actually is one of the reasons why several of the cryptosystem manufacturers have said stop using the base random number generator. Was there a question?

Yeah, so some of it is we don't know if P equals NP. Some of it is, when we talk about systems based on prime numbers, we don't know how hard it is to factor a product of large primes. Someone might come up with a new technique, in which case that would change everything. Or quantum computers. It depends on the specific system. One of the problems, especially with something like DES, is that it's not just a simple algorithm. There are multiple stages to the algorithm, so you'd have to apply the analysis to each stage and determine whether it applied. There are a lot of PhD cryptanalysts who do research in this area, and a lot of effort was spent trying to understand the strength of Rijndael in the AES competition. They thought it was the strongest of all the contenders and the easiest to implement, but that doesn't really provide any strong guarantee here. As we'll see when we get to hashing, algorithms that we thought were cryptographically strong were, quite quickly actually, discovered to be quite flawed.

Okay, so we can use a secret key for authentication. Again, this is a shared secret between two participants. A can authenticate with B by creating a nonce. A nonce is a random number. We use it to avoid replay attacks. A encrypts that nonce with the shared secret key. Now when B receives that, it can decrypt it, it'll see the nonce, and it can send it back to A. So what do we know now?
A knows it's talking to B because only B could have correctly decrypted the message that was sent. However, this is vulnerable to man-in-the-middle attacks. C could be sitting in the middle, see that encrypted message come across, and pass it off to B. B responds back with X, and then C responds back to A with X. Now A thinks it's talking to B, and in reality it's talking to C. So there's some more work we need to do to make this more secure against man-in-the-middle attacks. So that's authentication.

Next is integrity. The basic building block that we're going to use for integrity is cryptographic hashing. We're going to take some text and associate a hash with that text. We can send the text to the receiver and send the hash to the receiver. The receiver can then compute the hash locally and verify that it matches the hash that was sent. This allows us to verify that the data hasn't been modified, either maliciously or accidentally.

So, the basic approach. The sender computes a secure digest of the message using a hash function and the secret key. The hash function is a publicly known hash function, and we'll look at a couple of examples. They compute a digest by taking the message and prepending the secret key onto it. They then hash that. They take that hash value, prepend the key onto it, and hash that. That gives us a hash-based message authentication code, an HMAC. Now why do we need to include the key? Basically because without it you could do substitution attacks against the hash function, and we want to protect against that. So that's why we include the key. Now we send both this digest D and the original message M to the receiver. The receiver uses that shared key to compute the HMAC of the key and the message and verifies that it matches the D that was sent. If they match, we know the message hasn't been tampered with. If they disagree, something happened to the message.
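The send-and-verify flow just described can be sketched in Python with the standard `hmac` and `hashlib` modules. Two caveats: SHA-256 is an assumed choice of hash, and the helper names (`simplified_nested_mac`, `make_digest`, `verify`) and the sample key and message are hypothetical. The first function follows the simplified description above, H(key || H(key || message)); the standardized HMAC used by the other two additionally pads the key and XORs it with two fixed constants before each hash.

```python
import hashlib
import hmac

def simplified_nested_mac(key: bytes, message: bytes) -> bytes:
    """The simplified construction from the lecture: H(key || H(key || message))."""
    inner = hashlib.sha256(key + message).digest()
    return hashlib.sha256(key + inner).digest()

def make_digest(key: bytes, message: bytes) -> bytes:
    """Sender side: standard HMAC-SHA256 from the Python standard library."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, digest: bytes) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = make_digest(key, message)
    return hmac.compare_digest(expected, digest)

key = b"shared secret key (example only)"
message = b"transfer $200 to account 42"
d = make_digest(key, message)

print(verify(key, message, d))                             # True
print(verify(key, b"transfer $200,000 to account 42", d))  # False
```

Note that the message itself travels in the clear here; the digest only lets the receiver detect changes.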
A mismatch could come from intentional tampering or from accidental corruption. So let's see a picture of this, since it's a little confusing. We take our plaintext and compute the HMAC of that plaintext using our secret key. We send that keyed digest value over. We then send our unencrypted message over. At the receiver, we apply the HMAC function with our secret key and compare that output against the digest value that was sent. If they don't match, the message was corrupted. If they do match, the message was not corrupted. Now, we can also encrypt the message for confidentiality. This scheme provides integrity, but it does not provide confidentiality unless we also encrypt the message.

So that's a very good question. Why can't the adversary just send the same message and the digest together again? Again, this is why we would also need to add something like a nonce to protect against a replay attack. A nonce could also be something like a timestamp, so a message only has a limited lifespan, and that's an easy way. Otherwise, you have to keep track of what nonces you've seen before.

Okay, so what are some of the cryptographic hash functions out there? Well, one of the first widely used functions was Message Digest version 5, MD5, developed by Ron Rivest at MIT back in 1991. It produces a 128-bit hash. So it takes your big message and converts it into a 128-bit number, one of 2^128 possible values. It's very, very widely used. Anything out there that uses crypto supports MD5, which is not a good thing, because in 1996 people identified attacks that could cause collisions. Now, what's a collision? A collision is a different source text that, when you apply MD5 to it, gives you the same hash value. With the initial collisions they found, they could cause collisions to occur, but you'd end up with a gibberish message that generated the same hash. So it's not very useful from an attack standpoint.
It worries you because it means there's a risk that your data could be severely corrupted and you wouldn't be able to detect it. But by 2008, people had gotten really sophisticated. And one of the recent, particularly nasty pieces of malware, I think it was either Flame or Duqu, actually relied on a forced collision that allowed it to impersonate a certificate from a Windows Update server. That was a level of sophistication no one had ever seen. Clearly, a lot of really bright cryptanalysts worked on how to make that happen. This was the first time anyone had ever seen an in-the-wild dynamic attack that would generate a very specific forced collision. In any case, you shouldn't be using MD5 anymore. Microsoft has since deprecated it and stopped using it, but it takes people a while to update their legacy servers, so there are probably still people out there using MD5 even though they shouldn't. So the government... yes, question.

Right, so the thing is, given a text and an MD5, can you find another text that produces the same MD5? What they were able to show pretty early on was theoretical ways of doing that. Then they were actually able to show you could produce a different text that produced the same MD5. But that's not very useful for an attack. For example, if I have a "transfer $200 from your bank account to my bank account" message, and that's secured with an MD5 to say this can't be tampered with, I'd like to change that to $200,000. Trying to force that collision is much, much harder. But now we've seen actual cases where people have forced collisions and been able to modify a document in a way that still gives them the same MD5.

Yeah, so that's a good question, and I need to clarify this. The problem here is, of course you're going to have collisions, right? 2^128 is a big number, but it's not infinite.
When we look at all the possible texts that could be generated, of course there are going to be collisions. The question is, can you find a collision? Given a hash, can you find another document that will have the same hash value? And then, as an attack, can you modify an existing document in a way that you still have the same hash value come out at the end? That's what makes it even more sophisticated. So it was really over the course of more than a decade that people found sort of theoretical attacks against MD5, and then they found more practical, I should say, but still very hard to execute attacks. And then we saw an attack that was actually very, very successful.

Okay, so the government did a basically similar kind of contest to find a new secure hash algorithm, which they named Secure Hash Algorithm version one, SHA-1. It was developed in 1995 by the National Security Agency as a successor to MD5. It used 160-bit hashes. So now you've made the space much, much larger, right, from 2^128 to 2^160. It was very, very widely used: in the Secure Sockets Layer, in SSH, in PGP for email and secure document communication, and in IPsec, which is a virtual private network tunneling protocol. But just a little while after it was developed, a decade later, it was broken. And then really badly broken, to the point where people again could force collisions. So the government mandated that no government users could use it after 2010. It was replaced by SHA-2, published in 2001. This is a family of functions that generate hashes that are much larger, from 224 bits up to 512 bits. And this is what we use today.

Now, it's important to recognize that, because you're using multiple hashes, HMACs are still secure even if you're using MD5. You'd have to force a double collision, because you're doing that inner hash of the message with the key prepended.
And then you're taking that hash value, prepending the key, and hashing that again. It doesn't seem likely that you could force a double collision. Even if you could force the inner function to have a collision, it would be pretty much impractical to try and force a collision on the outer function. Okay, so the takeaway here is: use AES, use SHA-2.

So all of this is great, but the problem here is basically, how do I get you the key? If I wanted to log into my Gmail account over an encrypted connection, do I have to go down to Mountain View and pick up a key in order to do that? That wouldn't be very practical. You can also imagine this in the context of spycraft and all of that. This really led to the development of asymmetric encryption. With symmetric encryption, we had one key that we used for both encryption and decryption. With asymmetric encryption, we have a key that we use for encryption, and we have a separate key that we use for decryption. We have a key pair. A very critical property of this is that if you know the encryption key, you cannot guess or determine, through any easy mechanism, the decryption key. And vice versa: given the decryption key, you can't determine what the encryption key is. So we can make E public and let everybody know it. We can publish it in the newspaper. I can write it on the blackboard. Everybody can know the public key, and it says nothing about what the private key actually is. So now, if Alice wants to send a message to Bob, she can retrieve Bob's public key, maybe from his homepage or from the newspaper or wherever, and encrypt the message with it. The message that she encrypts, she can't actually decrypt. She can't read the message that she just encrypted. But no one else can either, except for Bob, because he has the decryption key.
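One way to see the key-pair idea concretely is textbook RSA, the algorithm the lecture names shortly. The Python sketch below is a toy under stated assumptions: the primes 61 and 53 and the exponent 17 are illustrative values chosen here, the function names are hypothetical, and there is no padding, so this is not how RSA is deployed in practice. It requires Python 3.8 or later for `pow(e, -1, phi)`.

```python
from typing import Tuple

# Toy key generation with tiny primes (real keys use primes hundreds of digits long).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public (encryption) exponent, coprime with phi
d = pow(e, -1, phi)        # private (decryption) exponent: inverse of e mod phi

public_key = (e, n)        # Bob publishes this
private_key = (d, n)       # Bob keeps this secret

def encrypt(m: int, key: Tuple[int, int]) -> int:
    """Alice's side: anyone holding the public key can do this."""
    exponent, modulus = key
    return pow(m, exponent, modulus)

def decrypt(c: int, key: Tuple[int, int]) -> int:
    """Bob's side: only the holder of the private exponent can undo it."""
    exponent, modulus = key
    return pow(c, exponent, modulus)

message = 65               # a message encoded as an integer smaller than n
ciphertext = encrypt(message, public_key)
print(ciphertext, decrypt(ciphertext, private_key))
```

With numbers this small anyone can factor n and recover d, which is exactly why real keys need very large primes.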
So this is a really powerful concept now, because I can put the key out there. Anybody can send me a message now securely. So how is this used? The sender uses the receiver's public key, which is advertised to the entire world. They encrypt the message with that public key. They can now send it as ciphertext across the internet. No one can decrypt it, except for the receiver, who has the secret private key. So public key cryptography was invented in the 1970s and revolutionized cryptography, because now you didn't have to have these two parties come together and share a key before they could communicate securely. Now, it was publicly invented in the 70s, but it was actually invented earlier by British intelligence. Can you guess what they probably were using it for? But they didn't publish it, so they don't get the claim to fame.

Now, the question here is, how do we actually create this key pair that has this property, that you can encrypt with one but not decrypt with it, and you can decrypt with the other but not encrypt with it? Number theory is the answer. The most successful version of asymmetric public key cryptography is the RSA algorithm, developed by Rivest, Shamir, and Adleman back in 1977, based on modular multiplication of very large numbers. It's used in everything today. You use it thousands of times a day. Every time you connect to Facebook, that goes over an HTTPS connection, multiple HTTPS connections. That's using RSA to secure that communication. SSH also uses it.

So what are the properties? You have to generate very large prime numbers. This is actually really hard, but we have probabilistic algorithms that we can use to generate these numbers. They're probabilistic in the sense that, with high probability, the number they generate is actually a prime. If it's not, that's bad. But again, with high probability, you are generating a prime number.
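As an illustration of what "probabilistic" means here, the following Python sketch uses the Miller-Rabin test, which is one common choice; the lecture does not say which algorithm it has in mind. The function names and the default of 40 rounds are assumptions for this example. A composite number passes a single round with probability at most 1/4, so after 40 independent rounds the chance of wrongly accepting a composite is at most 4^-40.

```python
import secrets

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin: False means definitely composite, True means probably prime."""
    if n < 2:
        return False
    # Quick trial division by a few small primes.
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n == small:
            return True
        if n % small == 0:
            return False
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _round in range(rounds):
        a = 2 + secrets.randbelow(n - 3)      # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _square in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                      # a is a witness that n is composite
    return True

def random_prime(bits: int) -> int:
    """Draw random odd candidates of the given bit length until one passes the test."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

print(random_prime(64))
```

Generating a prime is then just "guess and check": pick random odd numbers of the right size and keep the first one that survives the test.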
You also have to do exponentiation, because it's modular arithmetic, and this is exponentiation with very large numbers, but there are actually very fast algorithms for implementing that. Now, overall, because you are doing exponentiation, it's much, much slower than symmetric key cryptography. So typically, the approach people use is to use public-private key cryptography to set up a secure communication channel, because that requires exchanging only a small amount of data. Then you create a session key, exchange that session key, and use it for bulk encryption of all of the data. So when you connect to Twitter, RSA is used to set up the connection. You then exchange a session key, and all the tweets and pictures you're sending back and forth are encrypted using a symmetric key cipher, which could be AES or something like that.

Now, given E, how difficult is it to recover D? The answer is that it's equivalent to having to find the prime factors of a large number, which doesn't say a lot. This is hard, we know it's very hard. Right now, the algorithms we have are basically brute force. But quantum computers could do this in polynomial time. So that does raise a question, given that there's a lot of investment going on in quantum computers. When they become feasible and widespread, it may be the case that we either have to go to much, much larger keys or find something other than RSA. Or again, if some mathematician has a breakthrough and figures out how to factor large numbers, we'd have to change our cryptosystems. And of course, someone may already have figured that out.

Okay, so how does this work for authentication? Each side only needs to know the other side's public key. There's no secret key that has to be shared initially. So A is going to encrypt a nonce, X, with B's public key.
Now B receives that message, decrypts it, and sends back a message containing X, Y, and B, where Y is a nonce that B has generated. All of that is encrypted with A's public key. So at this step, what do we know? Who knows what? What does B know? At this point, he doesn't know anything. He could have gotten this message from anybody, maybe not A. What does A know? That's right. A knows that it's talking to B, because B was able to decrypt that message and return the nonce. The reason why we have to include Y is that B doesn't yet know it was actually talking to A. So what A has to send back is A and Y, encrypted with B's public key. Now A knows it's talking to B, and B knows it's talking to A. To a man in the middle, this is just encrypted traffic going back and forth. There's no way they can read the traffic.

Now, of course, there are a lot more details to make this work in practice. This is the simplistic version of it. There's a lot more that we have to actually do to make this work. In fact, even in practice, this can be very hard to make work. In 161, the computer security class, one of the early assignments that the students often have is: create a public-private key pair, write a message, encrypt that message, and send it to your TA along with your public key. And it's amazing how many students create the message, encrypt it, and send it to the TA along with their private key. That completely defeats the purpose, because now the TA can impersonate the student.

Okay, and this is actually one of the reasons systems are insecure: security is really, really hard. How many people use a passcode or password on their phone? You know, I'm kind of surprised. I don't understand why not. You have all sorts of important information on your phone, potentially even more valuable than what's on your laptop or your home computer. My own grad students were security students.
I asked them this question, and they're like, oh yeah, and I don't use a passcode. Do you have your SSH keys on your phone? Oh yeah. So this is the problem with security. If you don't use it, because it's awkward or it's a pain in the neck, then there's no hope. And this is why we keep reading about break-ins and losses of information. In fact, today there was just a thing about Cupid Media. I think it's one of these online dating sites or something like that, which had a massive leak of their password database. And unlike all these other massive leaks that have happened with LinkedIn and Adobe and others, they didn't actually obfuscate the passwords at all. So they have all the logins and passwords right there. We'll talk more about that in just a moment. Before we do, however, we have a quiz. So four questions to think about. First question, integrity requires the sender to encrypt the message. You're supposed to think about it first. I'll come back to it after the break. Second question, asymmetric key cryptography is much slower than symmetric key cryptography. Third question, encrypting a nonce, a random number, avoids replay attacks. And last question, confidentiality guarantees data integrity. So think about these. In the meantime, some administrative stuff. So the project four design due date has changed. It is now Monday, the Monday after the Thanksgiving weekend, instead of Tuesday. So I'd say start early on it. And your code for project three is due tomorrow before midnight. So with that, we will take a break. And while we take the break, you can enjoy this little XKCD comic. Let's get started again. So one thing I do find interesting about this is I'm not sure it's really 44 bits of entropy because there actually aren't that many common words. So if you look at like an English dictionary, it's about 40 to 50,000 words.
So if you only had to look at 40 to 50,000 words, chosen four times, it's not clear to me that that would be actually as hard as 44 bits of... Okay, so let's go through our questions. So first question, integrity requires the sender to encrypt the message, true or false? It's false, right? Integrity requires us to use a hash. Encryption won't protect against someone flipping bits. We'll get garbage out at the other end, but if it's a program that's reading the data, it might not realize that it's garbage. Second question, asymmetric key cryptography is much slower than symmetric key cryptography. Yeah, that's true, right? Because of the exponentiation operations we have to do, it's gonna be slower than the operations that we do with symmetric. And also, symmetric ciphers are in some cases actually implemented right in the hardware, as instructions, and that's gonna make them run even faster than a software library that we're using. Third question, encrypting a nonce, a random number, avoids replay attacks. So yes, that is true. If you're keeping track of these nonces, yes, you can use it as a technique to prevent a sender from replaying a value or an adversary from replaying an earlier message. Fourth question, confidentiality guarantees data integrity. That's going to be false, right? And for the same reasons basically as question one, right? Using encryption isn't going to guarantee the data's not tampered with. You have to use some kind of cryptographically secure hash function. Okay, so the next security topic is non-repudiation. We're gonna combine a couple of the techniques we've seen, RSA cryptography and signatures. So let's say Alice has published her public key and she wants to prove who she is. She can send a message encrypted with that public key, or I'm sorry, with her private key. Now anyone who sees that message can then take Alice's public key and retrieve the original message. So this gives us a signature. Only Alice can generate this message. Everyone can read the message.
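The signature idea just described can be sketched with toy RSA. This is a minimal sketch under stated assumptions: the key is built from two small, well-known Mersenne primes (far too small for real use), the message text is made up, and the sketch signs a SHA-256 hash of the message rather than the raw message, which is the usual practice but is a detail not spelled out above.

```python
import hashlib

# Alice's toy RSA key, built from two known Mersenne primes (illustration only).
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, known only to Alice


def digest(message: bytes) -> int:
    """SHA-256 of the message, reduced so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of D can compute this value."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with Alice's public key (E, N) can run this check."""
    return pow(signature, E, N) == digest(message)


message = b"hello from Alice"
signature = sign(message)
assert verify(message, signature)                   # the genuine message checks out
assert not verify(b"hello from Mallory", signature) # a changed message does not
```

Verification uses only the public values E and N, which is what lets everyone read and check the message while only Alice can produce it.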
So this is the opposite of what we were doing before, which was saying anyone could send Alice a message using her public key, but only Alice could read that message using her private key. In this case, Alice can create a message that we know only Alice could have created it, but she's gonna use her secret key to create it. So Alice now can't deny that she sent that message because only she could have created the message that contains X. So this gives us non-repudiation. So now we can use this with something like this. This is again just the basic details, a lot more required to make this secure in practice. Alice can create a message saying I will pay Bob $500. She can encrypt it, sign it with her private key. It's a ciphertext. Bob can now take that message and decrypt it using Alice's public key and get back this message. So Bob can now take this ciphertext to the bank and say Alice said she'd give me $500, so transfer $500 from her account. Only Alice could have produced this message. Now, of course, we need integrity checks, we need nonces, we need a lot more things to make this work in practice, but this is the basic idea of non-repudiation. Yes, question? Yeah, so Alice, absolutely. So Alice would not wanna send this message more than once. Bob would wanna take that message to the bank more than once. So we need to include some checks that allow us to detect that Bob is trying to reuse this message. This is one of the challenges with electronic cash, is how you keep someone, it's bits. So how do you keep someone from redeeming the bits multiple times? Okay, so we can pull all of this together and ask a sort of very basic question. How do I get Alice's public key? This is how, this is rather where digital certificates come into play. So there are many different ways that we could get Alice's key. And one way that people proposed was, you go to the webpage, but that's vulnerable to tampering. 
A hacker could break into your webpage. Or I could attach it to my email message, but again, an attacker could replace that. Or I could publish it in the newspaper. And we could have books of, you know, go through the back issues of the newspaper and there's all the keys. But that in practice doesn't work. You need an electronic form. And so that led to the creation of trusted authorities, certificate authorities, CAs for short, that sign a binding between Alice and her public key using their private key. So an example of one of these signatories is Verisign. There are hundreds of signatories, like 600, 700 of them. And they will give you a digital certificate. This digital certificate binds a name and a public key, signed with the authority's private key. So now Alice can just hand out her digital certificate. Now, anyone who has the trusted authority's public key can then extract Alice's public key from that certificate. Because all they have to do is take the certificate and decrypt it using the public key of the trusted authority. And they'll be decrypting this, you know, encrypted value which will then give them back just Alice and her public key. So this is our last sort of basic building block. So what do we have now? What we have now is the following. If we can securely distribute a key between two parties, we can use a symmetric cipher, like the Advanced Encryption Standard. And that gives us fast and, to the best of our knowledge, strong confidentiality. Public key cryptography is what we can use when we can't distribute a key easily. But it does have this problem that we still have to figure out how we're gonna get the public key out there. And it's also not as computationally efficient. So typically what we do again is we set up a communication channel using public key cryptography. Then we exchange a session key, a secret key, which we can then use for a symmetric cipher.
Digital signatures are a way that we can bind a public key and an entity. So let's pull all of this together and look at how we use it in practice. It's the holiday season, so we go to Amazon. So when you type into your browser window, https://www.amazon.com, what happens? Well first of all, your browser looks at this URL, this uniform resource locator, and it sees the protocol field. This first field is HTTPS. That means use the Hypertext Transfer Protocol over the Secure Sockets Layer or Transport Layer Security. And TLS is the successor to SSL. This provides us with a security layer that gives us authentication and encryption, on top of TCP, which we looked at earlier in the semester. Completely transparent to the application. You can actually use SSL or TLS with lots of applications without them even having to know. It's just a plug-in library that you use. Now how does this actually work? So the first thing that happens is your browser connects to Amazon's web server. So we open a TCP connection. So we send a SYN, the server sends back a SYN-ACK, and now we send back an ACK. But we can also include data with that ACK. The data we're gonna include is, hello, I support TLS, RSA, AES-128, and SHA-2, or I support SSL, RSA, Triple DES, and MD5. Basically the client is sending over a list of the crypto communication protocols that it supports. Now the server has a policy for which protocols it's willing to support. Amazon's pretty smart, and so they're probably gonna say, no way am I gonna support MD5, right? So I'll support you using TLS, RSA, AES-128, and SHA-2. If they can't agree, your browser won't be able to connect. So if you took a very old version of Internet Explorer or Firefox and you tried to connect, it probably wouldn't work. Okay, so now we know a protocol we're gonna use. The next thing that the server's gonna send over is its certificate, which is about a kilobyte in size. All of this exchange occurs in the clear.
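The negotiation step just described, where the client offers a list and the server applies its policy, can be sketched in a few lines. This is a simplified model, not the real TLS wire format: the tuple layout and the names in it are assumptions made up for the illustration, not actual TLS cipher suite identifiers.

```python
from typing import List, Optional, Tuple

# (protocol, key exchange, bulk cipher, hash): a hypothetical, simplified layout.
Suite = Tuple[str, str, str, str]

CLIENT_HELLO: List[Suite] = [
    ("TLS", "RSA", "AES128", "SHA2"),
    ("SSL", "RSA", "3DES", "MD5"),
]

# The server's policy: what it is willing to use, in order of preference.
SERVER_POLICY: List[Suite] = [
    ("TLS", "RSA", "AES128", "SHA2"),
]


def negotiate(client_offers: List[Suite], server_policy: List[Suite]) -> Optional[Suite]:
    """Return the first suite the server allows that the client also offers."""
    for suite in server_policy:
        if suite in client_offers:
            return suite
    return None   # no agreement, so the connection cannot be set up


chosen = negotiate(CLIENT_HELLO, SERVER_POLICY)
old_browser = negotiate([("SSL", "RSA", "3DES", "MD5")], SERVER_POLICY)
print(chosen, old_browser)
```

The second call models the very old browser: it only offers a suite the server refuses, so the result is `None` and the connection fails.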
There's nothing we're trying to protect here. We can send it in the clear. Now what's inside of this certificate? This is an X.509 certificate that contains the following. It contains Amazon's name, it contains their public key, and a whole bunch of additional information, like their IP address, the type of certificate, when the certificate expires. You may sometimes get a message, the certificate that we received has expired. What do you wanna do? Okay, it also includes who signed it, the certificate authority. And it contains a public key signature, which is basically a SHA-256 hash, signed using the signatory's private RSA key, of the following. So it's the public key of Amazon, the Amazon name, all these fields that are in the certificate, and a bunch of other information, hashed and then signed with the private key. So what do we wanna do now if we wanna validate Amazon's identity? Very straightforward. First thing we need is the certificate authority's public key. That's hard-coded either into the browser or into the OS. So you'll see every once in a while, you'll do an update on Windows and it'll say updating IE's key store. That's the list of certificates. If you have an iPhone or an Android phone, you have to wait until you get a new version of the OS to get a new list of these certificate authorities. And that can actually take quite a long time. We'll come back to that in just a moment as to why that's an issue. Now if we can't find one of these certificate authorities, if we can't find its key, we warn the user and say, we couldn't verify this certificate. They can continue on knowing that this might be a fraudulent site, or they can quit and go somewhere else. So the browser uses the public key that's in the signatory's certificate to decrypt the signature. And then it compares it with its own SHA-256 of the rest of the fields in Amazon's certificate. If they match, we know it's Amazon we're talking to, with a gigantic caveat.
The big caveat is this is assuming that the signatory is trustworthy. It's also assuming the signatory has not been compromised. So on September 3rd, 2011, it was discovered that a certificate authority had been compromised. The private key had been stolen and lots of certificates had been created. In fact, 531 certificates were created for some of the most popular websites like Google, Yahoo, Mozilla, the Tor Project, WordPress. Now why would someone do that? Because then they can launch a man in the middle attack. They can set up a server, force you to go through it, like at a cyber cafe or at an ISP. And then they can pretend to be Google. And your browser will have the green bar and you'll think you're connected to Google. But in reality, you're connected to this man in the middle who's able to see all the traffic and then relays the traffic on to Google. So to you, it looks like you're conversing with Google, it might be a little slower, but the attacker gets all of the information. It turns out this happens a lot. The Electronic Frontier Foundation has a website called the SSL Observatory where they look at revoked certificates. Turns out certificates get revoked a lot. And one of the thoughts is that that's because of compromises of certificate authority keys. Yeah, all right, is there more to the story about how the key was stolen? Unfortunately not, you know, there are many ways that keys can be stolen. You can have social engineering attacks. You can have zero day attacks that are done against an employee of the company. And then that's used as a stepping stone to ultimately get to the servers that have the real data. That's called an advanced persistent threat attack. We're gonna look at that on Monday. But there's a whole variety of approaches that could be used to break in and get the secret keys.
Ideally, you know, you'd like to have the secret keys stored on a machine that's not connected to the internet, but it's kind of hard to do that because it's gotta constantly be signing certificates as you generate them on behalf of your customers. So the certificate that you receive from Amazon is not encrypted at all. It's just a plain text certificate, but it contains integrity checks so you can verify that you got the certificate without tampering. Ah, so the question is why couldn't you just hijack that certificate? Well, because the certificate only gives you Amazon's public key. So any messages that you encrypted with that, only Amazon could read. So one of the other problems here is that when it was realized that DigiNotar had been compromised, you'd immediately like to invalidate all of the certificates that it issued. On a phone, that takes an OS upgrade. So that makes it a very, very slow process to be able to close this. On laptops and desktops, it's typically that you update the browser and then you get a new set of certificate authority keys. So that you can do very quickly and you can kind of push that to users very quickly, but trying to revoke on a device like a phone is much, much harder to do. And again, this is happening all the time. Go to the SSL Observatory and you can see there's a whole report about it. Ah, so the question is would it be safer to have a place where you could download an updated list? This is the magic problem, right? Because you'd like to know when a certificate has been tampered with, and so that's exactly one of the projects that the SSL Observatory has. The problem is, if you had a centralized point, someone could compromise it, right? Now everybody gets this updated list that has all this fraudulent information in it. So what the SSL Observatory does is it says I'm gonna look at all the certificates I get.
When I get a new certificate, I'm gonna ask a bunch of friends or a bunch of other people, have they seen the same certificate or not? And if they haven't, maybe there's something fishy about this certificate. So now this requires multiple compromises to occur. Still doesn't completely protect you, but it makes it a little bit more secure. Another problem is that certificate authorities, remember I said there were like 600, 700 of them, there's a lot of them. Some of those are governments, which means they can issue whatever certificates they might choose to issue. Then you can put on your little tin foil hat and wonder why they might do that. Okay, so assuming we don't have fraudulent certificates, assuming we have trustworthy certificate authorities, and we have valid certificate authority keys, can we verify a certificate? The answer is yes. Here's how we're gonna do it. Okay, so we take this certificate, which contains all this information like the public key and Amazon's name, and contains this hash that's signed with the certificate authority's private key, and we're gonna decrypt it with the hardwired public key. So we get back this hash. We are in parallel going to compute the hash over everything else that's in the certificate. Then at the end, we're gonna compare them. If we get the same hash as this encrypted one from our own hashing, the certificate passes validation. We know we have a valid public key. We know we're talking to Amazon. If it doesn't pass, something's up. Again, we'll get a pop-up message saying the certificate could not be verified, what do you wanna do? And the browser manufacturers have tried to make it harder and harder, right? There's all this text saying this site could infect your computer, it could do all sorts of bad things. Do you understand the risks and want to continue? It's remarkable how many times people just hit yes.
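The validation procedure walked through above (look up the authority's hardwired key, recover the signed hash, hash the remaining fields yourself, compare) can be sketched as follows. This is a toy model under stated assumptions: the certificate is a plain dictionary rather than real X.509, the authority name "Example CA" and the field values are invented, and the authority's RSA key is built from two small known Mersenne primes, which is nowhere near a real key size.

```python
import hashlib
import json

# Toy RSA key for a hypothetical certificate authority (illustration only).
CA_P, CA_Q = 2**61 - 1, 2**31 - 1
CA_N = CA_P * CA_Q
CA_E = 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))   # the authority's private exponent

# What the browser or OS ships with: authority name -> public key.
TRUST_STORE = {"Example CA": (CA_E, CA_N)}


def fields_hash(fields: dict, n: int) -> int:
    """SHA-256 over a canonical encoding of the fields, reduced below n."""
    canonical = json.dumps(fields, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(canonical).digest(), "big") % n


def issue(fields: dict) -> dict:
    """The authority signs the hash of the fields with its private key."""
    return {"fields": fields, "signature": pow(fields_hash(fields, CA_N), CA_D, CA_N)}


def validate(cert: dict) -> str:
    key = TRUST_STORE.get(cert["fields"]["issuer"])
    if key is None:
        return "warn: unknown certificate authority"
    e, n = key
    recovered = pow(cert["signature"], e, n)    # "decrypt" with the hardwired key
    ours = fields_hash(cert["fields"], n)       # hash everything else ourselves
    return "valid" if recovered == ours else "warn: certificate could not be verified"


cert = issue({"name": "www.amazon.com", "public_key": "amazon-key", "issuer": "Example CA"})
forged = {"fields": {**cert["fields"], "public_key": "attacker-key"},
          "signature": cert["signature"]}
print(validate(cert), "|", validate(forged))
```

Swapping in a different public key changes the hash the browser computes, so the old signature no longer matches and the user gets the warning path instead.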
Like I said, you can also use a peer-based approach to validate the certificates if you're worried about potential compromise of a certificate authority. But it's not perfect. Really, there's a bunch of different algorithms that people have tried to come up with for how you might be able to use peer validation. Maybe you have a list of trusted friends and you only trust those friends, but if one of those friends gets compromised... For now we'll assume we don't have compromise. Okay, so where are we in all of this work that happens in the first few milliseconds when you connect to Amazon? So what's happened now is we have verified the key. Now the browser is going to construct a session key using a random number generator. This is the key we're gonna use for all of the rest of the communication. We use Amazon's public key to encrypt that session key. So Amazon can now decrypt it and communicate with it. The browser now displays the lock and the bar turns green or, you know. All subsequent communication is encrypted using our symmetric cipher. And now you can type in your password. So that's how it works. Not too complicated. Actually, very complicated. Lots of moving parts and lots of dependencies. Any questions? Okay, so this naturally leads to, let's talk about authentication. So a password is just a shared secret between two parties. The server has your password, you have the password. Now if only you have your password and only the server has the password, we can use it as a way of identifying you. You type in your password correctly and we know it's you. Very common. Most common authentication scheme out there. Now the system has to keep a copy of the password, and that's a problem because malicious users are always gaining access to these lists of passwords. In the Adobe case now they think it was more than 100 million passwords that leaked out. So we wanna make sure those passwords are obscured. So what are we gonna do? We're gonna use a mechanism.
Our mechanism here is gonna be to transform each password into a form that's difficult to reverse. So hashing, one-way encryption effectively, is what we wanna do here. We're gonna keep our password secret. The way Unix originally did it was it takes your password and it hashes it. This gives you an encrypted password. More recently people would do something like an MD5 or a SHA-1 of the password. So now when you type in your password the system hashes it and compares the hashes. If the hashes match we know it was you. Some common ways that passwords get compromised. So guessing is the most common way. Because oftentimes people use a remarkably simple password like their birthday, their spouse's birthday, their girlfriend's birthday, or their boyfriend's birthday, or their wife's birthday, or husband's birthday. Anniversary, that's a very common one because you'd never wanna forget your anniversary. Their favorite color, names, and so on. So let's start with a trivia question. And I'm gonna base these answers on a compromise of 32 million passwords from a music site, RockYou, back in 2010. Okay, what do you think was the most common password? No, that was actually number four. The fourth most common one was password. 1234, no, that was pretty far down. So actually for RockYou, number one was 123456, used by nearly 1% of the people. 1%, so a one in 100 chance: if you guess that a password is 123456, you're in the account, okay? The next most common one, number two, 12345, number three, 123456789, number four was password, number five, iloveyou, number six, princess, number seven, rockyou, number eight, 1234567, number nine, 12345678, and the next, abc123.
But what I found most interesting on this list, aside from all of those, was the number 29 most common password, Anthony. I have no idea why, didn't know I was that popular. Okay, so guessing can be remarkably successful and there are many, many, many lists that you can use. Basically if you go to a place like skullsecurity.org, they have lists of passwords, sorted by how frequently they occur, so then you can just basically do a dictionary attack. So if you get one of these stolen encrypted lists, you can just simply take passwords from here, hash them, and compare the hashes. If you pre-compute this, that's called rainbow tables, you basically just pre-compute all the hashes for the values, yeah, question. Yeah, so the question is when it's rating passwords, and it says weak, medium, and strong, what does that mean? This is going back to that sort of XKCD cartoon, so it's looking to see, are you using like all lowercase letters? Did you add a symbol? Did you add a number, or capitalization? And the more of those that you include, the more bits of entropy that you're adding to the password. But there's no really clear definition of what weak, medium, and strong mean. Some systems will require you to have extremely complex, long passwords, and that's what they define as hard. Others will be, if you add a number, it says it's hard, which is probably not really true. Okay, so another common attack, dumpster diving, which is basically finding pieces of paper with people's passwords written on them. So again, don't try this at home, but after people log in in the labs with their little 162 or whatever forms, they may just drop them in the recycling bin, and that has the password. Hopefully they've changed it. This is also a technique that's used by other adversaries to get your social security numbers or credit card applications.
So the first thing you should ever invest in, they're not that expensive, is a shredder. If you get anything sensitive, a bank document, a credit card statement, a credit card application, a phone bill, all of that sort of stuff, you should shred, not just recycle, because it turns out, the laws are kind of unclear about whose property it is after you drop it in the, okay. Now, there's a bit of a paradox. If I give you the ability to create a short password, it's easy to crack. If I give you a long password, you're gonna write it down. And technology only makes this worse. Unix originally started with just lowercase five-letter passwords, so only 26 to the fifth power, roughly 12 million different passwords, which in 1975 took a day to crack. In 2005, you could crack it in a tenth of a second. If you only look at words in the dictionary, again, there's only like 40 or 50,000 of those, you can crack it instantaneously, yes. Absolutely, so the question is why don't we delay multiple authentication attempts if the user types the wrong password? Many systems do that. The issue here is if I get the password file, then I can do that offline. But definitely for online systems, they'll lock you out, they'll delay, they might add a second, and add two seconds, and add four seconds, so you can't do sort of rapid-fire cracking, yeah. Absolutely, and you may recall about a year or two ago, there was an attack against Apple, where people were causing password resets to be sent out. And that was sometimes locking people out of their accounts. Okay, so we can't make it impossible to crack passwords, but we can make it harder. So one of the techniques is to add a salt. So originally in Unix this added a 12-bit salt, which was derived from your username. So you can hash your username, get a 12-bit salt out of that, and use that to make your dictionary attacks two to the 12, or 4,096, times harder. So now it'll take a little bit longer.
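To make the salt idea concrete, here is a minimal sketch of a precomputed dictionary attack against unsalted hashes and what a salt changes. Several things are assumptions for illustration: SHA-256 stands in for whatever hash a system actually uses, the five-entry word list stands in for a real password list, and the 12-bit salt is derived from the username as described above (many systems instead store a random salt next to the hash).

```python
import hashlib

COMMON = ["123456", "12345", "password", "iloveyou", "princess"]


def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def salt_for(username: str) -> int:
    """Derive a 12-bit salt from the username."""
    return int.from_bytes(hashlib.sha256(username.encode()).digest()[:2], "big") & 0xFFF


def salted(password: str, salt: int) -> str:
    return hashlib.sha256(f"{salt:03x}:{password}".encode()).hexdigest()


# Precomputed once, reusable against every unsalted password file.
TABLE = {unsalted(pw): pw for pw in COMMON}


def crack_salted(stored: str, salt: int):
    """With a salt, the attacker has to redo the hashing for each salt value."""
    for pw in COMMON:
        if salted(pw, salt) == stored:
            return pw
    return None


leaked_unsalted = unsalted("princess")
alice_salt = salt_for("alice")
leaked_salted = salted("princess", alice_salt)

assert TABLE.get(leaked_unsalted) == "princess"   # instant table lookup
assert leaked_salted not in TABLE                 # the precomputed table is useless
assert crack_salted(leaked_salted, alice_salt) == "princess"  # still crackable, per salt
print(26 ** 5, 2 ** 12)   # five lowercase letters; number of 12-bit salt values
```

The salt does not make a weak password strong; it only forces the attacker to repeat the dictionary work for each salt value instead of reusing one table. Production systems generally pair a long random salt with a deliberately slow function such as `hashlib.pbkdf2_hmac`.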
Modern systems typically use salts of anywhere from 48 to 128 bits. If they implement it. There have been multiple password breaches where they hashed them, but they didn't include a salt, right? They just used MD5 or SHA-1. So you can use a rainbow table or any kind of other offline technique to very easily recover the passwords. Another technique, more complex passwords. If you add more characters, you add digits, you add capitalization, you add symbols, you increase the space, increase the entropy, harder to crack. But again, people will pick common patterns, and if you make it really, really complex, they'll just write it down. So that doesn't help. Third technique is what we just talked about, which is you can just simply delay the login attempts or lock them out. That makes it infeasible if it's an online attack. Other techniques: give them very long passwords or passphrases, which gives you more entropy or randomness. Embed the password in a smart card. So we all carry around these little RFID cards. They have a little ID in them, which is your password, which you can use to get into various rooms or pay for items and things like that. So now this requires you to physically steal the card to impersonate someone, sort of, not quite, but for the most part. You can add a layer: you can add a PIN. So some of the card readers that we have actually have a PIN pad on them, so you have to use your card and then you have to enter a PIN code in order to use it. Even better is to download something like Google Authenticator and turn on two-factor authentication. How many people use two-factor authentication? More people should use it. I mean, it's really easy to turn on for your Dropbox account or your Evernote account, and many online services either have it or are rolling out two-factor authentication.
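An authenticator app of this kind can be sketched in a few lines: the server and the app share a secret, and both derive a short code from the current 30-second time window. The sketch below follows the standard HOTP/TOTP construction (RFC 4226 and RFC 6238), which uses an HMAC over a counter rather than a generic random number generator; the secret shown is the published RFC test key, not a real one.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password for a given counter value (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                   # dynamic truncation
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30) -> str:
    """Time-based variant: the counter is the number of 30-second steps so far."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step))


shared_secret = b"12345678901234567890"    # RFC test key, shared by server and app

# Both sides compute the same code for the same time window.
server_code = totp(shared_secret, now=59)
app_code = totp(shared_secret, now=31)     # same 30-second window as t=59
assert server_code == app_code
print(server_code)
```

Because the code depends on both the shared secret and the clock, a code an attacker observes is only useful for the current window, which is what makes it a second factor on top of the password.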
It's a simple app that runs on Windows phones, on Android phones and on iOS phones, and it has a pseudo-random number generator where the seed is shared between the server and the application. And every 30 seconds, you just simply generate the next number in the sequence. So now someone has to steal both your, oops, oh, I didn't plug in. So now someone has to steal your password and they have to steal your smart key. See if it'll come back. I'll keep talking while it boots back up. So this adds yet another layer of security because now two things have to be stolen. One thing that's physical, your token, and a second thing that's not, namely your password. See how quickly this can come back up. So you can get even more complicated and you can instead use an algorithm like for example, here we go. Okay, so you can encode an algorithm instead and use an algorithm like, for example, a zero-knowledge proof. So what a zero-knowledge proof does is it's basically a secret algorithm, and I do a series of challenges and responses between the server and myself, okay? So both of us have the algorithm. It's secret and the server gives me a number. I compute on that number. I send back the response, and even if someone sees lots of these exchanges, they can't guess what the algorithm is. No knowledge is leaked. Oftentimes this is performed by a smart card, so here's an example of a smart card that does exactly that. So it has the secret algorithm in it and it communicates through its little contacts and exchanges the responses to the challenges that it's given. You can also, third technique, can you get my computer to respond? Third technique is biometrics, and biometrics have become very popular. So this is where you have to look at something and it scans your iris, or it scans your palm print, or measures the size of your digits and the locations of knuckles, or fingerprints, which is perhaps the most common one.
The challenge here is these can be very accurate and very useful, but if the data gets stolen, you can't replace your fingerprints or change your fingerprints or change your iris print, and so that's a huge risk. One thing that they did address, you always see this in movies, is they cut off the person's thumb and they use the thumb, or they pop out the eyeball and use that. The modern scanners actually look for a pulse so it won't work with a disconnected body part, yeah. Yeah, I think it's really hard to do. It depends on the level of sophistication of the fingerprint scanner. Some are easily fooled. The new Touch ID on the new iPhones, yes. If you have access to someone's fingerprint, you can, with about half an hour of fairly sophisticated work, duplicate the fingerprint. It's better than not having a passcode, so it takes us a little bit further, but I would not keep your nuclear secrets on your iPhone. Yeah, well that's why these sensors all require a heartbeat, so that thieves are not encouraged to cut off your finger and try to use your finger. It won't work, it just simply won't work, yeah. Yes, absolutely, so one of the challenges with using biometrics, you don't wanna use it for your nuclear power plant because let's say it's an iris scanner or it's a voice print identifier. During high stress situations, you might be crying or your voice might be stressed and so you might not be able to authenticate and insert the control rods or something like that. So absolutely, if you get burned or something like that, your fingerprints go away and then it can be a problem if that's the only way you can authenticate into the system. So typically there's always like a back door or some other technique that you can use with a password or passcode to get in.