 Thank you very much for the introduction. So here's an outline of my talk. In the first half, I'm going to be talking about collision attacks on the cipher block chaining (CBC) and CFB modes of operation. This is the part of my talk that has the practical importance, the relevance to practice. And then the theoretically interesting things come later with the impossible plaintext type of analysis, right? So there's both, but they are a little separate. So I'm talking about block ciphers today. A block cipher has two inputs, right? A key and a plaintext, or the ciphertext if you're doing decryption. And so I'm going to be focusing on the number of bits in the plaintext block, which I'm denoting as w. Some good examples of block ciphers include the 64-bit block ciphers that are the focus of my presentation. We have things like triple DES, which is still in fairly widespread use, and some other 64-bit block ciphers that are used, as well as the Advanced Encryption Standard and other block ciphers since 2000, which use 128 bits instead of 64. So just to establish some notation, right? Everybody's familiar with block cipher modes of operation. We have plaintext. We're going to assume the plaintext can be logically separated into different blocks. And then we're going to encrypt it and get a sequence of ciphertext blocks. And the ciphertext is going to be longer: there are going to be more ciphertext blocks than plaintext blocks. And we're going to ignore some of the details and just focus on the interesting things. So there's a relationship between each plaintext block and the ciphertext blocks, right? And this is going to vary depending on the mode of operation we're considering. But there's a simple relationship. So we're going to do some attacks. We're going to use a really simple plaintext model. So here we've marked all of the plaintext known to the attacker in this green color.
I'll just hold onto this. So the attacker is going to know all of the green blocks and none of the blocks in the other color in this presentation. So we're going to define this indicator. For the different modes of operation, the indicator function is different, but it's a function of the ciphertext. And when a collision occurs in the indicator block, then there's a relationship between the plaintext blocks. The exclusive-or of the two plaintext blocks will equal some value, which is going to depend on which mode of operation you're in. But it's going to be something that's easy to compute. So the attacker learns information whenever there's an indicator block collision between a block that's in the unknown set and one that's in the known set. When this happens, the attacker can exploit this knowledge. So everybody is familiar with the birthday bound and the idea of being indistinguishable from random. The point here is that we can actually construct an attack that learns real information and then apply this to some real-world scenarios. So if the attacker actually knew one plaintext block outright and didn't know another plaintext block, then it would be easy to figure out what the unknown block is. In general, you can actually do something like a Bayesian analysis. And if you have partial information about one block and partial information about the other block, then you can do more sophisticated sorts of attacks. And so in principle, this works. It's hard to quantify how well it works because you have to have a sophisticated model of what the plaintext is. I want to provide a motivating example from the real world. So suppose that plaintext block i contains a network address. If you're familiar with private addressing, these are the possible subnets that are allowed for private IPv4 addressing. And you can see the interesting thing here is that in this bit location, we're either 0 or 1, and the same thing here.
Now suppose that plaintext block j is ASCII encoded, which means that the bit in that bit location will always be 1. Then this delta is going to have this form here. So remember that when there is an indicator block collision between these two, the exclusive-or of these is some known value. So an attacker can look at this information and read off what P_i is, right? Since these three are completely distinct. So an attacker can learn what P_i is and then figure out exactly what P_j is. So this is an example of having partial information about this plaintext block and this plaintext block and using that to recover all of the plaintext. And in the real world, there are a lot of protocol formats like this that enable you to do inferences. It's difficult to quantify exactly how damaging attacks like this are, but there are a lot of examples like this that one could give. So when does the attack work? The birthday bound comes into play, right? We have these indicators, like I talked about before. We need the product of the sizes of these two sets, the known blocks and the unknown blocks, to be about the same size as the total set of indicators, or 2 to the power of w. So they don't have to be of equal size, but this is the generalization of the birthday paradox, right? And these collision-based attacks are very easy to do because they take order n work and storage if you've got n blocks you're attacking. So because of the birthday bound, this is not surprising, right? If you have k blocks of known plaintext and u blocks of unknown plaintext, then you get this simple bound, which assumes n is the sum of those numbers, and it's quadratic in n. So the expected number of blocks, I'm sorry, the expected number of bits of plaintext that would leak out is bounded by this number here. So we have n squared, the number of blocks squared, times the number of bits, divided by 2 to the power w, with a constant factor in there.
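As a rough illustration of the collision search described above, here is a minimal Python sketch against a toy CBC implementation. It is not the speaker's code: a random permutation on 16-bit blocks stands in for the block cipher, the ciphertext block itself is used as the CBC indicator, and all names (`perm`, `cbc_encrypt`, `find_collisions`) are hypothetical. It relies only on the standard CBC identity that C_i = C_j implies P_i xor P_j = C_{i-1} xor C_{j-1}, and uses a hash table so that work and storage grow roughly linearly in the number of blocks.

```python
import random
from collections import defaultdict

W = 16                                  # toy block width in bits (assumption)
rng = random.Random(7)
perm = list(range(2 ** W))              # toy "cipher": a fixed random permutation
rng.shuffle(perm)

def cbc_encrypt(iv, blocks):
    """CBC: C_i = E(P_i xor C_{i-1}), with C_{-1} = IV."""
    out, prev = [], iv
    for p in blocks:
        prev = perm[p ^ prev]
        out.append(prev)
    return out

def find_collisions(iv, ciphertext):
    """Return (i, j, delta) for every pair i < j with C_i == C_j,
    where delta = C_{i-1} xor C_{j-1}, which equals P_i xor P_j."""
    chain = [iv] + ciphertext           # chain[i] is the block chained into block i
    seen = defaultdict(list)            # ciphertext value -> earlier positions
    found = []
    for j, c in enumerate(ciphertext):
        for i in seen[c]:
            found.append((i, j, chain[i] ^ chain[j]))
        seen[c].append(j)
    return found

# Hypothetical scenario: the first K blocks are known, the rest are unknown.
N, K = 4000, 2000
plaintext = [rng.randrange(2 ** W) for _ in range(N)]
iv = rng.randrange(2 ** W)
ciphertext = cbc_encrypt(iv, plaintext)

recovered = {}
for i, j, delta in find_collisions(iv, ciphertext):
    if i < K <= j:                      # known block collides with an unknown one
        recovered[j] = plaintext[i] ^ delta

print(len(recovered), "unknown blocks recovered")
```

With a 16-bit block, a few thousand blocks are already well past the birthday bound, so collisions between the known and unknown halves are plentiful; with a real 64-bit cipher the same search would need on the order of 2 to the 32 blocks.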
So here's a nice graphic of that bound. I've got the logarithm of the number of blocks on the bottom and then the expected number of bits leaked along the vertical. So this is essentially a log-log plot, right? Because it's nice to see something like a power law. So the interesting thing is that in the real world, it's easy to get to plaintext sizes that get you leaking bits of information, right? All of this applies as long as you don't change the key and you keep encrypting using a 64-bit block cipher. And even at a gigabyte, you're probably leaking some information. Once you get to a terabyte, you're definitely leaking a lot of information. So maybe the best example of this would be network traffic. With a 64-bit block cipher, if you encrypt at 1 gigabit per second for one day, then you're going to be leaking about 6 million bits of information. So why is this interesting? Well, triple DES is a 64-bit block cipher, and there are a good number of encryptors on the market that support gigabit triple DES. So it's entirely possible that people haven't gotten the word yet and are still doing this, and they should stop. For the Advanced Encryption Standard or any other 128-bit wide block cipher, at a gigabit per second, there's a 10 to the minus 3 chance it's going to leak a bit of information. So roughly speaking, this highlights how much better 128-bit block ciphers are for modern data rates and data sizes. So if one were using a 64-bit block cipher and wanted to mitigate this, then the first thing to do would be to limit the number of blocks encrypted under any distinct key. So you can rework the equation that I showed earlier to say, well, OK, if there's some total number of blocks that I'm going to encrypt, I can figure out how often I need to rekey if I don't want to leak information. And so an example of that would be if you're going to rekey every billion blocks, then the most you could encrypt would be 2 to the 40 blocks.
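To make the arithmetic behind these figures easier to check, here is a small Python sketch. It assumes the bound has the form n squared times w divided by 2 to the w, with the constant factor dropped (an assumption on my part, since the exact constant is not stated in the talk), and the function names `blocks_encrypted` and `leak_bound_bits` are hypothetical.

```python
def blocks_encrypted(bits_per_second, seconds, w):
    """Number of w-bit blocks encrypted at a given data rate."""
    return bits_per_second * seconds // w

def leak_bound_bits(n, w):
    """Rough bound on expected leaked plaintext bits after n blocks
    under one key: n^2 * w / 2^w (constant factor omitted; assumption)."""
    return n * n * w / 2 ** w

ONE_DAY = 24 * 60 * 60                      # seconds
GIGABIT = 10 ** 9                           # bits per second

n64 = blocks_encrypted(GIGABIT, ONE_DAY, 64)
n128 = blocks_encrypted(GIGABIT, ONE_DAY, 128)

print(f"64-bit blocks in one day:  {n64:.3e}")
print(f"bound on leaked bits:      {leak_bound_bits(n64, 64):.3e}")
print(f"128-bit blocks in one day: {n128:.3e}")
print(f"bound on leaked bits:      {leak_bound_bits(n128, 128):.3e}")
```

Under these assumptions the 64-bit case comes out at a few million bits for one day at 1 gigabit per second, in line with the figure quoted above, while the 128-bit case is many orders of magnitude below a single bit.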
So that limit is an inherent limitation of a cipher of this size. And 2 to the 40 is great for really constrained environments, but it would be completely inappropriate for something like the gigabit encryption example. So that covers the part of my talk on the collision attacks. Now I want to talk about counter mode. When I say counter mode, what I mean is a mode where we're encrypting a counter; here the i-th plaintext block is being encrypted with a block cipher. We're neglecting the key in the notation because it's not important for the attack. We're assuming it's fixed. So we're going to encrypt the counter i and then exclusive-or that into the plaintext to get the ciphertext. So there's no random nonce here. And there's no opportunity for a collision that's going to directly leak information. Using the definition of indistinguishability, counter mode and CBC both have the same level of security. But in practice, attacking counter mode is different, and it's actually a bit harder to attack. But let me go through the example. It is possible for an attacker to learn information about plaintext from counter mode when you go beyond the birthday bound. So let's assume we've got two different blocks, i and j, of plaintext that we've encrypted. We also know that the encryption of i is distinct from the encryption of j, from the invertibility of the block cipher. So we know that this plaintext block is not equal to the exclusive-or of those three other values. It's a very small amount of information, but we know that. And we can build up an attack out of this. What we want to do is extend that observation across multiple known plaintexts. And so here, again, we're showing plaintext blocks up at the top. We have a set of known plaintext here. On this row, we're showing the different blocks that were exclusive-ored into the plaintext.
In other words, this is the encryption of i, where i would count the blocks along here, assuming all our known plaintext is bunched up just to make the diagram a little cleaner. So we're going to define this epsilon. Capital epsilon is the set of these blocks here that correspond to our known plaintext. So we can extend that observation we made earlier across this set of known plaintext blocks and say, OK, what information do we have about, say, this plaintext block? We didn't know any information about it in advance. But we know that if you take any of these values and exclusive-or it with that ciphertext block, that's a value that this plaintext block will not be equal to. So the plaintext block will not be in the set of capital epsilon exclusive-ored with that ciphertext block. So now we're getting somewhere. We're building up some information about this. Let me talk about a slightly more sophisticated model of the plaintext that we'll need to use to actually make this attack successful. So here we have an example of four emails. They're all to bob@example.com. Three of them are from alice@example.com, and one of them is from Mailmaster. We're going to define target values to be the things that the attacker wants. In this case, it's the minimum bid and it's a password. This looks like a human-generated password. When the same value appears multiple times in the messages that are encrypted, we're going to say the value has a repetition of r, little r there. So this is a repeated value of repetition 4. This actually does occur in the real world, and it is essential to a successful attack against counter mode. And to be a little bit more sophisticated here, we're also going to consider incidental values that might appear multiple times. An incidental value would be something that the attacker doesn't actually want to learn, but they might learn it as a side effect of prosecuting the attack.
And when they learn the information, it expands their set of known plaintext. So it makes the attack more effective. So now what we need to do is extend the attack across repeated target values. Remember the figure before? We were trying to learn some information about this plaintext block. Well, let's say that we've got the same value, little p, that appears in a number of these blocks. For every one of these ciphertext blocks here, we know that the value of that exclusive-ored with this is excluded from what this could possibly be. So for every time that this little p appears, we can exclude a whole big range of potential plaintext values. This actually enables us to put together a reasonable attack. If we have a repeated value of repetition r and little s possible plaintexts, then we can mount an attack by excluding all of the possible plaintexts until we've winnowed them out and we just have one value that remains. And that's the value of the plaintext. So this actually works when there are s possible plaintexts, k known values, and a repetition of r: when we have k times r greater than this value, which is something like w times 2 to the w, then the attack works. And with these numbers, you might notice, it's harder to do these attacks than it is to do the CBC attacks. I'll go through the algorithms here. But first, the heuristic: why does this attack work? And why does it work with this success probability? What's the size of the set when we take the direct sum of the two sets, the capital epsilon and capital G here? Well, if we assume that it's k times r, which actually appears to be a reasonable estimate, then we can imagine that we're collecting coupons, where little s, again, is the number of possible plaintexts. So we're doing something like a coupon collector problem here. So when we actually go to do this attack in practice, then what do we have?
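The exclusion idea above can be pictured with a small simulation. The following Python sketch is only an illustration under toy assumptions, not the speaker's implementation: a random permutation on 10-bit blocks stands in for the block cipher, the counter is simply the block index, the target is one of 16 candidate values, and the names (`perm`, `ctr_encrypt`, `sieve`) are hypothetical. It uses the fact that the cipher is a permutation, so the keystream block hidden under a target can never equal a keystream block already recovered from known plaintext.

```python
import random

W = 10                                  # toy block width in bits (assumption)
rng = random.Random(1)
perm = list(range(2 ** W))              # toy "cipher": a fixed random permutation
rng.shuffle(perm)

def ctr_encrypt(blocks):
    """Counter mode: C_i = P_i xor E(i), with the block index as counter."""
    return [p ^ perm[i] for i, p in enumerate(blocks)]

def sieve(candidates, known_keystream, target_ciphertexts):
    """Discard every candidate p for which p xor c is a known keystream
    block; such a p is impossible because E never repeats an output."""
    remaining = set(candidates)
    for e in known_keystream:
        for c in target_ciphertexts:
            remaining.discard(e ^ c)
    return remaining

K, R = 600, 100                         # known blocks, repetitions of the target
candidates = rng.sample(range(2 ** W), 16)   # s = 16 possible plaintexts
secret = candidates[0]

plaintext = [rng.randrange(2 ** W) for _ in range(K)] + [secret] * R
ciphertext = ctr_encrypt(plaintext)

# Known plaintext reveals the keystream blocks E(i) for the first K positions.
known_keystream = [p ^ c for p, c in zip(plaintext[:K], ciphertext[:K])]
remaining = sieve(candidates, known_keystream, ciphertext[K:])

print(remaining == {secret})
```

The true value can never be discarded, since its keystream blocks lie outside the known set; each wrong candidate survives a single repetition with probability of roughly 1 minus K over 2 to the W, so with enough known blocks and repetitions only the true value is left.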
We can do a sieving approach and loop over the epsilon, the set of blocks that are the cipher outputs, and then loop over the different repetitions. And then we can just remove the exclusive-or of these from the set of possible plaintexts. And then once we're done doing that a lot of times, we figure out the right value. Or we could do something that's more like a searching approach. We could loop over the possible plaintexts, and then for each repetition, we could search to see if the exclusive-or is in the set epsilon, and if it is, then we can remove it from the set of possible plaintexts. So let me go back on this one. The sieving approach is going to take k times r operations and s storage. And the searching approach is going to take r times s operations and r plus k storage. So the searching approach is a little better on storage. You can actually make a hybrid algorithm out of these, which is slightly better, because the sieving is better when k is less than s, and the searching algorithm is better otherwise. The first few passes of the sieving algorithm will actually greatly reduce the size of the possible plaintext set. So you can make a hybrid algorithm and run the sieving one first and then the searching one later. And then you can improve that by noticing that you've actually divided this set epsilon into two distinct sets, and use one in one phase and one in the other phase. So it's possible to improve on that a bit. So I want to go ahead and wrap up the presentation with some conclusions here. I think the most important conclusion is that CBC, CFB, and counter mode all leak information about the plaintext at the birthday bound. And even when you're just right at the birthday bound, and a great amount of information is encrypted with a succession of keys, each going right up to the birthday bound, again, that's going to leak a decent amount of information. And there are some practical attacks that exploit this.
And it's a security risk if you're running at high data rates or if you're encrypting a large volume of information with a fixed key. And so these can be exploited by practical attacks. The attacks seem especially practical for CBC and a little bit harder, especially storage-wise, if you're going to attack counter mode. So what's really interesting is that counter mode leaks information more slowly in the known plaintext model. The intuition for this is that if you're doing a collision attack, this is how you learn your information: the exclusive-or of two plaintext blocks is going to equal some value. And if you're doing that same attack on counter mode, then what you're going to learn is that the exclusive-or of some values is not equal to something. So the equality is a lot more information than the inequality. So that concludes my presentation, and I want to thank you for your attention. In CFB, the feedback can be less than a full block, and I wonder how your observations change if the feedback is shorter than the full block. So I haven't looked at that. I've been, to be honest, motivated by practice, and specifically by trying to provide a warning on when you should not use triple DES, to be really honest. That's an interesting question about CFB. And I think what's also interesting is that none of this work that I've presented has discussed any of the more modern modes of operation, OCB or CEMC, your encryption mode, which would look much better than any of these. So I think it would be interesting to apply this to some other modes of operation and see how well they actually fare against plaintext recovery attacks, and not just indistinguishability. Dan, you have a question? It seems that the extra security you're talking about for counter mode is minimized if there are two choices for each plaintext block.
That gives the minimum extra security for counter mode, and to get the maximum security, from what you were saying about equality versus inequality, you want the maximum number of choices for each plaintext block. So would you recommend compressing your plaintext in order to maximize the number of choices for each plaintext block? You know, I'm not sure that compression is going to be an effective solution. I don't think one should try to get security through compression. You could actually go to homophonic coding on top of counter mode or something like that, which would be a perfectly valid way to try to achieve better security here. That way you could encode in multiple ways and add some redundancy. Make it so that the cryptanalyst is facing more redundancy.