Hi everyone, I'm Shweta. I'm going to be talking about preventing pollution attacks in multi-source network coding. This is joint work with Dan Boneh, Xavier Boyen, and David Freeman.

Here's the outline of the talk. We'll first see a quick introduction to network coding. Of course, we just saw that, but for completeness I'll do a quick review. Then we'll look at pollution attacks, and at the security challenges that come up in multi-source networks as against the single-source networks that we just saw. In particular, we're going to look at insider attacks in multi-source networks. In a network where there are multiple senders, there is a new, very real threat of senders wanting to frame one another, and these are realistic threats that come up in applications like BitTorrent. We'll look at how these attacks are modeled and how they can be prevented. Having examined these attacks, we'll present a solution which has a matching lower bound.

Moving on, this is a brief schematic that illustrates the idea of network coding that we just saw. You have a user, Alice, and she wants to send two bits, b1 and b2, to these two receivers here. You can see the network shown in the figure. She has a distinct path to each receiver that does not share any edges, and she has a second path to each receiver that shares this edge R3-R4. Clearly, if she only sends either b1 or b2 down this shared edge, then she needs at least two cycles to communicate both bits to both receivers. However, if the router R3 sends an XOR of the bits b1 and b2 down this edge, then each receiver receives bit bi on its distinct path and can recover the other bit from the XOR that it also receives. This nice intuition lends itself to proofs, and it turns out that network coding is provably useful: it achieves capacity in many networks.
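As a small illustration of the XOR trick in the schematic, here is a minimal Python sketch. The function name `butterfly` and the way the result is laid out are assumptions made for this example; only the XOR at router R3 comes from the talk.

```python
def butterfly(b1: int, b2: int):
    """Return the pair of bits each receiver ends up with after one round."""
    # R3 forwards b1 XOR b2 over the shared edge R3-R4.
    mixed = b1 ^ b2
    # Receiver 1 gets b1 on its own path and recovers b2 from the XOR.
    receiver1 = (b1, b1 ^ mixed)
    # Receiver 2 gets b2 on its own path and recovers b1 from the XOR.
    receiver2 = (mixed ^ b2, b2)
    return receiver1, receiver2
```

Both receivers end up with the pair (b1, b2) after a single use of the shared edge.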
Network coding also has other applications, in wireless communication, data storage, and so on.

Let's look quickly at how network coding is implemented. Here again you see the user Alice, and she has this file that she wants to transmit over the network. We'll say that this file has an identifier ID1; think of the identifier as a file name, for example. Say that she breaks her file into a bunch of orthogonal vectors u, v, and w. Each packet that is sent across the network is this data, followed by the file identifier, followed by a unit vector with a one in the i-th position.

So what does she do to send this information down the network? She takes random linear combinations. She picks random coding coefficients a, b, and c from the underlying field, and then she just takes a linear combination of the data to get this new aggregate vector, as we shall call it. If you stack these vectors up, you can see how the unit vectors form an identity matrix, and because of these unit vectors the coding coefficients are maintained in the aggregate packet. So that's what the sender does.
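The sender's step can be sketched in Python as follows. This is only an illustration under stated assumptions: the modulus 257, the function names `augment`, `combine`, and `random_combination`, and the choice to leave the file identifier out of the packet are all made up for this example.

```python
import random

P = 257  # small prime modulus, chosen only for illustration


def augment(data_vectors):
    """Append the i-th unit vector to the i-th data vector."""
    m = len(data_vectors)
    return [list(v) + [1 if j == i else 0 for j in range(m)]
            for i, v in enumerate(data_vectors)]


def combine(packets, coeffs):
    """Coordinate-wise linear combination of packets over the field of size P."""
    length = len(packets[0])
    return [sum(c * pkt[k] for c, pkt in zip(coeffs, packets)) % P
            for k in range(length)]


def random_combination(packets, rng=random):
    """Pick random coding coefficients and return them with the aggregate packet."""
    coeffs = [rng.randrange(P) for _ in packets]
    return coeffs, combine(packets, coeffs)
```

Because of the appended unit vectors, the last entries of the aggregate packet are exactly the coding coefficients that were used.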
The router behaves similarly to the sender. It receives two data packets; these could already be aggregate packets, so they have non-unit augmentation coefficients. The router again picks some random coefficients, given by k and j in this case, takes a random linear combination of the incoming packets, and transmits this out. Because we kept track of the coding coefficients in the augmentation component, the receiver can recover the original vectors that were sent just by inverting the matrix that's formed by stacking the augmentation vectors together.

So far we've seen how network coding is useful and how it's simple to implement. But so far we have also assumed that all nodes are honest, which is obviously not a realistic assumption. In this setting there are the old challenges of security, like data integrity and authentication, which follow from the traditional world of networking and cryptography. But there's also the new challenge that we already looked at in the previous talk: the pollution problem. In particular, the takeaway is that standard methods do not apply. There are standard methods, signatures and MACs, that are used to address these problems, but in the setting of network coding these methods do not directly apply, so we need to develop new mechanisms to address the security problems.

Let's look briefly at the pollution problem. Here we see Alice, and again there's a network, and she wants to transmit data to two receivers. Now in this setting we assume that the node R1 has been compromised, so a malicious user controls the router R1.
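The decoding step described above, where the receiver inverts the matrix formed by stacking the augmentation vectors, can be sketched in Python as below. The modulus 257, the function name `decode`, and the packet layout (data entries followed by m augmentation entries, with the file identifier left out) are assumptions for this example. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # small prime modulus, chosen only for illustration


def decode(packets, m):
    """Recover the m original data vectors from m aggregate packets.

    Each packet is a list: data entries followed by m augmentation entries,
    all already reduced modulo P. The function row-reduces [A | D], where A
    stacks the augmentation vectors and D stacks the data parts.
    """
    rows = [list(pkt[-m:]) + list(pkt[:-m]) for pkt in packets]
    for col in range(m):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, m) if rows[r][col] % P), None)
        if pivot is None:
            raise ValueError("augmentation vectors are not linearly independent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse of the pivot
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    # A is now the identity, so the right-hand part holds the original vectors.
    return [row[m:] for row in rows]
```

For instance, with original vectors [1, 2] and [3, 4], the aggregate packets [11, 16, 2, 3] and [4, 6, 1, 1] decode back to those two vectors.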
Alice, as usual, chooses a file that she wants to send and breaks it into a set of vectors. She takes linear combinations of these vectors and sends them out on every edge. But now, because the node R1 is malicious, instead of forwarding a legitimate packet it forwards some bad data. You can see how every node downstream from the malicious node is going to mix this bad data into all its packets, so very quickly just one bad packet can pollute the whole network, and sure enough the receivers cannot recover the messages correctly.

So, in the setting of network coding, what do we want in order to achieve security? What we want is the notion of hop-by-hop containment. What this means is that even if a malicious user does inject a bad packet into the network, it should not be able to traverse more than one hop. Even if I do inject a bad packet, the very next node that receives the packet should be able to detect it as bad and drop it, and this will prevent it from mixing with the good data in the rest of the network and from polluting the network. In particular, any router in the network, when it receives a combination of packets, is faced with the question of whether the aggregate packet it received is valid or not, and whether it should forward this packet or discard it.

This problem has been addressed for the case of single-source networks. Solutions can be either information-theoretic or cryptographic; we'll focus on the cryptographic solutions. The cryptographic solutions have been either subspace signatures or message authentication codes.
We just saw another nice new subspace signature scheme. But all these solutions only address the case of single-source networks, and as we shall see, carrying these solutions forward from single-source to multi-source networks is not straightforward. We'll see that there are new threats in the multi-source model, and we'll look at how those threats can be addressed.

So let's look at multiple sources in network coding. This is just pure network coding when there are multiple sources; we'll look at the security aspect in the next few slides. Here we just have two senders, and they have two files to send. For simplicity, in this setting I'm assuming that every file consists of a single vector, so it's a one-dimensional subspace for every sender. Now I have the first user, who has a vector u, a file identifier ID1, and the augmentation component 1. The augmentation component having only one bit implies that there's only one vector in the subspace. Similarly, for the second sender you have a data vector v, a different file identifier ID2, and the augmentation component. So how do we mix these data packets together, given that they come from different files?
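One way to picture the mixing is the Python sketch below: every packet is given the same sequence of file IDs and one augmentation coordinate per sender, after which packets combine as in the single-source case. The packet layout (data, file IDs, augmentation), the names `uniformize` and `mix`, and the modulus 257 are assumptions for this example, not the layout on the slides.

```python
P = 257  # small prime modulus, chosen only for illustration


def uniformize(packets):
    """packets: list of (data, file_id) pairs, one single-vector file per sender.

    Returns packets of the form (data, all_file_ids, augmentation), where the
    augmentation of the i-th packet is the i-th unit vector.
    """
    ids = tuple(fid for _, fid in packets)
    n = len(packets)
    return [(list(data), ids, [1 if j == i else 0 for j in range(n)])
            for i, (data, _) in enumerate(packets)]


def mix(p1, p2, a, b):
    """Linear combination a*p1 + b*p2 of two uniformized packets."""
    data1, ids1, aug1 = p1
    data2, ids2, aug2 = p2
    if ids1 != ids2:
        raise ValueError("packets must carry the same sequence of file IDs")
    data = [(a * x + b * y) % P for x, y in zip(data1, data2)]
    aug = [(a * x + b * y) % P for x, y in zip(aug1, aug2)]
    return data, ids1, aug
```

After uniformizing, the packet for u carries augmentation (1, 0) and the packet for v carries (0, 1), so the coefficients of a mixed packet again show up in its augmentation.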
The idea is fairly straightforward. All that we really need to do is a sort of uniformization of the vectors. Here we can see how we've added an extra augmentation bit for each of the vectors, and we've uniformized the file identifiers so that both of them now have the same sequence of file IDs. Now these packets are amenable to mixing, just as in the single-source case. It helps to think of the multi-source case as having a super source that is transmitting all the packets to all the different sources, and it's really the same problem. So it's the same idea; there's just more bookkeeping to do when you want to do multi-source network coding.

However, as I alluded to earlier, security in such a setting is a much harder problem. In particular, users can frame one another. In a network where you have multiple senders, and each sender is a legitimate user of the system with a private key and public key pair, you now have the threat that one sender can frame another sender. By that I mean that a malicious sender can make an honest sender look as though it sent some data that it did not in fact send, and we'll see how this can happen.

In this work, we show an impossibility result. If we naturally extend the signature schemes and the security model from the single-source setting, we show that pollution attacks in that setting cannot be prevented.
It's impossible, and in fact we also show that this natural extension forces the receiver to solve the clique problem in order to decode. Like I said, we will show a general attack on any scheme.

So let's look at what a signature scheme in the multi-source setting might look like, as a natural generalization of the single-source setting. It would comprise four algorithms, as before. The first is Setup, which takes the security parameter and the dimension of the subspace under consideration, and outputs a public/private key pair and the field modulus p; the vectors to be signed live in F_p^n. Then we have the Sign algorithm, which would take in the secret key, the file identifier, and the vector, and output a signature sigma. Remember that the file identifier is what binds all the vectors to a given file. A vector space represents a file, so to sign a vector space you would choose a file identifier and sign all its basis vectors using this algorithm. The Verify algorithm would take the public key, the file identifier, an aggregate vector, and the signature, and output true or false depending on whether that vector lives in the correct subspace or not. The Combine algorithm is really an artifact of network coding: because you're combining the data, you also need a Combine algorithm to combine signatures.
The Combine algorithm would take the random coding coefficients a and b with which you're combining the data, and the signatures sigma and tau of the original vectors, and produce a new signature gamma such that if sigma is a signature for v and tau is a signature for w, then gamma is a signature for av + bw.

Now we'll see how the definition of the signature scheme that we saw can be attacked. Since we've made no assumptions on the specifics of the scheme itself, and we're only considering the definition and the functionality of the scheme, you see that this attack is very general and very real.

So what happens here? You have a sender, Alice, and she wants to transmit some files, and now you have a malicious user who we refer to as Mallory. Mallory wants to frame Alice; in particular, he wants the receiver to think that Alice sent some information that she did not in fact send. How does he do this? He sends different files with the same file identifier. This is clearly not allowed in the protocol of the system, because each file needs to have its own unique identifier. But the problem is that every user chooses his own file identifier, and there is no way in this system to force the user to be honest about this.

So let's see how he can use this. He creates two vectors that he wants to send into the network; let's say there's v and there's w. These are again two separate files, and they should have two distinct file identifiers. But as I mentioned already, he's going to cheat and use the same file identifier for both files. Let's say that the vectors have signatures sigma and tau, respectively. Now Alice has her own file that she wants to send: the data is the vector u, and the file identifier is ID1. So what does Mallory do first?
He combines these two packets, Alice's packet and his own packet for v, as they would be combined in the network, and he gets this new packet. This is a legitimate operation in network coding: he takes a linear combination of the two packets, and he gets a new packet with ID1, ID2, this is the augmentation, and this is the new signature, which he obtains, say, just by appending the old signatures. This aggregate packet, along with the other packet that he had created by reusing the file ID, is sent to the receiver.

Now when the receiver decodes, what does it find? The receiver decodes that Mallory sent the vector w, because that's what it received in this packet, and it associates that vector with identifier ID2. In order to recover what Alice sent, it subtracts w from u + v, and it thinks that Alice sent the vector u + v - w, which is clearly false, because Alice sent the vector u, not this new vector.

So as we saw, by allowing users to choose their own file identifiers, there is no way to prevent a malicious user from reusing a file identifier across different files, and this can allow the user to mount a very simple but very powerful attack in a practical scenario. The natural question is whether there's a way out of this, and it turns out, luckily, that there is: we prevent the file identifiers from being arbitrary and instead force them to be cryptographically verifiable. That is, if we have the Sign algorithm generate the file identifier, instead of the user picking it and inputting it to the Sign algorithm, it turns out that this attack can be thwarted.
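The arithmetic of the framing attack described above can be traced with small numbers. The Python sketch below is only an illustration: the modulus 257, the tuple layout of the packets, the helper names, and the coefficients (1, 1) in the aggregate packet are assumptions for this example, and signatures are left out entirely.

```python
P = 257  # small prime modulus, chosen only for illustration


def add(x, y):
    return [(a + b) % P for a, b in zip(x, y)]


def sub(x, y):
    return [(a - b) % P for a, b in zip(x, y)]


def framing_attack(u, v, w):
    """Return (vector attributed to Alice, vector attributed to the attacker)."""
    ids = ("ID1", "ID2")
    # The attacker mixes Alice's packet with his file v: augmentation (1, 1).
    aggregate = (add(u, v), ids, [1, 1])
    # He also sends his other file w, reusing ID2: augmentation (0, 1).
    reused = (list(w), ids, [0, 1])
    # The receiver takes the (0, 1) packet as the attacker's ID2 vector ...
    attacker_vector = reused[0]
    # ... and subtracts it from the (1, 1) packet to get "Alice's" vector.
    alice_vector = sub(aggregate[0], attacker_vector)
    return alice_vector, attacker_vector
```

With u = [10, 20], v = [1, 2], and w = [5, 5], the receiver attributes [6, 17] to Alice, which is u + v - w rather than u. If the attacker had been honest and used a single file (v equal to w), the same decoding would return u.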
So for all vectors v of a given file, we want a verify algorithm that will take the vector v and the file identifier and output true if the vector belongs to the file, and it should be hard to construct a vector y outside the subspace such that it verifies against the identifier of that subspace. These requirements are reminiscent of vector space signatures, and indeed it turns out that these requirements imply that the file identifier itself needs to be a vector space signature. This is shown formally in the paper.

Let's look at the new scheme that we now have. Instead of the user picking arbitrary file IDs and providing them as input to the Sign algorithm, we now have the Sign algorithm generate the file identifier for every file. Every router that receives an aggregate vector, which has some data and a file identifier component, can now verify the data against the file ID. We have the verify algorithm that we saw in the previous slide, which will only pass the aggregate vector as valid if it indeed belongs to the subspace identified by the ID that it carries. This clearly thwarts the previous attack, and it turns out that we can generalize the bound from BFKW09 on vector space signatures to get a bound for file identifiers, and we have a scheme that matches this bound.

So in our work, we propose a security model for multi-source networks which captures the notion of insider attacks.
We generalize the scheme from BFKW09 (this is a typo) for multi-source settings, and this is a simple generalization of their work which achieves the lower bound.

In particular, I'd like to talk about a new primitive that we propose in the paper that captures the requirements of multi-source network coding settings, and we call that a vector hash. This comprises three algorithms: the HashSetup algorithm, which takes the security parameter and the subspace dimension and outputs some public parameters; the Hash algorithm, which takes the public parameters and a vector and outputs a hash of the vector; and a Test algorithm, which takes an aggregate vector y, a set of coefficients given by this vector a here, and the set of hashes, and outputs true or false. Here h is a vector of hashes of the basis vectors v1 through vm of the file, and Test returns true if and only if y was constructed as the sum over i of ai vi, where the ai are the components of this vector a.

So you can see how this captures what we require for a file identifier. This is exactly what we need: a hash of a vector space that every node in the network receives along with an aggregate vector. We call this vector space hash the file identifier.
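To make the three-algorithm interface concrete, here is a toy Python sketch. It does not reproduce the construction from the paper or from BFKW09; it swaps in a plain discrete-log-style homomorphic hash, which is an assumption made for illustration. The parameters p = 23 and q = 11 are far too small to be secure, the names `hash_setup`, `vector_hash`, and `hash_test` are hypothetical, and with any such hash the "only if" direction of Test can hold only computationally.

```python
import random


def hash_setup(n, p=23, q=11, seed=None):
    """Toy public parameters: n generators of the order-q subgroup of Z_p^*.

    Assumes p = 2q + 1 with p and q prime, so squares other than 1 have order q.
    """
    rng = random.Random(seed)
    gens = [pow(rng.randrange(2, p - 1), 2, p) for _ in range(n)]
    return p, q, gens


def vector_hash(pp, v):
    """Hash a vector over F_q as the product of gens[j] ** v[j] modulo p."""
    p, q, gens = pp
    h = 1
    for g, x in zip(gens, v):
        h = h * pow(g, x % q, p) % p
    return h


def hash_test(pp, y, a, hashes):
    """Check the hash of y against the product of hashes[i] ** a[i] modulo p."""
    p, q, _ = pp
    expected = 1
    for h, c in zip(hashes, a):
        expected = expected * pow(h, c % q, p) % p
    return vector_hash(pp, y) == expected
```

In this sketch a router would call `hash_test` on the data part of an incoming packet, its augmentation coefficients, and the hashes carried as the file identifier, forwarding the packet only when the call returns true.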
The node is able to use this Test algorithm to check whether the aggregate vector it received was indeed generated in the correct way. If it passes the test, then the node uses that packet for linear combinations and forwards it to other nodes in the network. If it does not, then it discards the packet, and we have hop-by-hop containment even in multi-source networks. This vector hash can be instantiated using ideas from BFKW09, as we saw, and it implies multi-source network coding signatures.

So in conclusion, we discussed security challenges in multi-source network coding. In particular, we saw a generic attack for user-chosen file identifiers. We constructed cryptographically verifiable file identifiers in order to thwart that attack, and we proposed a new primitive called vector hash, which we instantiated using ideas from BFKW09, and this construction matches the lower bound. So that's the end of the talk.

Network coding.

I'm not sure I completely followed. Are you asking if confidentiality is also a challenge in network coding? Yeah, so network coding can just be thought of as a new way to do routing, but information is still information, and you want to have that information secure, so all the previous security challenges apply. Confidentiality is also something desirable in network coding, but it's more easily achieved in network coding, because the plaintext does not get past the first hop in the network. It pretty soon gets mixed with other data, and the plaintext messages are not out in the open. So in that sense...

The particular difference is that there's no secret key associated with the hash function. There's no key pair associated with it. So even if the hash function is collision resistant, there's no authentication associated with it.

Okay, so let's thank