All right, so we're ready for the last session of the day, and of Real World Crypto. The session is on blockchains and distributed ledgers, and we have three great talks. The first one is titled "Challenges and Cryptographic Solutions in Payment Channel Networks", and the speaker is Pedro Moreno-Sanchez, who is just next to me and about ready to go. Before I leave the stage, a quick announcement: there is a leather Zelda bag — I'm not sure what it is, could be some little device — here, so if you lost it or forgot it in the bathroom, it will be right here. Okay, with this I'll pass it to Pedro.

Okay, thank you for the introduction. Hello everybody, and thank you for attending my talk at this point of the conference. Today I'm going to talk about scalability problems. Bitcoin and many other cryptocurrencies have scalability issues, and this is because they rely on a decentralized data structure that records every single transaction that happened in the system. This is good because it provides public verifiability, so we can check all the transactions that happened in the system; however, it requires a global consensus that in practice limits the transaction rate to around 10 transactions per second, and this is far from what we have in systems like Visa or Mastercard.

This scalability problem has been tackled in the community, and we can roughly group the approaches into two trends. One is on-chain, where mainly the consensus algorithms and protocols are tweaked to improve the transaction rate. The other approach is off-chain: we use the blockchain as little as possible, only to resolve disputes between users. This is where payment channel networks fall, and we have a couple of examples already deployed in practice: the Lightning Network for Bitcoin, the Raiden Network for Ethereum, and many other research projects, some of which are actually implemented already. As you can imagine, I'm going to focus on this
second approach today.

Let me start with a little background on payment channels. Imagine that we have Alice, who has some bitcoins, and she wants to buy some product from Bob. She can open a payment channel for that. In order to do that, she creates a transaction where she transfers five coins to what is called a multisig contract: a contract that can be further spent only when both Alice and Bob agree. To make sure that Alice can recover the funds, she lets Bob sign a refund transaction that transfers the coins from the multisig contract back to Alice. When that has been done, Alice can put the funding transaction on chain, effectively opening the payment channel, and she keeps that refund transaction locally, just in case she needs it later.

Once the channel is open, she can use it to perform several transactions off-chain. She does that by redistributing the coins from the multisig contract accordingly. Imagine that she wants to pay one coin to Bob: she pays one coin from the contract to Bob and the remaining four back to herself, and she signs that. This is a valid payment for Bob, because the only thing he has to do is sign it himself, and he can always do that because he has his own signing key. Instead of doing that, he just waits for more payments from Alice, which she can make by again redistributing the coins from the multisig contract. In practice these payments can be more complex: we can have bidirectional payments, where Bob pays back to Alice, and we have revocation mechanisms to revoke old states. When both Alice and Bob are done using the channel, they can close it: they just sign the last state and put it on the blockchain. So we have only two transactions on the chain, and many, many payments that happen off the chain.

This protocol, however, allows only two users to pay each other. What we would like to have in practice is a payment path from Alice to anybody else in the network.
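As a toy illustration of the channel lifecycle just described — open once on chain, update balances off chain many times, close once — here is a minimal Python sketch. It is a bookkeeping model only, not actual Bitcoin transactions, scripts, or the multisig/refund machinery; all names are illustrative.

```python
# Illustrative model of a two-party payment channel: open with a deposit,
# redistribute the locked coins off chain, close with the latest state.
class PaymentChannel:
    def __init__(self, funder, peer, deposit):
        # Funding transaction: coins move into the shared contract.
        self.balances = {funder: deposit, peer: 0}
        self.on_chain_txs = 1          # the funding transaction
        self.off_chain_updates = 0

    def pay(self, sender, receiver, amount):
        # Each payment just re-signs a new distribution of the locked coins.
        if self.balances[sender] < amount:
            raise ValueError("insufficient channel balance")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.off_chain_updates += 1

    def close(self):
        # Closing publishes only the final state on chain.
        self.on_chain_txs += 1
        return dict(self.balances)

ch = PaymentChannel("Alice", "Bob", 5)
ch.pay("Alice", "Bob", 1)    # the one-coin payment from the example
ch.pay("Alice", "Bob", 2)    # further off-chain payments
final = ch.close()
# Many payments, but only two transactions (open + close) touch the chain.
```

The point of the sketch is the ratio it makes visible: `off_chain_updates` can grow without bound while `on_chain_txs` stays at two.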
For that, Alice could open payment channels to everybody; however, this is really costly, because for every channel she has to lock coins, and she might not be that rich. Instead, what we use in practice is that Alice relies on a path of open channels between herself and the intended receiver. Imagine that Alice wants to pay Carol: she first sends one bitcoin in the channel that she has with Bob, and Bob will forward this one bitcoin to Carol. Obviously, for this to be usable in practice, we need the whole operation to be atomic; in other words, Bob should get the money from Alice if and only if he has actually paid Carol.

This concept of a payment channel network is what has been implemented in the Lightning Network itself, and there they have tackled the atomicity challenge by including more conditions on the payment itself. If Bob receives such an off-chain payment, he can put it on the blockchain not only with his signature: he also has to solve a cryptographic challenge. He is given a value y, which is a hash value, and he has to come up with a value x which is a valid preimage for that hash value. And because Bob might not come up with such a valid preimage in time, or ever, we also need to add what is called a timelock, after which Bob can no longer put the transaction on the blockchain. In the rest of the talk I'm going to use the following notation: Alice pays Bob one coin if and only if Bob shows some x such that x is a valid preimage of y, before some timeout has expired.

Once we have this building block, we can concatenate several of these hash time-lock contracts (HTLCs) to have a payment from a sender to a receiver. Let me show you how it works in this example. Carol creates both the value x and the hash value y, and gives the value y to Alice. Now Alice can create a conditional payment, conditioned on this value y, to Bob. Bob doesn't know the solution, but he can forward a
conditional payment, conditioned on the same value y, to Carol. Now, because Carol knows the solution she created at the beginning, she can show it to Bob to pull the money from Bob, and Bob can forward it to Alice to pull the money from her. This is essentially how the Lightning Network works today.

There are two key points we should pay attention to here. The first one is that this protocol requires that the timeout in the left channel is higher than the timeout in the right channel; this is to make sure that Bob has enough time, when he receives the value x from Carol, to forward it to Alice and get the money from her. The second requirement is that Bob gets a little more money than the amount he has to forward in the payment. This difference is called the fee, and it's important for him because it's what he charges for providing the service of forwarding payments between the sender and the receiver.

If you didn't get all the details, it doesn't really matter. What you need to understand for the rest of the talk is that there is a system called the Lightning Network that allows us to perform payments off chain. They are fast, because we don't need to go to the blockchain, to the consensus, to confirm a payment; they require some small fees for the intermediaries; at first glance the hash time-lock contract looks secure; and it looks privacy-preserving, because most of the information is not on the blockchain anyway. But this is only at first glance, and in our group we asked ourselves whether it is actually secure and whether it actually provides privacy by default. Since I'm giving this talk, you can imagine that the answer is no for both, and next I'm going to show you some of the problems in terms of security and privacy.

Regarding security, we showed an attack which is called the wormhole attack. The main idea is that an adversary sitting between the sender and the receiver can make honest intermediate nodes participate in a payment, and later steal the fee that was supposed to go to those honest nodes.
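Before the attack example, the HTLC claim condition and the per-hop fee and timeout bookkeeping described above can be sketched as a toy Python model (amounts in tenths of a coin and the timeout values are illustrative; this is not the Lightning protocol itself). The same path arithmetic also makes explicit what a wormhole adversary controlling two nodes around an honest hop would gain.

```python
import hashlib

def htlc_can_claim(x, y, now, timeout):
    # The payee can claim iff x is a valid preimage of y and the
    # timeout has not yet expired.
    return hashlib.sha256(x).digest() == y and now < timeout

# Carol samples x and sends y = H(x) to Alice.
x = b"carol-secret"
y = hashlib.sha256(x).digest()
assert htlc_can_claim(x, y, now=5, timeout=10)        # claimed in time
assert not htlc_can_claim(x, y, now=15, timeout=10)   # too late
assert not htlc_can_claim(b"bogus", y, now=5, timeout=10)

# Path Alice -> E1 -> Bob -> E2 -> Carol, with adversarial E1 and E2.
# Timeouts decrease towards the receiver so each hop has time to pull
# the money from its left neighbour after learning x.
amounts  = {"Alice->E1": 13, "E1->Bob": 12, "Bob->E2": 11, "E2->Carol": 10}
timeouts = {"Alice->E1": 40, "E1->Bob": 30, "Bob->E2": 20, "E2->Carol": 10}
assert (timeouts["Alice->E1"] > timeouts["E1->Bob"]
        > timeouts["Bob->E2"] > timeouts["E2->Carol"])

# Honest completion: every hop keeps the difference in amounts as its fee.
fee = {
    "E1":  amounts["Alice->E1"] - amounts["E1->Bob"],
    "Bob": amounts["E1->Bob"]   - amounts["Bob->E2"],
    "E2":  amounts["Bob->E2"]   - amounts["E2->Carol"],
}

# Wormhole attack: E2 hands x straight to E1, skipping Bob. Bob's HTLCs
# simply time out, so his balance is unchanged (though his coins sat
# locked), while the adversary pockets its own fees plus Bob's fee.
adversary_gain = amounts["Alice->E1"] - amounts["E2->Carol"]
assert adversary_gain == fee["E1"] + fee["E2"] + fee["Bob"]
```

With these units, paying 1.0 to Carol while receiving 1.3 from Alice nets the adversary 0.3: its own two fees plus the fee meant for Bob.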
Okay, let me show you in an example. Imagine we have a setting in which all the users have already locked coins in these conditional payments, in these HTLCs, along a path Alice, E1, Bob, E2, Carol, where the adversary controls E1 and E2. Now Carol gives the solution to E2. If, instead of forwarding it to Bob, E2 just waits until the timeout of that HTLC expires and forwards it to E1 directly, and E1 then forwards it to Alice, what happens in practice is that Bob considers the payment to have failed, to have been unsuccessful, and unlocks his funds after the timeout. The adversary pays one bitcoin to Carol, but gets 1.3 from Alice. So the attacker has collected not only his own fees but also the fee that was supposed to go to Bob for forwarding the payment. In practice this is important, because this attack takes away the fees from honest users, which are the main incentive they have to participate in the network. There are also other implications: for example, the funds in Bob's channels are locked for the entire timelock duration, which means that he cannot use them for other payments during that time, payments that might have been successful. Additionally, Bob believes that the payment simply failed, and he has no way to blame the attacker for the attack; he doesn't even know that it has happened.

On the privacy side, the situation is not better. Imagine that we have two simultaneous payments, where each Alice on the left wants to pay a Carol on the right. Intuitively, what we would like to have is called relationship anonymity: on-path adversaries should not learn who is paying to whom. A bit more technically, what we want to achieve is that the adversary cannot distinguish between two cases: one case in which blue Alice pays blue Carol and green Alice pays green Carol, and a second case in which blue Alice pays
to green Carol and green Alice pays to blue Carol. However, if we look at how it works in practice, the adversary can deterministically determine which of the two cases we are in. In the example I'm showing on this slide, he can see that the blue y value is given by the blue Alice and later forwarded to the blue Carol, so he can determine that the blue Alice is paying the blue Carol; in the same manner, the value y' is given by the green Alice and forwarded to the green Carol. He can link both of them, and therefore relationship anonymity does not hold in the Lightning Network.

Given this state of affairs, in our group we have been coming up with cryptographic solutions that provide, or improve, security and privacy in current payment channel networks. The main observation we made is that most of these problems come from the fact that the same y value is used at every hop in the path. So it might help if we had randomized conditions at each of the hops, which can be solved only when we get the key from the right neighbor. In order to do that, we can add a setup phase where the sender distributes individual randomization factors to each hop in the path.

With these two main ideas, we are trying to achieve three properties. The first one is atomicity, which means that if the lock on a user's right gets opened, then he should be able to open the one on his left; in that sense we make sure that no coins are lost. We also want consistency, which means that if a user was able to open the lock on his left, then the one on his right had to have been opened; in that manner users cannot be bypassed, and we prevent wormhole attacks. Finally, regarding privacy, a user should learn about no other participants in the path than his direct neighbors on the left and on the right.

Let me show you at a high level how our protocol, which is called anonymous
multi-hop locks, actually works. It works in three phases. In the first one, the setup, Alice provides a couple of pieces of information to each user in the path. This tuple has two elements: the first is the randomization factor I mentioned before; the second is a condition that combines all the randomization factors of the users from the sender up to that position in the path. Additionally, Alice gives the combination of all the randomization factors to the last one, the receiver. Once we have that, they can pairwise perform the lock operation — we will see later how it is performed — where every user pays conditioned on the neighbor solving the condition given by Alice before. For example, E1 here can get the payment if he can open the lock for the condition C1, and they do this from left to right. Now Carol was already given the solution for C4, so she can directly give it back to E2 to claim the coins. E2 can then use his own randomization factor to de-randomize the key he got from Carol and forward it to Bob, and every user does the same: they get a valid key from the right, de-randomize it, and send it to the left. The main ideas are that the conditions look random, because consecutive ones differ by a secret random factor known only to each of the users in the path, and that a valid key can only be computed if you get the valid key from your right.

I omitted how the lock contract works, so let me show it on this slide. The idea is that there is a public key shared between Alice and Bob, where the corresponding secret key is secret-shared between the two of them. To perform an off-chain payment, they run a two-party protocol to create a signature on the transaction which is almost valid: almost valid because they need to come up with a value k with which they can finish the transaction, or rather the signature, in this case.
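The setup and release flow just described can be sketched with a concrete homomorphic one-way function — here f(x) = g^x in a toy multiplicative group. All parameters are illustrative, not the paper's exact instantiation; the point is that consecutive lock conditions differ by a secret randomizer, and each key is derived from the key to its right.

```python
import random

p = 2**61 - 1          # a prime; toy group, not a production choice
g = 3
q = p - 1              # exponents live mod p - 1

def f(x):              # homomorphic one-way function: f(a + b) = f(a) * f(b)
    return pow(g, x, p)

n = 4                                         # number of locks in the path
k0 = random.randrange(q)
r = [random.randrange(q) for _ in range(n)]   # per-hop randomizers

# Setup: lock i's condition commits to k0 + r_1 + ... + r_i.
keys, acc = [], k0
for i in range(n):
    acc = (acc + r[i]) % q
    keys.append(acc)
locks = [f(k) for k in keys]

# Conditions look unrelated: consecutive locks differ by a secret factor.
assert locks[1] == (locks[0] * f(r[1])) % p

# Release: the receiver gets the full key and opens the last lock; each
# intermediate subtracts its own randomizer to derive the key on its left.
s = keys[-1]
for i in reversed(range(n)):
    assert f(s) == locks[i]          # valid opening of lock i
    s = (s - r[i]) % q               # de-randomize for the left neighbour
assert s == k0
```

Atomicity and consistency show up directly: a key for lock i yields the key for lock i−1 (no coins lost), and without the key from the right no intermediate can open its left lock (no one can be bypassed).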
This two-party protocol has to ensure two properties. First, looking at this half signature, Bob has to be convinced that whenever he learns the value k later, he will be able to finalize the signature and put the transaction on the blockchain. And from the point of view of Alice, if Bob ever finishes the signature and puts the transaction on the blockchain, she should be able to retrieve the value k, because this is what she needs to pull the payment further back in the path.

I don't have time to show you the mathematical details of how it works, but I encourage you to read the paper. I can tell you that we have three constructions: we show that we can construct such a two-party protocol from any homomorphic one-way function, and we have two other constructions, one that relies on Schnorr digital signatures and one that relies on ECDSA. Perhaps the last one is the most interesting, because it's compatible with many of the cryptocurrencies today, mainly Bitcoin, Ethereum, and many others. We have also shown that it achieves the security and privacy notions of interest, which we define in the universal composability framework, and the community has done some proof-of-concept implementations, not only in the Lightning Network but also in other payment channel networks.

Our protocol also has other practical advantages. For example, it reduces the transaction size for conditional payments: a conditional payment no longer needs the script logic to check whether a preimage of a hash function is correct, because we encode the condition in the signature itself, so we need only the verification of digital signatures to have conditional payments. Moreover, and perhaps more interestingly in practice, our construction allows what we call interoperable multi-hop payments: in a path in which one lock is ECDSA-based in Bitcoin and the next one is defined in Ethereum with a different lock, we can still perform a payment that goes from the sender to the receiver with security and privacy.
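A toy sketch of the two properties of this "almost valid" signature, using a Schnorr-style construction over a small multiplicative group. This is illustrative only — real constructions use elliptic-curve groups and proper two-party signing, and the paper's ECDSA-based variant is more involved — but it shows how finalizing the signature necessarily reveals the secret k (here called t).

```python
import hashlib
import random

p = 2**61 - 1
g = 3
q = p - 1

def H(*vals):
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

x = random.randrange(1, q); X = pow(g, x, p)   # signing key and public key
t = random.randrange(1, q); T = pow(g, t, p)   # the secret value and its image

m = "pay 1 coin to Bob"
k = random.randrange(1, q); R = pow(g, k, p)
c = H((R * T) % p, m)
s_pre = (k + c * x) % q                        # the "almost valid" signature

# Property 1 (Bob's view): the pre-signature verifies against R, X, T,
# so once he learns t he can finalize and put the transaction on chain.
assert pow(g, s_pre, p) == (R * pow(X, c, p)) % p
s = (s_pre + t) % q                            # finalized signature (R*T, s)
assert pow(g, s, p) == ((R * T) % p * pow(X, c, p)) % p

# Property 2 (Alice's view): any finalized signature reveals t to whoever
# holds the pre-signature, which is what she needs to pull the payment
# further back along the path.
assert (s - s_pre) % q == t
```

The cryptographic challenge thus lives inside the signature itself, which is why the on-chain script only ever has to verify an ordinary signature.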
In the last five minutes I have, I'm going to show you another challenge that we have studied in Bitcoin-compatible payment channel networks. By Bitcoin-compatible I mean those payment channel networks, like the Lightning Network, that do not assume a Turing-complete language like Solidity in Ethereum. Collateral is a term that has been used in the blockchain community to denote the amount of resources needed to perform a multi-hop payment, in terms of the coins and the time for which they are locked. For example, if we have a payment of k coins that goes over n channels, we need a collateral of at least k times n coins: k coins at each of the n channels. Also, for the reason I gave before, the time for which the coins have to be locked at each channel is staggered; in other words, it depends on the position in the path. As I said, for security you need to lock the coins at one channel for a time higher than the one in the next channel. The bottom line is that right now coins are locked for too long along a path, and this opens up problems in practice, like the griefing attack.

The main idea here is that the adversary performs a payment to itself. In order to do that, the adversary has to lock k coins plus some fees in the first hop, for a time of n times delta, and in the last hop k coins are locked for a time delta. By doing that, he forces n minus 2 intermediate channels to lock coins, and each of those channels locks its coins for a time proportional to its position in the path. The adversary therefore gets an amplification factor of n minus 2. In practice this is really important, because this delta has been set in the Lightning Network to one day; so if the attacker uses a path of, say, seven hops, coins are locked for seven days, a whole week.
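The coin-time arithmetic behind this comparison can be made explicit with a toy model using the talk's illustrative numbers (one coin, one-day delta, seven hops):

```python
# Coin-time collateral for a payment of k coins over n channels,
# comparing staggered timeouts against a constant lock time.
def staggered_collateral(n, k, delta):
    # Channel i (counting from the receiver) locks k coins for i * delta.
    return sum(k * i * delta for i in range(1, n + 1))

def constant_collateral(n, k, delta):
    # Every channel locks k coins for the same fixed time delta.
    return n * k * delta

n, k, delta = 7, 1, 1        # 7 hops, 1 coin, delta = 1 day
assert staggered_collateral(n, k, delta) == 28   # coin-days
assert constant_collateral(n, k, delta) == 7     # coin-days
# The first channel alone locks its coin for n * delta = 7 days,
# which is the amplification a griefing adversary exploits.
```

Staggering makes the total collateral grow quadratically in the path length, while a constant lock time keeps it linear.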
And obviously the attacker can make this attack even worse by using longer paths or by using several paths for his payment.

What we would like to have in practice is what we call constant collateral. The idea is that we would like a payment from the sender to the receiver where we still have to lock coins, but for a fixed time: the time for which you need to lock the coins should not depend on your position in the path. This would greatly reduce the overall time for which coins are locked for a payment. This has been shown feasible in Ethereum-based payment channel networks, where a Turing-complete smart contract implements this logic; if you're interested, the details are in the Sprites paper. But in that paper they also conjecture that it is not possible to have such functionality in Bitcoin; in other words, you could emulate such a primitive in Bitcoin, but you would need to modify the scripting language and enlarge it with more functionality — a richer scripting language — to have such a solution.

In our work we show that this conjecture doesn't hold, by providing a protocol called atomic multi-channel updates which is fully backwards compatible with Bitcoin. This protocol allows the synchronization of n channels with constant lock time, and it enables not only multi-hop payments but opens the door for other applications like crowdfunding, netting, or channel rebalancing.

I'm going to give you a brief overview of how the protocol works; again, the details are in the paper. Imagine that we have a really simple scenario where Alice wants to pay Carol via Bob: Alice wants to pay eight coins out of the ten that she has in the channel with Bob, and then Bob will forward seven coins out of the thirty that he has in the channel with Carol. The first phase is called setup, and there Alice and Bob split their channel into two sub-channels: one with eight coins, which they will use later in our protocol, and two
coins, which are the remaining coins in the channel that they might use later for other payments or other functionality. In the second phase, they lock these coins up to a time delta — the constant lock time — to make sure that if something goes wrong in the protocol they can recover the coins afterwards. The third phase is the main trick, the main point for atomicity in our protocol: Alice and Bob will pay from this sub-channel with eight coins to Bob directly, but they will do it from a fresh address which has not been funded yet. Because this address has not been funded, the payment effectively does not exist yet, so at this point it cannot be put on the blockchain. Here I have shown only the case for Alice and Bob, but you can imagine that Bob and Carol do the symmetric transactions. Then, to achieve atomicity, we have a multi-input multi-output transaction in the fourth phase, called enable, where Alice and Bob, and Bob and Carol, put their coins together to fund the channels they created in the third phase. So they first create these unfunded channels to pay the expected receiver, and they jointly fund those payment channels at the end. Because at some moment after the time delta both the locked coins and the coins in the enable transaction become spendable, there is a race condition, so we also have to create a disable transaction that invalidates the enable transaction from phase four, and only one of them is valid at that point; this is a technical detail, and there is more discussion in the paper.

With that, I would like to conclude this talk by saying that payment channel networks have been deployed in practice as a mitigation for the scalability issues of cryptocurrencies; however, they still present several challenges. Today I showed you challenges in terms of security, with the wormhole attack; privacy, showing that relationship anonymity does not hold; and collateral, which opens up problems like the griefing attack. We are proposing cryptographic protocols that mitigate them and also open the door for new
functionality like crowdfunding or netting. There are still more challenges, obviously; we are looking at them, other groups as well, and we need more solutions. For example, one of them is to have a scalable and interoperable routing mechanism that allows us to find the paths between the sender and the receiver in the first place. With that, I would like to thank you for your attention, and I'm glad to answer any questions you might have.

Okay, we have time for one question; please go ahead, just go to the mic.

It seems that Alice needs to interact with every single entity along the chain. Isn't this revealing information about commercial relationships between the parties, with lots of money involved? That seems like information you might not want to reveal.

If she did it naively, that would be right. Instead, what they use is an onion packet, where Alice sends the information to each of the nodes, à la Tor, so they get the information but they don't know where it comes from.

Okay, maybe one final quick question.

All these improvements that you're describing for the Lightning Network — has there been any interaction with the Lightning Network team, the developers? Maybe some of them could be implemented?

Yes, we interacted with Lightning Labs, and they did the proof of concept that I mentioned in my slides, where they checked that it was actually possible to run the protocol in the current Lightning Network. I also had conversations with Blockstream, where we discussed the Schnorr-based solution and its advantages and disadvantages with respect to the ECDSA one. So yes, they're looking into it; obviously, it always takes time to implement things correctly and put them into practice.

Okay, thanks Pedro, let's thank the speaker again.

All right, we're moving on to the second talk of the session, "The Marvellous Universe of Arithmetization-Oriented Primitives", and Tomer is giving the talk. Tomer, thank you.

So this
is joint work with Abdelrahaman Aly, Eli Ben-Sasson, Siemen Dhooghe, and Alan Szepieniec. Before we begin, some background. About a year and a half ago I spoke to Eli Ben-Sasson from StarkWare, who told me how magnificent the ZK-STARK proof system is and how it is the future of cryptocurrencies. But he also told me that they have a bottleneck, in the form of not being able to find an efficient hash function for what they need to do. I found that surprising, because as a cryptographer you always hear about lightweight this and lightweight that; among all this lightweightness, there must be something that would be efficient for them. But then Eli told me that they have slightly different design goals. What they need is a hash function that is secure, that operates on field elements, and that minimizes the number of field multiplications. I immediately cried out "AES!", because AES is secure, it natively operates on elements of GF(2^8) — which is a fancy way to say that it works on bytes — and it is well understood, heavily cryptanalyzed, and widely accepted. As Eli pointed out, it doesn't minimize the number of field multiplications, but it is a good starting point, and that is where we started.

I think we all know roughly how AES works: it has a four-by-four state, and each round has four operations, starting with an S-box, then ShiftRows, MixColumns, and AddRoundKey. It's interesting to observe that all the multiplications — the bad objects we're trying to get rid of — are limited to the S-box, so if we want to optimize, the S-box is where we should begin. The S-box itself consists of two operations. First we find the multiplicative inverse of the input (the multiplicative inverse is the value that gives one when multiplied with the input), and that requires 11 multiplications; this is followed by an affine polynomial that requires 7 multiplications. So let's crunch the numbers and see the cost. Per S-box we need 11 plus 7, which is 18
multiplications. In each round we call the S-box 16 times, so we have 288 multiplications per round, and to evaluate one AES-128 primitive we have 10 rounds, which gives 2880 multiplications. That's the number we are optimizing against.

The first observation is that zero-knowledge systems like the ones we are considering allow for something called non-determinism. We call it non-procedural computation, which is a wider umbrella term, because it has analogues in MPC; so, together, it's non-procedural computation. Going back to zero knowledge: what we do is a verification of a computation, not the computation itself. For example, if we have a pair of values x1 and y1 and we want to verify that y1 is the multiplicative inverse of x1, the naive way would be to directly raise x1 to the power 254 in the AES field — those are the 11 multiplications I mentioned before. But we can also just multiply x1 and y1 and check that the result equals 1; that's what it means to be a multiplicative inverse. One of these requires 11 multiplications, the other requires one multiplication; I'm going to let you decide which one is more efficient. Although one multiplication is not quite enough — we also need to ensure that zero goes to zero — in total we have two multiplications instead of 11. Crunching the numbers again: we had 2880 multiplications, and now the S-box requires only 9 multiplications, times 16 S-boxes per round, times 10 rounds, giving 1440 multiplications per AES evaluation, which is 50% of the original cost of AES. So we're doing well; let's continue optimizing.

We're now optimizing the affine polynomial. The purpose of this affine polynomial is to ensure that when the ciphertext is described as a function of the plaintext and the key, the algebraic degree of this function is high enough; otherwise certain attacks are possible. AES uses an affine polynomial that is good for AES, but we realized that we could do better with particular polynomials.
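The counting and the verify-instead-of-compute trick can be checked concretely in GF(2^8). The `gf_mul` helper below is a straightforward textbook implementation using the AES reduction polynomial; the naive 253-multiplication loop is only there to obtain the inverse for checking, whereas an actual AES circuit would use an 11-multiplication addition chain.

```python
def gf_mul(a, b):
    # Multiplication in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1.
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def gf_inv(x):
    # Procedural route: compute x^254 (naively here; AES can do it in
    # 11 multiplications with a good addition chain).
    y = x
    for _ in range(253):
        y = gf_mul(y, x)
    return y

x = 0x53
y = gf_inv(x)
# Non-procedural route: a single multiplication verifies the inverse.
assert gf_mul(x, y) == 1

# Multiplication counts from the talk:
full_sbox = 11 + 7            # compute inverse + affine polynomial
assert full_sbox * 16 * 10 == 2880       # per AES-128 evaluation
verified_sbox = 2 + 7         # verify-inverse trick (incl. the 0 -> 0 check)
assert verified_sbox * 16 * 10 == 1440   # 50% of the original cost
```

Verifying `x * y == 1` replaces an entire exponentiation with one multiplication, which is exactly the asymmetry that non-procedural computation exploits.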
If the exponent of every monomial is a power of 2, we get a more efficient computation of the polynomial; such polynomials are called linearized polynomials. So if we take a low-degree linearized polynomial, it has an efficient computation, but it is of low degree. What we also realized is that the inverse of such a polynomial is neither linearized nor low degree, but due to non-procedural computation — the thing I mentioned before — we can still evaluate it efficiently. So what we did is take two linearized polynomials of degree 4 and compose one of them with the inverse of the other. The resulting polynomial is of high degree and it is not linearized, which means it can't be efficiently computed, but it can be efficiently verified, using only four multiplications, two from each polynomial. Again crunching the numbers: before, we had 1440 multiplications per AES-128 evaluation; now, instead of nine multiplications, the S-box costs only six in total, giving us 960 multiplications, which is 33% of the original cost of AES.

Our next observation is that the cost of a multiplication is independent of the field size. That is not the case for traditional ciphers, right? If you work with a larger field, you need a larger chip, or it consumes more RAM on your machine. That's not the case here: the cost of one multiplication is one, no matter how wide the field is. So what if, instead of working with a four-by-four state, we work with a one-by-one state? Now we don't need ShiftRows and MixColumns anymore, because the purpose of these operations is to mix the field elements of the state, and if you have only one field element there is nothing to mix. And we can afford one S-box per round, giving us, instead of 960, only 60 multiplications, which is 2% of the original cost of AES-128.

One last thing: these are very small numbers, so we still need to determine the number of rounds.
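The whole multiplication-count progression quoted in the talk, collected in one place:

```python
# Per "AES-128-like" evaluation: 16 S-boxes per round, 10 rounds
# (one S-box per round in the final, one-by-one-state variant).
baseline   = (11 + 7) * 16 * 10   # procedural inverse + affine polynomial
verified   = (2 + 7)  * 16 * 10   # verify the inverse instead of computing it
linearized = (2 + 4)  * 16 * 10   # two degree-4 linearized polynomials
one_by_one = (2 + 4)  * 1  * 10   # 1x1 state: one S-box per round, no mixing

assert baseline == 2880
assert verified == 1440 and verified * 2 == baseline            # 50%
assert linearized == 960 and round(100 * linearized / baseline) == 33
assert one_by_one == 60 and round(100 * one_by_one / baseline) == 2
```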
The usual trick for that in symmetric design is to try all the attacks you're familiar with, see which one reaches the largest number of rounds, add some safety margin, and that's your cipher. We applied the security arguments of AES to this design and found that we need about the same number of rounds. The resulting algorithm is called Jarvis, and it works exactly the way I described: the input goes into the multiplicative inverse, then through a polynomial that is a composition of two low-degree linearized polynomials — one applied directly and the other in its inverse form — and then a key injection; that's one round of Jarvis. We put it on ePrint, hoping that third parties would cryptanalyze it, and unfortunately they did: some time later, a paper was published attacking Jarvis using Gröbner basis attacks. Now, this is ironic, because I remember speaking to Eli before we published Jarvis, and he asked me specifically, "but what about Gröbner basis attacks?", and I said no, those never work on symmetric primitives. Well, it turns out sometimes they do.

So, back to the drawing board; we needed to fix this somehow. It turns out that there is a minimum number of multiplications, and if you go below it, your cipher will be vulnerable to Gröbner basis attacks. Another thing we realized is that with a one-by-one state it's easy to evaluate the cipher, but getting the output would be inefficient, so we need more field elements — and having more field elements in the state solves both problems. The natural thing to do would be to go back to an m-by-m state, or maybe an m-by-n state like in Rijndael, but we also saw that with an m-by-one state we can get faster diffusion than AES. We now know this is called a SHARK structure, but we weren't aware of that back then. Since we now have more than one field element, we somehow need to mix them, so an MDS matrix is the natural candidate. And we understood that composing the two polynomials isn't as efficient as
just alternating between them, which is what we did — with special attention to Gröbner basis attacks this time. The result is Vision, our algorithm for binary fields, and this is how it works. It has a vector of state elements; each element goes into the multiplicative inverse; then the inverse of an affine — sorry, of a low-degree linearized — polynomial is evaluated; everything is mixed with an MDS matrix, followed by a key injection; and then, in the second step of the round, again a multiplicative inverse, now with the polynomial evaluated directly, an MDS matrix, and a key injection.

While we were working on Vision, we were told that there is actually some market demand for the same kind of algorithm operating on prime fields. So we took the same approach, but now linearized low-degree polynomials are not available, because of the properties of the field, and we understood that we have to get the algebraic degree via the nonlinear operation. Again we are using non-procedural computation: we raise to a power alpha, which in most cases can be three, and the high algebraic degree comes from the functional inverse — the cube root of the value. That's Rescue, our design for prime fields. It has the same kind of state, one vector of field elements, and we start by taking the inverse of the power map, so the cube root if alpha equals three; then everything is mixed with an MDS matrix, a key injection, raising to the power alpha, an MDS matrix again, and another key injection; and this is Rescue.

These two algorithms are secure and very efficient: they're very efficient in zero knowledge and in MPC, and they have an interesting property. The number of field elements in the state, m, is flexible, and the larger m is, the fewer rounds you need, so the number of multiplications remains roughly the same no matter how wide your state is. That is also very useful for fully homomorphic encryption, which requires that the circuit be shallow. I was about to invite you to do cryptanalysis on this, but I also got permission to break some interesting news.
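One Rescue-style round over a toy prime field might look as follows. All parameters — the prime, the MDS stand-in, the round keys — are made-up toy values for illustration; real instances use large fields, proper MDS matrices, generated round constants, and many rounds.

```python
p = 101                             # toy prime field
alpha = 3                           # power-map exponent; gcd(alpha, p-1) = 1
inv_alpha = pow(alpha, -1, p - 1)   # exponent realizing the cube root
MDS = [[1, 1], [1, 2]]              # small invertible matrix as an MDS stand-in
K1, K2 = [7, 11], [13, 17]          # arbitrary toy round-key injections

def mat_vec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) % p for row in m]

def rescue_round(state):
    # First half: inverse power map (cube root), MDS, key injection.
    state = [pow(s, inv_alpha, p) for s in state]
    state = [(s + k) % p for s, k in zip(mat_vec(MDS, state), K1)]
    # Second half: forward power map, MDS, key injection.
    state = [pow(s, alpha, p) for s in state]
    state = [(s + k) % p for s, k in zip(mat_vec(MDS, state), K2)]
    return state

# Non-procedural check: y = x^(1/alpha) is verified by y^alpha == x,
# which costs far fewer multiplications than computing the cube root.
x = 5
y = pow(x, inv_alpha, p)
assert pow(y, alpha, p) == x

out = rescue_round([5, 9])
assert all(0 <= v < p for v in out)
```

The cube root is expensive to compute but cheap to verify, mirroring the inverse trick from earlier: the prover supplies y non-deterministically and the circuit only checks y^alpha == x.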
Sometime soon it will be announced that StarkWare, Eli's company, hired a committee of experts to evaluate the security of all the algorithms in this domain, including the ones that were presented yesterday, and this committee (I was asked to be very specific on this) will say that Vision and Rescue are the most secure algorithms in this domain when used as hash functions with 128-bit fields in sponge mode, which is the setting that ZK-STARKs work in. So that's something I'm happy to tell you today. One housekeeping announcement: on Monday, Roberto Avanzi opened a parenthesis and forgot to close it, and I feel it in my body, so here's the closing parenthesis. And this is the team that worked on this paper, which you can find on ePrint. Thank you very much.

All right, so we have time for one question.

There's a lot of effort also to do similar work in the setting of secure multiparty computation, so maybe you would like to draw some connections, or point out differences and similarities?

So in our paper we also evaluate the ciphers in the setting of multi-party computation. The biggest problem is the high-algebraic-degree operations, so the inverse and the cube root, but multi-party computation offers masking, right? You can shift things to the offline phase and then have a fixed-cost online phase, and if you do that, the ciphers are extremely efficient also for multi-party computation. I'm encouraging you to use them in your multi-party computation applications.

All right, so let's thank Tomer. Okay, and now the last talk of this session: "Detecting Money Laundering Activities via MPC Network Flow Analysis," and the talk will be given by Dimitar Jetchev. Oh, the slide just disappeared; it's coming back, I think. Okay, all right, let's go ahead.

Since it's the last talk, I would like to take this opportunity first of all to thank all the organizers for this stimulating event this week. So, this talk is about money laundering, and I hope that this slide will convince you that money laundering is just much more sophisticated than knowing what soap you should use to do it. So how did it all start? It started with what's called the Global Laundromat. That was a money laundering scheme, one of the biggest known currently, that occurred between 2011 and 2014, and here we're talking about more than 40 billion dollars that were laundered from about 20 different banks in Russia to various accounts. As you can see here, there are about 5,000 companies involved in the whole scheme and more than 700 banks in 96 different countries, so that's a really global money laundering scheme, and the idea is that money flowed across these accounts until it essentially became untraceable. If you look at the statistics, you see that a huge part of that amount went through the UK, and that was concerning to the financial authorities there, and as you can see, recently the Financial Conduct Authority, or FCA, organized the 2019 TechSprint. A lot of the people I see in this audience participated at that TechSprint, which was in the summer, in July, and the goal was to use privacy-enhancing technologies to detect these global money laundering schemes. As you can see, Inpher, the company we work for, participated together with Goldman Sachs and Standard Chartered in a team called Secret Computers, and won the public vote award for that. So today I would like to tell you a little bit about the cryptographic solution behind that scheme. As an overview, our goal will be to solve this simple problem: detect money laundering groups globally, across multiple banks. Before we do that, let's try to see why this is relevant. Clearly, in this Global Laundromat, a lot of the banks that were involved had huge fines imposed on them for not being able to detect these activities; not that they didn't want to
detect the activities; the problem was that it was impossible, because they didn't have access to the transaction data coming from the other banks in the scheme. They had a very localized view of the transactions going through the scheme, but they didn't have a global view. The other problem is that transaction monitoring is still a highly manual process, so it needs automation even at the level of an individual bank. And then, most importantly and most relevantly for us as cryptographers, there is the collaboration across different banks: collaboration is hard due to data privacy.

So in this talk I would like to take some time to review what exists in the plaintext literature, the plaintext algorithms, for statistical versus graph-based approaches to detecting money laundering schemes, then tell you a little bit about how you can use secure multi-party computation to detect certain relevant money laundering substructures in the graph of a transaction network, and at the end say a word about how you can make this algorithm practical. If you look at the overview of the literature, there are different statistical models that could be used to detect money laundering activities, starting from Bayesian models and temporal sequence matching, and going to decision trees, so ID3 and C4.5, the early decision tree algorithms. Those were applied to some banking data in China (the CBRC is the equivalent of the FCA in China), and they worked well as a first attempt, but the false positive rates were still quite high. So then people started using more sophisticated machine learning techniques like support vector machines, tested on transaction data coming from Chinese banks as well; here you can see some data, 1.2 million records and 5,000 accounts over seven months, and that improved a little bit on the existing decision tree models. In the end people also used neural network approaches.

What I am going to talk about today is a non-machine-learning approach to the problem. We heard a lot of talks about MPC and machine learning, so here we want to focus on non-machine-learning approaches, and these are graph-based approaches. What you typically do is, as a bank, you analyze your transaction data locally, you detect some suspicious activity, you construct a graph out of this activity, and then your goal is to analyze this graph for money laundering groups. That's what we are going to do, except that we will consider the privacy concerns. So let's take a look at typical banking data; this is a very simplified view. We have a bank, and the bank sees certain transactions. This bank here is called ZZ, and you see certain transactions, transaction 15 and transaction 2314. If you look at those, what's common is that the source ID of the account of the second transaction matches the destination ID of the first transaction. Moreover, the amounts are large, and the first amount is slightly more than the second amount. Also, if you look at the timestamps, you see that the first transaction was done a little bit, about a day, before the second transaction. So that's suspicious, and that's something the bank ZZ could detect essentially out of its local view of the data. So what the bank does is define certain rules, and that's really how banks work; we had an interaction with banks at this competition, and they explained to us exactly how they work. So now we have this kind of pair of matching transactions going from YY to ZZ (these are different banks), and then money flowing out of this account in ZZ into another, third bank, so this bank sees this kind of structure; we call this a leg structure.
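The kind of local leg rule just described can be sketched in a few lines of Python. The thresholds here (minimum amount, allowed fee fraction, time window) and the account IDs are made up for the example; real banks tune their own rules.

```python
# Illustrative version of a local "leg" detection rule: money arriving
# at an account leaves again shortly after, slightly reduced, and the
# amounts involved are large. All thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class Txn:
    src: str        # source account ID
    dst: str        # destination account ID
    amount: float
    ts: float       # timestamp, in hours

def is_leg(t1: Txn, t2: Txn,
           min_amount=10_000, max_fee_frac=0.05, max_gap_hours=24) -> bool:
    """Does t1 followed by t2 look like one leg of a layering chain?"""
    return (t1.dst == t2.src                          # funds pass through one account
            and t1.amount >= min_amount               # large amounts
            and 0 <= t1.amount - t2.amount <= max_fee_frac * t1.amount
            and 0 <= t2.ts - t1.ts <= max_gap_hours)  # forwarded within a day

t1 = Txn("YY-001", "ZZ-042", 50_000, ts=0.0)
t2 = Txn("ZZ-042", "XX-007", 49_000, ts=20.0)
assert is_leg(t1, t2)
```

A bank would run a rule like this over its own transaction log to produce the leg structures fed into the graph construction described next.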
There are all these rules that define what it means for a pair of transactions to be a leg. So a bank creates all these leg structures, and then the question is what you do with them; obviously this information by itself is not sufficient. So you construct a graph, and our graph will be the following: the vertices of the graph are accounts, and the edges come from leg structures, or suspicious transactions. If a vertex is not yet marked and we see a leg involving that vertex, we mark the vertex and add an edge, and any time we see the same kind of transaction between the two accounts, we increase the edge weight by one. So basically, if you have two accounts with many fraudulent transactions between them, that edge will have a higher weight. So we define the weight of an edge; that's simple.

Then what's the goal? The goal is to detect certain structures that are related to money laundering, and there are two simple cases that you can see here. One of them is money flowing from a source to a destination account with lots of intermediaries in the middle; in the second case, money flows from a source to a destination through various intermediaries in a linear fashion. So in the first case you're separating the funds from the source account and they flow in parallel through several of these accounts. The question is: can you identify these structures? You have case one and case two, and of course there is a mathematical way to quantify how you would measure these structures. Obviously a single bank is not going to be able to see these structures; that's the important point.

Now let's take a really simple example: we take a source and a destination on the two ends, and then we have two banks, a red and a green bank. Money gets split between the red and the green, and then it flows into the destination account. What you observe here is that if these are intermediary nodes in a money laundering scheme, first of all the traffic through these nodes is very similar, and second, the traffic is large, because you're talking about large amounts of money. If you draw the adjacency matrices of these graphs, the red bank is going to see a matrix that looks like this here, and I guess nothing can be detected from that particular matrix; similarly, the green bank is going to see this matrix. Now what do you observe if you put this information together? You see that the two rows corresponding to y and z are highly correlated; they're identical in this case, but in general they will have a very high correlation. So is it possible to detect this correlation without actually sharing the transactions of the banks? The answer is yes: we can use MPC here. The basic idea is that we take these two local graph views, secret-share them across the two banks, and then let the banks compute.

Now, what is it that the banks have to compute in order to detect these activities? There are two types of scores that you could compute. The first score is called a balance score, and the idea is very simple: a node is going to be an intermediary node, first of all, if it has higher than usual traffic through it, and second, if the number of incoming transactions is similar to the number of outgoing transactions. The score that you see there is a complicated mathematical formula, but it has two terms, one detecting the first intuition and the other detecting the second: the top level is a harmonic mean, which becomes higher if you have almost equal incoming and outgoing transactions, and below that you're measuring the size of the sum of the weights going through these accounts.
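The exact formula from the slide isn't reproduced here, but the two stated intuitions behind a balance-type score can be sketched as follows; the particular combination of the harmonic mean with the traffic term is an illustrative guess, not the formula from the talk.

```python
# Hedged sketch of a balance-type intermediary score. Two intuitions:
# the harmonic mean 2*a*b/(a+b) peaks when incoming and outgoing
# totals are close, and it is scaled by overall traffic so only busy,
# balanced accounts score high. The exact combination is illustrative.

def balance_score(w_in: float, w_out: float) -> float:
    """Score an account by balance (in ~ out) and traffic volume."""
    if w_in + w_out == 0:
        return 0.0
    harmonic = 2 * w_in * w_out / (w_in + w_out)  # near zero unless in ~ out
    return harmonic * (w_in + w_out)              # also reward high traffic

# a pass-through mule (large, nearly equal in and out) scores far
# higher than an account with the same inflow but unbalanced outflow
assert balance_score(50_000, 49_000) > balance_score(50_000, 1_000)
```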
That's one possibility, and then the other one, which is even more interesting for our application, is something called structural similarity. Here you use a similarity, or correlation, between two vertices to quantify what it means for the incoming transactions to match the outgoing transactions, and you do that by again considering something similar to a geometric mean. So here's what you do. The plaintext version of the algorithm is very simple: you compute the similarities on the graph and test whether some of these correlations are higher than a threshold (that's what people would do in plaintext), then you determine sets of dense pairs, and after a standard clustering algorithm you detect the criminal groups. Of course, this requires a lot of operations, as you can see, inversions and square roots, and that makes the computation highly inefficient in MPC. So we propose an MPC version that only uses a certain small set of operations: we compute the numerators and the denominators and then test inequalities directly, without inverting and without square roots; we square instead of having to compute square roots, and essentially do the same thing, so it's a very similar adaptation. Now we only need three operations: additions, multiplications, and what we call oblivious comparisons. We are not allowed to expose these correlations yet; that's why the comparisons need to be oblivious. How do we do that? We do it with some of Inpher's technology that we developed: it's a Secret Computing multi-party computation framework, it works with real numbers that are represented as integers modulo a large power of two, it's suitable for arithmetic circuits, and it's not very suitable for inversions or, I would say, advanced linear algebra.
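The inversion- and square-root-free rewrite described a moment ago can be sketched in the clear like this; the function name and the rational threshold t_num/t_den are illustrative, and in the actual protocol all values would be secret-shared with the final comparison done obliviously.

```python
# Sketch of the rewrite: instead of computing num / sqrt(den) and
# comparing it with a threshold t (which needs an inversion and a
# square root inside MPC), compare num^2 against t^2 * den using only
# additions, multiplications and one comparison.

def similarity_exceeds(u, v, t_num, t_den):
    """Test  <u,v> / sqrt(<u,u> * <v,v>)  >  t_num / t_den
    without division or square roots (integer vectors assumed)."""
    num = sum(a * b for a, b in zip(u, v))        # inner product
    den = sum(a * a for a in u) * sum(b * b for b in v)
    if num <= 0:                                  # similarity not positive
        return False
    # cross-multiply the squared inequality: (num * t_den)^2 > t_num^2 * den
    return (num * t_den) ** 2 > t_num ** 2 * den

# two identical rows of a combined adjacency matrix pass a 0.9 threshold
assert similarity_exceeds([0, 0, 0, 29], [0, 0, 0, 29], 9, 10)
```

Squaring both sides is valid here because the numerator is checked to be positive first, so the comparison reduces to additions, multiplications, and a single oblivious comparison.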
The security model is what we call full threshold, with an offline and an online phase and a trusted (or honest-but-curious) dealer, and it can potentially be improved later on with verifiability, using techniques like oblivious transfer or cut-and-choose techniques in multi-party computation. So that's the very high level. What's important is that you have this oblivious comparison algorithm, which works as follows: as input you take two L-bit numbers that are secret-shared, and as output you get a secret-shared predicate of the comparison. There are two algorithms, a naive one and a divide-and-conquer one. The naive one has the advantage that it's sometimes faster and has less communication, but the divide-and-conquer one has fewer rounds of communication, so for certain input ranges the divide-and-conquer algorithm is the better one, and that's what we use. I won't say much more about this; I'd rather give you some real data, to see what kind of sizes we're talking about. In the TechSprint competition we got data coming from 330,000 accounts and about 1.5 billion transactions, so that's a giant graph, as you can imagine; you can't handle it directly, you have to do something else. About 300K of these transactions were alerted, some of them were reported as suspicious activity reports, and some of the accounts were just blacklisted by the banks at the end. So here's the idea of how to make it practical. Of course we can't work with adjacency matrices that are 300,000 by 300,000, so what you do is use a data structure similar to a Bloom filter: you choose several hash functions that hash the account identifiers into a much smaller set, let's say of size a thousand, and then you basically compress your graph using these hash functions.
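A rough sketch of that compression step is below. The bucket count, the number of hash functions, and the salting scheme are made up for illustration; the point is only that each hash function induces its own small aggregated graph.

```python
# Bloom-filter-style graph compression: each of K independent hash
# functions maps the ~300k account IDs down to a small bucket range,
# and the weighted graph is re-aggregated on buckets. K and BUCKETS
# are illustrative values.

import hashlib

K = 3           # number of independent hash functions
BUCKETS = 1000  # compressed vertex-set size

def bucket(account_id: str, i: int) -> int:
    """i-th hash function: salt the account ID with i, hash, reduce mod BUCKETS."""
    h = hashlib.sha256(f"{i}:{account_id}".encode()).digest()
    return int.from_bytes(h[:8], "big") % BUCKETS

def compress(edges):
    """edges: dict {(src, dst): weight} over account IDs.
    Returns one bucket-level weighted graph per hash function."""
    graphs = [{} for _ in range(K)]
    for (src, dst), w in edges.items():
        for i in range(K):
            key = (bucket(src, i), bucket(dst, i))
            graphs[i][key] = graphs[i].get(key, 0) + w
    return graphs

graphs = compress({("acct-17", "acct-99"): 5, ("acct-17", "acct-42"): 2})
```

Similarity is then tested on each compressed graph independently, and a pair is flagged only if it looks similar under every hash function, which is what keeps the false-positive rate down.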
Then you construct graphs on the outputs of these hash functions, and you essentially test similarity for each hash function independently; using that, you can conclude that two accounts are similar. That of course has some probability of false positives, which you have to calculate precisely; that's an open, ongoing project, but in practice it works quite well. Some estimates on the complexities: for the Beaver multiplications here, which are used for the multiplications in the computation of the similarities, you can see very large dimensions, so very large vectors and correlations are being computed, and this can all be made practical; that's the final point. For oblivious comparisons, we can do about one million comparisons of 50-bit numbers in less than two seconds, and about a hundred million comparisons of the same size in less than eight minutes, so you can definitely do these computations in reasonable time.

Now, in summary: we have provided a graph-based approach to allow interbank collaboration on analyzing transaction data, with an MPC-friendly algorithm to compute structural similarities, oblivious comparisons to detect dense pairs of accounts, and standard clustering algorithms to detect the structures we're interested in. As open questions, one of the main mathematical questions here is understanding the data structure behind this, so understanding the probabilities of false positives, which even in the case of Bloom filters is a non-trivial problem, and then also enabling verifiability via more sophisticated cryptographic techniques, to allow different security models. I would like to thank you here as well; this is the team who worked on that.

All right, so we do have time for questions for Dimitar, so please walk up to the mic.
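As background on the complexity figures above, a Beaver-triple multiplication on additive shares modulo a power of two works roughly as follows. This is a minimal two-party sketch with the dealer simulated inline, a simplified stand-in for the framework's actual protocol rather than its real implementation.

```python
# Minimal sketch of a Beaver-triple multiplication on additive shares
# modulo 2^64. A trusted dealer hands out shares of a random triple
# (a, b, c) with c = a*b in the offline phase; the online phase then
# needs only openings, additions, and local multiplications.

import secrets

M = 1 << 64  # shares live in Z / 2^64

def share(x, n=2):
    """Split x into n additive shares mod 2^64."""
    parts = [secrets.randbelow(M) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % M]

def reveal(shares):
    """Open a shared value by summing all shares mod 2^64."""
    return sum(shares) % M

def beaver_mul(x_sh, y_sh):
    # Offline phase (dealer): generate a random triple with c = a*b.
    a, b = secrets.randbelow(M), secrets.randbelow(M)
    a_sh, b_sh, c_sh = share(a), share(b), share((a * b) % M)
    # Online phase: open the masked values d = x - a and e = y - b ...
    d = reveal([(x - ai) % M for x, ai in zip(x_sh, a_sh)])
    e = reveal([(y - bi) % M for y, bi in zip(y_sh, b_sh)])
    # ... then each party computes its share of x*y = c + d*b + e*a + d*e.
    z_sh = [(ci + d * bi + e * ai) % M
            for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % M  # one party adds the public term d*e
    return z_sh

assert reveal(beaver_mul(share(6), share(7))) == 42
```

Since d and e reveal nothing about x and y (they are one-time masked by the random a and b), only openings cost communication, which is why shifting the triple generation to the offline phase gives the fixed-cost online phase mentioned in the Q&A of the previous talk.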
Hello, thank you. Given that banks such as HSBC were involved in money laundering themselves, to what extent are your techniques resilient against the bank itself conducting money laundering operations?

So if a bank itself conducts money laundering, that's a different question. The assumption here in the TechSprint competition was that banks get fined for not being able to properly detect money laundering activities. In general, if a bank is malicious, you obviously can't use it in the graph analysis, so you have to use banks that are motivated by the same goal; it's a game-theoretic win-win situation: the banks want to collaborate, they don't want to share the transaction data, and they want to be able to detect the money laundering activities.

And you trust Goldman Sachs?

Well, Goldman Sachs is interested in being able to detect these activities, because otherwise they would get fined. If you look at this global scheme, just as a matter of fact, there were banks in Latvia and Moldova that actually had their licenses revoked, so those banks no longer existed after this; it's a multi-billion-dollar scheme.

Understood, thank you. I wanted to ask about the numbers you had: how many parties were involved in the computation, where were they located, and how did you benchmark to get the numbers you have in the slides?

The timings on the slides, you mean here? Okay, I should have said that: that's for two and three parties, so a small number of parties. In the complexity analysis from before, you have the number of parties appearing in the complexity, and that can obviously be improved in the future with better techniques. But yes, that's the number of parties in the oblivious comparisons. Thanks.

Hey, thanks for the talk. I wonder if you have to do some sort of fuzzy matching, for example when it comes to matching names or addresses, those inputs that are, you know, fuzzy in nature?

I see. We have not looked at that here; I think there were some people discussing it during the competition, but we ourselves have not. Our assumption was always that the account has a very specific identifier, and you use that. I think there is some research that might even be going on now on that; I'm not sure exactly who is doing it, but I have heard about fuzzy matching and money laundering.

Yes, because one of the challenges, for example, to determine if a person is on a sanctions list is actually to do fuzzy matching.

Exactly, exactly; I think I have seen papers on that in the literature.

Thanks. Thanks for the talk. Do you have any insights on what happens as money launderers evolve to avoid the detection techniques? For example, you have that diamond flow, and of course the obvious thing is to route it through two intermediate banks rather than one. So what changes?

That's an excellent question, and we are motivated by that. This is the very basic structure that you detect; you're not detecting all possible structures of money laundering with it. But at least you have some measures of what it means for a node to be an intermediary node, and in fact I'm currently even involved in some projects, non-MPC-related, that use other secret sharing techniques to detect such structures that are more complicated, that have two hops instead of one.

Does the flow analysis you're talking about handle clusters of nodes that act like one node, where the cluster is definitely doing money laundering, but individually none of the nodes looks suspicious?

Individually none of the nodes looks suspicious, and individually these nodes are accounts at a particular bank, so from the point of view of the bank that doesn't look suspicious, but...

Sorry, just to clarify: suppose you have two nodes that are collectively operating like a middleman doing laundering, but the individual transactions don't follow the leg structure you're talking about; only when you look at them combined do the leg structures appear. Can you handle that?

With this technique you would be able to handle that, but you may have to modify a bit the local rules that you apply at each bank. As you see, this kind of algorithm is very specific to the local rules applied at each bank; we learned these rules just by talking to banks, so essentially we learned what they are using nowadays, and that doesn't mean that they cannot enhance or improve these rules.

Thanks. You're welcome.

Well, thanks for the talk, it's pretty interesting. You seem to have very ad hoc clustering rules with a complicated mathematical structure, so you have to do a lot of operations. I'm wondering, if you used a spectral method, then you would just need to compute the eigenvectors and eigenvalues of this matrix; the main operation is multiplication, and the banks could even keep the matrix secret and just have shares of the vectors, so the main thing is linear.

That is a very good question. We have not tried to apply spectral methods; typically spectral methods are useful for clustering algorithms, so we have not tried that, but that doesn't mean you can't. I think you probably can use MPC there to deduce some information. Currently our algorithms were just based on these structural similarities, but spectral methods are definitely helpful, so that's an area to explore in the future. Thanks.

Okay, so let's thank Dimitar again. All right, and this brings us to the end
of the technical program, so I'd like us to give a big round of applause to all the speakers, and to the 36 sponsors, without whose contribution it would have been impossible to run Real World Crypto. And now, for the final point, I would like a roaring round of applause for Thomas Ristenpart, the general chair this year.

Thanks very much. It's actually quite a pleasure to help organize one of my favorite events of the year. Just a few announcements, and then I'll let you go to happy hours and other Friday afternoon activities. Next year Real World Crypto will be in Amsterdam, so Amsterdam 2021; the year after that we'll be moving to Tokyo for Real World Crypto 2022, and after that we're soliciting proposals for locations, so if you have a location in mind that you'd like RWC to come to, you should reach out to one of the steering committee members and get some info about how to proceed. We sent out a survey to the attendees; it's very short and shouldn't take much time, but it's really valuable for us to get feedback from all of you about what we're doing well, obviously, but even more valuable to get feedback on what we're not doing well, and we really take this to heart, to try to make sure this event serves all of you well. Another way to get involved in the community: you're all now members of the IACR, the International Association for Cryptologic Research, and that means you'll be eligible to vote in future elections for the leadership of our nonprofit organization, which runs academic cryptographic research conferences and other activities, so be on the lookout for that.

Finally, just a few more thanks: again thanking the sponsors that we have been putting up here, who really help make sure that we can run this event and keep it cheap and accessible for people to attend, so thank you to them. In particular, we had a special recognition: one of our 2019 Levchin Prize winners, Eric Rescorla, personally made a donation to support students, particularly women and other underrepresented minorities, to be able to attend, so I think we should give Eric a big round of applause. And finally, I think we should especially thank the Columbia University Data Science Institute, who helped facilitate and host us here at a beautiful venue, Columbia University; the student volunteers, who sat out in the very cold lobby to make sure there were people available to answer questions and deal with any issues that came up; and the Columbia event staff, who I think did a phenomenal job of running a very large conference and making sure things ran smoothly. So let's give all of them a big round of applause. And that's it, so have safe travels, and we'll see you in Amsterdam in 2021. Thank you.