Hello everyone, my name is Gajendra Deshpande. I work as CEO of Eyesec Cyber Security Solutions and am also a faculty member at Angadi Institute of Technology and Management, India. Today I will be presenting a talk on fighting money laundering with Python and open source software. I will briefly discuss the forensic investigation process, then the Daubert standard and how it relates to Python, and then the core part of the talk: bank account analysis, which is also known as fund trail analysis. That part has three main aspects: Benford's law, visualization, and the logging module.

Let's start with the forensic investigation process. It has six steps: identification, collection, validation, examination, preservation, and presentation. In the first step, identification, the investigation officer visits the crime location to identify the various objects there, which may or may not be electronic devices. For the forensic process, the officers mainly need the electronic devices, such as laptops, mobile phones, and tablets. Sometimes evidence may be present on devices that look like toys, such as toy pen drives, so officers need to identify such devices carefully and take them into custody.

Once the objects are identified, the evidence collection process begins. When collecting electronic devices such as laptops, mobile phones, or tablets, officers need to preserve their state. If a device is on, they should not turn it off, and if it is off, they should not turn it on, because some evidence may be present in volatile memory. If they change the state of the system, that information may be modified or deleted.
When collecting the evidence, they need to collect the most volatile information first and the least volatile information last. After collection comes validation, where they confirm that the evidence is valid and has not been tampered with. During examination, the investigation officers work on a copy of the data and not on the original, and they use hashing algorithms to check integrity. After examination, the evidence should be preserved in a safe place. The last step is presentation. If you have followed the procedure set by the law enforcement agencies and the court, the evidence is admissible in court. Otherwise, the court will not consider it valid evidence.

Now let us discuss the Daubert standard and how it relates to Python. In United States federal law, the Daubert standard is a rule of evidence regarding the admissibility of expert witness testimony. A party may raise a Daubert motion, a special motion in limine raised before or during trial, to exclude the presentation of unqualified evidence to the jury. The court defined scientific methodology as the process of formulating hypotheses and then conducting experiments to prove or falsify them, and it provided a set of illustrative factors. Has the technique been tested in actual field conditions and not just in a laboratory? Has the technique been subject to peer review and publication? What is the known or potential rate of error? Do standards exist for the control of the technique's operation? Has the technique been generally accepted within the relevant scientific community?

In 2003, Brian Carrier published a paper that examined rules of evidence standards, including Daubert, and compared and contrasted open source and closed source forensic tools.
One of his key conclusions was that, using the guidelines of the Daubert tests, open source tools may more clearly and comprehensively meet the guideline requirements than closed source tools would. Python is an open source tool, so it is well placed to adhere to the guidelines of the Daubert tests. Of course, the results are not automatic just because the source is open. Rather, specific steps must be followed regarding design, development, and validation. Can the program or algorithm be explained? The explanation should be in words and not only in code, because you are explaining it to legal people who may not understand the code. Has enough information been provided that thorough tests can be developed to test the program? Have error rates been calculated and validated independently? Has the program been studied and peer reviewed? Has the program been generally accepted by the community? You can see that there is a one-to-one mapping between the Daubert standard's illustrative factors and these questions about a Python program. So Python is one of the suitable languages for forensic investigation. Forensic investigation is also a highly scientific process, and Python is a scientific programming language, so from that point of view as well Python is one of the most suitable languages for forensic investigation.

Now let us discuss the problem, which is bank account analysis or fund trail analysis. This is the problem statement given by the police department at both the state and the national level. The most common and continuing offense in recent times is money laundering. To make it easier for the police to track the illegal origin of money and undo the complexities created by a series of bank transfers, the police department is looking for a solution that facilitates efficient analysis of bank accounts to identify the links among them. That is the problem statement.
Now let us look at some examples of money laundering. Trade-based laundering involves manipulating invoices to disguise the movement of money. For instance, a person might export goods worth $10,000 and produce inflated invoices worth $20,000. The extra $10,000 is then claimed as clean money.

Next is structuring, also known as smurfing. This involves breaking down large amounts of cash into smaller, less suspicious transactions. These transactions are usually spread out over time and possibly across different financial institutions to avoid detection. In our solution, we are focusing on the structuring problem.

Next is offshore accounts. Money is deposited into offshore accounts that offer greater secrecy and are less regulated. The money might be moved around several times, making it difficult to trace its origins.

Then there are shell companies and trusts. Criminals create fake companies and trusts to disguise the true origin of the money. These companies might exist on paper, but they don't engage in any legitimate business activities.

The next one is gambling. Here the individual might buy casino chips with illicit money and then cash them out later for clean money. Alternatively, they might intentionally lose bets to another party who is in on the scheme.

Then Bitcoin and other cryptocurrencies. With the rise of digital currencies, criminals have turned to cryptocurrencies as another way to launder money because of their anonymous nature.

Then overseas real estate. Investing dirty money in real estate is another common tactic. The property is bought with illicit money and then sold, and the proceeds are considered clean.

There is one more important aspect: politically exposed persons, also called PEPs. These are individuals who are or have been entrusted with prominent public functions.
Examples are heads of state or government, senior politicians, senior government, judicial, or military officials, senior executives of state-owned corporations, and important political party officials. Then there are international organization PEPs: persons who are or have been entrusted with a prominent function by an international organization. This refers to members of senior management or individuals entrusted with equivalent functions, for example directors, deputy directors, and members of the board. Then there are family members and close associates. Family members are individuals who are related to a PEP either directly or through marriage or similar forms of partnership, and close associates are individuals who are closely connected to a PEP either socially or professionally.

There is an intergovernmental organization known as the Financial Action Task Force (FATF). It was established in 1989 and is based in Paris. This task force leads global action to tackle money laundering, terrorist financing, and proliferation financing. It researches how money is laundered and terrorism is funded, promotes global standards to mitigate the risks, and assesses whether countries are taking effective action. In total, more than 200 countries and jurisdictions have committed to implementing the task force's standards as part of a coordinated global response to prevent organized crime, corruption, and terrorism. The task force's decision-making body, the FATF Plenary, meets three times per year and holds countries to account if they do not comply with the standards. You can visit the link on this slide to learn more about the Financial Action Task Force.

As I have said, we are dealing with the structuring problem. It has three main stages: the first is placement, the second is layering, and the third is integration.
In the first step, the sender divides the large amount into smaller amounts and uses many bank accounts to transfer the money to the receiver. On the slide you can see arrows shown in red and arrows shown in green. The red arrows indicate a possible money laundering scenario, whereas the green ones do not.

The solution works as follows. In the first step, the investigation officers collect the bank statements of the suspects from different banks. A person may have accounts in multiple banks, or multiple people may be involved. Once they collect the bank statements, a data set of accounts is created. One check you can apply here is Benford's law: see whether the distribution of the transaction amounts in these statements follows it. If it does, that suggests there may be no money laundering. But it is just one indicator, so you should not stop there, and you can continue your investigation.

After this, you import the data of the suspicious accounts. Bank statements come in different formats, because each bank has its own, so bringing them into a uniform format is a challenge. Once that is done, you also need to import the PEP, RCA (relatives and close associates), and sanctions data. In the next step, you generate a graph showing the links between accounts and transactions. Then you apply machine learning to that graph and try to predict possible money laundering cases. You also need to generate a visualization that shows the chain of accounts with fraudulent transactions.

Since it is forensic software, every step should be recorded for verification and validation. If the opposing lawyer challenges your evidence, he or she can follow the same steps and should get the same data. So let's discuss Benford's law.
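Before turning to Benford's law, the step of bringing statements from different banks into one uniform format can be illustrated with a minimal sketch. The bank names, column headings, and date formats below are hypothetical, invented only for illustration; real statements will differ from bank to bank, and this is not the code used in the talk.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical column layouts for two banks (assumptions, not real formats).
BANK_FORMATS = {
    "bank_a": {"date": "Txn Date", "src": "From A/C", "dst": "To A/C",
               "amount": "Amount", "date_format": "%d/%m/%Y"},
    "bank_b": {"date": "date", "src": "sender", "dst": "beneficiary",
               "amount": "value", "date_format": "%Y-%m-%d"},
}


def normalize_row(bank, row):
    """Map one raw statement row (a dict) onto a common schema."""
    fmt = BANK_FORMATS[bank]
    return {
        "bank": bank,
        "date": datetime.strptime(row[fmt["date"]], fmt["date_format"]).date(),
        "source": row[fmt["src"]].strip(),
        "target": row[fmt["dst"]].strip(),
        # Remove thousands separators before converting to an exact decimal.
        "amount": Decimal(row[fmt["amount"]].replace(",", "")),
    }


# Two made-up rows, one per hypothetical bank format.
raw_a = {"Txn Date": "05/03/2023", "From A/C": "1001", "To A/C": "2002",
         "Amount": "9,500.00"}
raw_b = {"date": "2023-03-06", "sender": "3003", "beneficiary": "2002",
         "value": "9400"}

dataset = [normalize_row("bank_a", raw_a), normalize_row("bank_b", raw_b)]
```

Once every row has the same keys, statements from different banks can be combined into a single data set and analyzed together.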
It is a phenomenological law, also called the first-digit law, the leading-digit law, or the first-digit phenomenon. Benford's law states that in listings, tables of statistics, and so on, the digit 1 tends to occur as the leading digit with a probability of about 30%. That is much greater than the probability you would expect by default, which is 11.1%: if you divide 100 by 9, you get 11.111 and so on, which is the equal distribution. But that is not the case with Benford's law, which gives a distribution for each digit. We will see that on the next slide. Benford's law was used by the character Charlie Eppes as an analogy to help solve a series of high burglaries in "The Running Man", a 2006 episode from season 2 of the television crime drama Numb3rs.

This is the formula: P(d) = log10(1 + 1/d), where d is the value of the leading digit, from 1 to 9. You can see the distribution in the table. The digit 1 occurs around 30.1% of the time, 2 around 17.6%, 3 around 12.5%, 4 around 9.7%, and so on. Finally, the digit 9 should occur around 4.6% of the time.

When we use Benford's law, we need to check whether the transaction history follows this distribution. There can be some error rate that we need to consider. Small deviations are expected, because it is not possible to get the exact values, and that error rate is acceptable. But if there is too much deviation, then there is a possibility of money laundering. The deviation need not appear in every digit: even if a single digit does not follow the Benford distribution, that is a possible case of money laundering.

Now let's discuss hash functions. We discussed validation earlier, and we said that the examiners work on a copy of the data.
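Before the hashing discussion continues, the Benford formula and the deviation check described above can be sketched in a few lines of standard-library Python. The function names are illustrative, and how large a deviation counts as "too much" is left to the caller, since it is a judgment call rather than part of the law itself.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit percentages: P(d) = log10(1 + 1/d) * 100."""
    return {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Return the leading non-zero digit of a non-zero amount."""
    for ch in str(abs(amount)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("amount has no non-zero digit")


def observed_distribution(amounts):
    """Percentage of amounts that start with each digit 1..9."""
    counts = Counter(first_digit(a) for a in amounts)
    total = sum(counts.values())
    return {d: 100 * counts.get(d, 0) / total for d in range(1, 10)}


def max_deviation(amounts):
    """Largest absolute gap (in percentage points) across the nine digits."""
    expected = benford_expected()
    observed = observed_distribution(amounts)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))
```

A caller would compare `max_deviation(amounts)` against a chosen threshold and treat a large gap as a reason to look closer, not as proof of laundering.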
Now we have to make sure that the copy of the data is not altered, whether by a computer or manually. For that, we can use hash functions to check integrity. For example, you can see here that we have created a message M, "Python is a great programming language", and there is a message X with the same text. When we compare the digests of message M and message X, we get the value True. That means the information is not modified. If we get the value False, that means the information is modified. In this second case, I have added an extra space at the end of message X, and that is why I am getting the value False. That means the integrity is compromised.

Now let's see the source code for Benford's law, and how we can write it in Python. You import the pandas, NumPy, and matplotlib libraries. You create a list for the Benford distribution and specify all the expected values. You can see here that we have specified 30.1 for the digit 1, 17.6 for the digit 2, and so on. Then we use a data frame into which we import the bank statements. After that, we extract the first digit of every transaction. Then we calculate the actual distribution, and we use the matplotlib library to draw both the actual distribution and the Benford distribution. Then we calculate the deviation, because that is very important for us. If the deviation is more than a threshold, which in this case we assume to be 15, we can say that potential fraud is detected. Otherwise, we can say that there is no potential fraud.

In Python there is also a package called benfordslaw. You can install it and use it directly, and it is very straightforward. Install benfordslaw and import it. You can also import the NumPy and pandas packages, and use pandas to read the statement, which is in CSV format.
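Returning briefly to the integrity check described earlier in this passage, the digest comparison can be sketched with Python's built-in hashlib module. The talk does not say which hash algorithm was used, so SHA-256 here is an assumption, and the variable names are illustrative.

```python
import hashlib


def sha256_digest(text):
    """Hex digest of a text message (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


message_m = "Python is a great programming language"
message_x = "Python is a great programming language"
message_x_altered = message_x + " "  # one extra trailing space

# Identical text gives identical digests.
unchanged = sha256_digest(message_m) == sha256_digest(message_x)
# A single added space changes the digest, so the comparison fails.
still_matches = sha256_digest(message_m) == sha256_digest(message_x_altered)

print(unchanged, still_matches)  # True False
```

The same comparison applies to evidence files: a digest recorded at collection time is compared with the digest of the working copy before and after examination.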
Then initialize the object: select the method as chi-square and set the value of alpha to 0.05. Then you fit the data and plot the results on a graph. Similarly, you can also test the second digit and the third digit and plot them.

You can see here that an anomaly is detected. The red dots indicate the Benford distribution, whereas the gray bars represent the empirical distribution. Look at how often the digit 1 occurs here: it is about 35%, whereas the expected value is about 30%. That is a large difference, so you can conclude that there is possible money laundering. If you consider the second digit, it does not follow the Benford distribution at all, and there is much more deviation than for the first digit. The same is true of the third digit. But from the first digit alone you can only conclude that there may or may not be money laundering.

Next is the visualization. To visualize the transactions, we make use of the NetworkX package and the matplotlib library. Load all the transactions and initialize a directed graph using the DiGraph class. Then add edges to the graph, weighted by the transaction amount. Then draw the graph. Here you can see the output of the previous code. There are vertices and edges. Each vertex is mapped to an account number, and the transactions between different accounts are shown as edges. You can see that some edges are dark and some are light. The shade is based on the transaction amount: if the amount is large you get a dark edge, and if it is small you get a light shade. Next is the logging module.
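The weighted directed graph described above can be sketched without any third-party package. In this minimal illustration a plain dictionary stands in for NetworkX's DiGraph, the account numbers and amounts are made up, and summing repeated transfers between the same pair of accounts into one edge is an assumption about how the weights are formed.

```python
from collections import defaultdict

# Hypothetical transactions: (source account, target account, amount).
transactions = [
    ("A1", "B1", 9500),
    ("A1", "B2", 9400),
    ("B1", "C1", 9000),
    ("B2", "C1", 9100),
    ("A1", "B1", 500),
]


def build_graph(rows):
    """Directed graph stored as {source: {target: total amount}}."""
    graph = defaultdict(dict)
    for src, dst, amount in rows:
        graph[src][dst] = graph[src].get(dst, 0) + amount
    return graph


def edge_shades(graph):
    """Scale each edge weight to 0..1 so larger totals can be drawn darker."""
    heaviest = max(w for targets in graph.values() for w in targets.values())
    return {
        (src, dst): weight / heaviest
        for src, targets in graph.items()
        for dst, weight in targets.items()
    }


graph = build_graph(transactions)
shades = edge_shades(graph)
```

With NetworkX, the same structure would be held in an `nx.DiGraph()` with each edge added through `add_edge(src, dst, weight=amount)`, and the shade values would be passed to the drawing call as edge colors.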
As I have said, this is forensic software, and we have to record each and every step so that we can submit the record to the court and show how we arrived at the evidence. The logging module in Python is a built-in module used for flexible event logging in applications and libraries. Logging is important for maintaining an audit trail of an application, understanding the flow of execution, and debugging issues by examining what happened in the application's past.

Here are some of the key components of Python's logging module. One is the logger, the object that the application code uses directly to call the logging functions. Next is the log record, which is created by the logger every time something is logged. Then there are handlers, which send the log record to the appropriate destination, such as the console or a file. Then the formatter, which specifies the layout of the output.

Now you can see how we are using the logging module. Apart from the previous libraries, NetworkX, pandas, and matplotlib, we now import the logging module. We use the basicConfig function to set the file name and the file mode, and we also set the format in which the information should be written to the log file. Since we are loading the data from a CSV file, we log the message "Reading the CSV data". If there is an error reading the file, it should log the message "Error reading the CSV data". Next, we again initialize the directed graph, and now we log the message "Initializing the directed graph". For every activity, we log a message so that we can record all the steps, show them to the court, and say that this is how we got the results. If anyone challenges it, they can reproduce the evidence by following the same steps. Finally, the conclusion.
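The audit logging described above can be sketched as follows. This is a minimal illustration rather than the code from the talk: it reads the CSV with the standard csv module instead of pandas to stay dependency-free, and the function names, log format, and file paths are assumptions.

```python
import csv
import logging


def configure_audit_log(path):
    """Send all log records to a file, one timestamped line per step."""
    logging.basicConfig(
        filename=path,
        filemode="w",
        format="%(asctime)s %(levelname)s %(message)s",
        level=logging.INFO,
        force=True,  # Python 3.8+: replace any handlers configured earlier
    )


def load_transactions(csv_path):
    """Read a CSV of transactions, logging the attempt and its outcome."""
    logging.info("Reading the CSV data: %s", csv_path)
    try:
        with open(csv_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
    except OSError:
        # logging.exception records the message plus the traceback.
        logging.exception("Error reading the CSV data")
        return []
    logging.info("Read %d transactions", len(rows))
    return rows
```

Calling `configure_audit_log("audit.log")` once at start-up and then routing every later step through functions like `load_transactions` leaves a timestamped record that another examiner can follow to repeat the analysis.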
Fund trail analysis is very complex, especially when things like cryptocurrency are involved. It also becomes very complex when some banks do not maintain their records electronically, and when multiple accounts from multiple countries are involved, for example in foreign transactions. More advanced techniques and tools are required, because Benford's law is very elementary. We need to apply more sophisticated machine learning techniques to predict possible money laundering. There may also be false positives, and those need to be investigated very carefully. Thank you everyone for attending my talk.