Hello, welcome back after the lunch break. In the next session, we're going to hear from Gajendra (I hope I pronounced the name correctly) about building your first cyber forensic application using Python. That sounds like a very interesting topic, and I hope we will learn a lot. Welcome, everyone. Gajendra, am I right?

Am I audible?

Yes. Where are you streaming from?

I'm streaming from India, from the southern part of India. I just came one hour early from my office.

Nice, okay, excellent. You have already set up the screen share, so we can put that on. Just a second for the technician... yeah, there you go. Okay, excellent. Gajendra, I'm going to leave the stage now. I'm going to monitor the chat and record all the questions, and then five minutes before the end of the session we can have a Q&A. Okay, take it away. Thank you.

Thank you. Hello everyone, my name is Gajendra Deshpande, and today I will be presenting a talk on "Build Your First Cyber Forensic Application Using Python". In today's talk, I'm going to briefly discuss an introduction to digital crimes, digital forensics, the process of investigation and the collection of evidence; then setting up Python for forensic application development; built-in functions and modules for forensic tasks; forensic indexing and searching; hash functions for forensics; forensic evidence extraction; metadata forensics; and, in brief, using natural language tools in forensics.

Now, let us first look at some statistics related to cyber crimes. The Internet Crime Report for 2019, released by the Internet Crime Complaint Center (IC3) of the USA's Federal Bureau of Investigation, revealed the top four countries that are victims of internet crimes. The USA reported more than four lakh sixty thousand (460,000) cyber crimes, the UK more than 90,000, Canada more than 33,000, and India more than 27,000. Of course, these are only the reported numbers; the unreported numbers can be much, much higher. Then, according to an RSA report, mobile transactions
are rapidly growing, and cyber criminals are migrating to less protected soft channels. Also, according to an article published in the Indian Express on 19 November 2016, over 55 percent of millionaires in India have been hit by cyber crimes. That is because the mobile phone is a soft channel, and many people are not aware of the different settings in the mobile phone which can provide them with a safe environment. Also, a recent study by Check Point Research recorded more than 150,000 cyber attacks every week during the COVID-19 pandemic, an increase of 30% in cyber attacks compared to previous weeks.

Now let us look at the definition of forensic science. Forensic science is the use of scientific methods or expertise to investigate crimes or examine evidence that might be presented in a court of law. Cyber forensics is the investigation of the various crimes happening in cyberspace. Examples of cyber attacks include phishing, ransomware, fake news, fake medicine, extortion and insider fraud, and we know that during the pandemic, and in the digital era generally, we are facing a huge problem of fake message circulation.

Then, according to DFRWS, that is, the Digital Forensics Research Workshop, digital forensics can be defined as "the use of scientifically derived and proven methods toward the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events found to be criminal, or helping to anticipate unauthorized actions shown to be disruptive to planned operations". If you look at this definition, there are two parts. The first part speaks about the different stages in a cyber forensic investigation, and the second part speaks about the reconstruction of events, such that the evidence can be found and presented in a court of law.

As we have seen in
the previous slide, these are the steps in the cyber forensics investigation process: identification, collection, validation, examination, preservation and presentation.

In the identification phase, an investigation officer visits the crime location and tries to identify the different objects where evidence may be present. These include hard drives, mobile phones or smartphones, cables and smart gadgets. These can also be gadgets which look like toys but are actually storage devices, such as toy pen drives.

The next stage is the collection of these objects. The investigation officer will collect all the objects and put them in safe bags, such as Faraday bags or anti-static bags, so that the evidence cannot be altered. That is the collection of physical evidence, of physical objects. If the computer or laptop is on, the investigation officer has to take a snapshot of the entire system. In that case, if the system is on, they will perform live forensics if tools are available; if tools are not available, they will just pull the plug so that the system state is retained, and carry the system to the lab. This is very, very important: if the system is on, they should not turn it off, and if the system is off, they should not turn it on, because that will alter the state of the system and some evidence may be lost.

The third stage is validation. Note here that the investigation will be performed on this snapshot, or copy, of the data, and once the investigation is performed they need to ensure that the original data and the copied data are the same. For that, hash algorithms can be used.

Next is the examination. Here, investigation officers will use different tools. There are many commercial tools, and open source tools are also available. We can also use Python.
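As a minimal sketch of the validation step described above, assuming the original evidence and its working copy are ordinary files on disk: the function names `sha256_of_file` and `copies_match` are illustrative and do not come from the talk. The file is read in chunks so that a large disk image does not have to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """Return True when both files hash to the same SHA-256 digest."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

A call such as `copies_match("original.img", "working_copy.img")` (both file names hypothetical) returns `False` as soon as a single byte differs between the two files.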
We can use small scripts to perform the examination of the evidence.

Next is the preservation of the evidence. The evidence needs to be preserved in an appropriate environment, with appropriate room temperature and appropriate security. It may be stored in a locker room at the appropriate temperature and, as I have mentioned, it needs to be placed in anti-static bags.

The final stage, which is very, very important, is the presentation of evidence in a court of law. If all the procedures laid down by the law enforcement agencies are followed correctly, then the evidence can be presented in a court of law.

There is one important standard known as the Daubert standard. In United States federal law, the Daubert standard is a rule of evidence regarding the admissibility of expert witness testimony. A party may raise a Daubert motion, a special motion in limine raised before or during trial, to exclude the presentation of unqualified evidence to the jury. There are some illustrative factors which are considered in assessing scientific methodology. Has the technique been tested in actual field conditions, not just in a laboratory? Has the technique been subject to peer review and publication? What is the known or potential rate of error? Do standards exist for the control of the technique's operation? Has the technique been generally accepted within the relevant scientific community?

Then, in 2003, Brian Carrier published a paper that examined rules of evidence standards, including Daubert, and compared and contrasted open source and closed source forensic tools. One of his key conclusions was that, using the guidelines of the Daubert tests, open source tools may more clearly and comprehensively meet the guideline requirements than would closed source tools.
Python is obviously open source, so it meets the Daubert standard, and it can be used, and certainly is used, as a tool in the digital forensic process or investigation. The results are not automatic, of course, just because the source is open; rather, specific steps must be followed regarding design, development and validation. The questions are these. Can the program or algorithm be explained? This explanation should be in words, not only in code. Has enough information been provided such that thorough tests can be developed to test the program? Have error rates been calculated and validated independently? Has the program been studied and peer reviewed? Has the program been generally accepted by the community? The source for this information is the book by Chet Hosmer on Python forensics, so you can refer to that book for more information.

Now, the next thing is setting up Python for forensic application development. There are many factors. The first one is your background and your organization's support: what qualifications you have in terms of, say, tools and language knowledge, and whether your organization supports open source development or is interested in investing in commercial tools. If it can invest, that's fine; otherwise, you need your organization's support to develop the tools.

Next is choosing third-party libraries. This is a bit risky, because we are not sure whether those libraries are properly maintained. If they are properly maintained, then yes, you can use them.

Then IDEs and their features. We know that standard IDEs provide very useful features, such as IntelliSense, which help us in typing the program and speed up the writing process.

Next is installation. For installing the system there are again many options: you can install it as a standalone system, or you can go for a virtual machine, or you can even go for the cloud.

Then the right version of Python.
If you are using third-party libraries, this may be a problem, because a library may not be compatible with recent versions of Python. You need to check which version of Python each third-party library is compatible with before you start using it.

Next is which kind of interface you like, graphical or shell. Some people, maybe beginners, may love to use the graphical approach, but we know that more experienced people still prefer the shell, because they just love the command-line interface; it is easy to get output, and it's also more customizable.

Let us look at some built-in functions and modules in Python. Note here that if you want to create your first cyber forensic application, you need not write any extra code or use additional libraries. You can just use the built-in functions available and start writing the code. Here on the screen you can see that we are generating IP addresses. We are generating local IP addresses of the form 127.0.0.x, up to 127.0.0.9, so in total there are ten IP addresses. We are defining the range with range, which is a built-in function; you can also see append, which is a built-in list method; and print is a built-in function. So we are using just range and append to generate the IP addresses.

Similarly, here is a small piece of code which lists the files and directories in the present directory. For that, you need to import the os module, use the getcwd method, then use the listdir method to list the directory, and just use a for loop to navigate and print the files and directories in the present directory.

Then, forensic indexing and searching. You can use simple file search and index functions to search for particular keywords and to find out their location.
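The built-in-only examples described above (the IP address list, the directory listing, and a simple keyword search that reports locations) can be sketched roughly as follows. This is not the code from the slides: the function names are illustrative, and the choice of 127.0.0.0 as the first of the ten addresses is an assumption, since the transcript does not make the starting address clear.

```python
import os


def local_addresses(count=10, prefix="127.0.0."):
    """Build `count` addresses using only range() and list.append()."""
    addresses = []
    for host in range(count):
        addresses.append(prefix + str(host))
    return addresses


def list_directory(path=None):
    """Return the sorted entries of `path` (default: current working directory)."""
    if path is None:
        path = os.getcwd()
    return sorted(os.listdir(path))


def find_keyword(text, keyword):
    """Return the character offsets at which `keyword` occurs in `text`.

    The match is case-sensitive, so "Python" and "python" are different keywords.
    """
    offsets = []
    if not keyword:
        return offsets
    start = text.find(keyword)
    while start != -1:
        offsets.append(start)
        start = text.find(keyword, start + 1)
    return offsets


if __name__ == "__main__":
    for address in local_addresses():
        print(address)
    for entry in list_directory():
        print(entry)
    print(find_keyword("python and Python and python", "python"))
```

Returning offsets instead of a plain found/not-found answer is a small design choice: an examiner usually needs to know where in the data a keyword appeared, not only that it did.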
In forensics, you are going to investigate a huge amount of information, in terms of gigabytes and terabytes. You don't need all the data; you are looking for certain evidence, certain keywords, based on the case. In that case, you can just specify those words and search for them in the image of the system. If you get those words, fine; otherwise, it's okay. Here you can see that a small piece of code has been written. A file, keywords.txt, has been created with some keywords in it, and we are just using an if condition to search for the word "python". If the word "python" is present, it says that the word was found; otherwise, it says that the word was not found. It's a very simple example, but you can extend the same program to achieve your goal.

For that we also have a more advanced package called Whoosh, which is used for forensic indexing and searching. It was created and is maintained by Matt Chaput, and it was originally created for use in the online help system of Side Effects Software's 3D animation software, Houdini. Again, it's a pure Python library. It supports fielded indexing and search, fast indexing and retrieval, and a powerful query language. In simple terms, you can say that you are building a custom search engine: first you add all the URLs to the system, then you use the Whoosh query parser to find the required information.

Next is hash functions for forensics. We know that the validation step is very, very important for us, because we are working on a copy of the data. We need to ensure that the hash of the copied image and the hash of the original image are the same. If they are not the same, that means the information has been tampered with, and the evidence cannot be accepted. So how can we do it? I have included a very simple example on this screen.
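A minimal sketch of that kind of digest comparison, using only the standard library. The variable names `m` and `x` follow the talk; the input strings are placeholders, because the actual inputs shown on the slide are not in the transcript.

```python
import hashlib

# Two digests of identical input (placeholder data, not the slide's input).
m = hashlib.sha256(b"forensic evidence").hexdigest()
x = hashlib.sha256(b"forensic evidence").hexdigest()
same = (m == x)
print(same)  # True: identical input gives identical digests

# Recompute x with one extra trailing space in the input.
x = hashlib.sha256(b"forensic evidence ").hexdigest()
tampered_same = (m == x)
print(tampered_same)  # False: a one-character change gives a different digest
```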
Import the hashlib library, then use the sha256 method to generate the message digest. We generate one message digest and store it in m, then we generate a second message digest and store it in x. Then we check whether the message digests of x and m are the same. In this case, they are the same. Now, on this screen, you can see that in the second hash, that is, in x's hash, I have added just one extra space at the end. In this case, it says that they are not the same, so I'm getting False here. That means something has been tampered with.

Then, forensic evidence extraction. There are various kinds of files, and to extract information from them we need to use specific packages. For example, if you are working on images, you may have to use Pillow; if you are working on PDFs, you may have to use PyPDF; and if you are working on audio or video files, then again there are some packages. Here we are using the Pillow package to extract information from files. Generally, when we use TAGS, it gives us the properties of the file, and when we use GPSTAGS, it gives us information such as the latitude, longitude and location where the image was taken. These things are important as evidence.

Then there is a library called pyscreenshot. It tries to allow taking screenshots without installing third-party libraries. It works as a wrapper for many image-processing back ends, and a wrapper is also available for Pillow. If you want, you can take a screenshot of the entire screen. For that, you need to import the pyscreenshot package, then use the grab method, and use the save method to save it.
This will take a screenshot of the entire screen. You can also take a screenshot of just a particular part of the screen. For that, you have to specify the coordinates x1, y1 and x2, y2, and it will take a screenshot of that particular part. Note here that you can also work on improving the performance of pyscreenshot, but performance is not the goal in evidence extraction; the evidence is what is very, very important. Still, if you want, you can improve the performance through some settings, such as changing the backend and setting the child process option to false.

Then, metadata forensics. There is a library called Mutagen which can be used to extract information from an audio file. It supports various formats, such as ASF, FLAC, MP4, MP3 and so on. You can import the mutagen library using an import statement, then specify the file name and print the value. When you print the value, you can see that it says the type is Ogg Vorbis, the duration is about 346 seconds, and the bit rate is 49821 bits per second. Similarly, you can extract information from a FLAC file and from an MP3 file, so you can print the length and bit rate of an audio file.

Similarly, you can extract information from a PDF file. For that you can use PyPDF2. You can extract document information such as the title, author and other properties; split documents page by page; merge documents page by page; crop pages; merge multiple pages into a single page; and encrypt and decrypt PDF files. It is a very useful tool for websites that manage or manipulate PDFs.

The next file type is the PE file. PE stands for portable executable, a file format which is generally found on the Windows operating system, but the pefile package works on any operating system, such as Windows or Linux. With it you can extract information from a portable executable file. It supports features such as inspecting headers, analyzing sections' data, retrieving
embedded data, reading strings from resources, warnings for suspicious and malformed values, and so on.

Next is using natural language tools. You can use NLP tools for analysis of the information. We know that we are going to extract a lot of information, and you can use NLP tools, or machine learning, to find correlations between pieces of evidence. NLP tools are used for examining text for evidence, and there are various packages for that, such as NLTK, spaCy and textacy. textacy is built on top of spaCy and offers additional features. If you are interested in multilingual information processing, you can use Stanza. It's by Stanford; it was earlier known as StanfordNLP, and now it's known as Stanza. It supports more than 60 human languages. Polyglot is also a very popular NLP library which supports a huge number of languages. But there is still a lot of scope, because not all languages and not all features are supported. If your information is written in Indian languages, you can also go for iNLTK and Indic NLP. Of course, Stanza and Polyglot also support Indic languages, but these are packages specifically for Indian languages.

In summary, I can say that it is very important to follow the standard procedure laid down by the law enforcement agencies during the investigation process. If it is not followed, the court will not accept the evidence. There are many open source as well as commercial tools for digital forensics. Learning to develop your own tool is always advantageous, because that's an additional skill, and some organizations may not be interested in investing in a costly product. Many tools written in Python are pure Python implementations, and, most importantly, Python and open source tools comply with the Daubert standard. With this I would like to conclude my talk, and I thank the EuroPython organizers for giving me the opportunity to speak at this conference. Thank you.

Thank you very much, Gajendra, for the talk; very interesting. We have a question for you. Let me see, this one is regarding searching: would you have any good advice on searches for keywords outside of one's spoken languages, going beyond dropping stuff into Google Translate?

That's an interesting question. We know that Google and the others are keyword-based searches. Maybe we have to go for semantic search, where we define the ontology, and I think that is going to give better results.

Okay. I think there are some tools for NLP where you can download ontologies for different languages, so maybe that's a good way to search for keywords. There is another question here: what are some tips for starting with NLP?

It depends on what kind of problem they are solving, but the most important thing will be cleaning the data: removing all unwanted information and keeping just the relevant information. Basic concepts like tokenization, stemming, lemmatization and named entity recognition should be clear. They can start with those things.

Right. Related to that, I have a question as well, because I listened to some other talks on the topic. Is there some kind of cookbook for NLP where you can research these things? A lot of these things, especially the cleaning of the data, are experience-based, right? You don't immediately find the different things that you need to do unless you have experience with them.

Yeah, experience is the cookbook, I can say, because domain knowledge is very, very important, or we need to interact with a domain expert to clean the data.

Right. Okay, then the final question is: will you be sharing the slides?
This is actually something that I'm asking all of the speakers. There is a mechanism on our website where you can go to your talk page and upload slides, so I would suggest that you do that, and then people can pick up the slides from there. I will post the link to the talk entry in the chat, and people can pick it up from there.

All right, thank you very much. There are no more questions. You may want to go over to the breakout room and answer additional questions people may have. I will post the questions that have come up to that room... we've already answered all of those. So, right, thank you very much. Good job.

Okay. Thank you.

Thank you. Bye-bye.