Perfect. So you're going to talk about security with Python with us today, aren't you? Yeah. Wonderful. Well, whenever you're ready, the floor is yours.

Hello, everyone. My name is Gajendra Deshpande and I'm working as an assistant professor at KLS Gogte Institute of Technology, India. Today I'll be delivering a talk on deceptive security using Python. These are the contents we are going to discuss briefly today: an introduction to deception; two deception tools, WebTrap and DemonHunter; then our experiment and how we developed the deception technique; then the conclusion and finally the references.

Imagine you are passing through an unknown state at midnight and you find that some antisocial elements are following you. To save yourself from them, you start running and look for a safe place to hide. On the way you find a good person and you ask him to help you, so he hides you in his place to protect you. When these antisocial elements visit the good person's place and enquire about you, the good person misguides them and directs them to some other place in order to protect you. This is exactly how deception works. In this analogy, you are the resource to be protected, the antisocial elements are the hackers who want to gain access to the resource, and the good person is the deception technique that protects the resource from hackers by making them fall into the trap.

Now let's understand the basic idea behind deception and how it works. The definition of deception is that it's a technique where hackers' methods are used as a security mechanism, that is, phishing the phishers. Let's assume that you have the legitimate website of a bank. What hackers do is create a similar user interface which looks exactly the same, but the backend is different. So you enter your details assuming that it's the legitimate website, but in the backend hackers are collecting your data to carry out further attacks.
Deception is a military tactic used by both attackers and defenders. In our case, we are using it to protect our resources. This diagram shows how deception works. There are two users: one is the benign user and the other is the malicious user. Both have access to a common user interface. Depending on the type of user, the input and the activity, they are given either the real system or the deceptive system. If the benign user is authorized and authenticated correctly, he is given access to the real system. The malicious user is redirected to the deceptive system, which looks exactly the same but is not the real system.

There are two types of deception technology: active deception and passive deception. In active deception, inaccurate information is intentionally provided to the hackers so that they fall for the trap. In passive deception, incomplete information is provided, so intruders try to obtain the other part of the information and fall for the trap. They can also be classified as client-side deception and server-side deception. Client-side deception is mostly used by hackers to deceive legitimate users, whereas server-side deception is used by security providers to deceive the hackers. You can develop a better deception by combining both approaches, active and passive, that is, a deception which has incomplete as well as inaccurate information.

Now let's see deception's evolution and its advantages. Honeypots were introduced in the year 1998. Honeypots are small traps in the network; when attackers try to access those points, they fall for the trap. A honeynet is nothing but a network of honeypots; honeynets were introduced in the year 2000.
A honeytoken is a small piece of information which is embedded in the real information. When somebody steals the real information, this token alerts the system administrator that such-and-such a message has been stolen, and it gives information about how it was stolen. Next, honeypot 2.0 was introduced in the year 2012. Deception technology came into existence in the year 2016. The advantages of deception are increased accuracy, minimal investment and future readiness: it is applicable to new technology as well as existing technology.

Let's first discuss the WebTrap deception tool. It is designed to create deceptive web pages to deceive and redirect attackers away from the real websites. The deceptive web pages are generated by cloning the real websites, specifically their login pages. This project basically has two files: one is the web cloner and the second is the web server. The web cloner clones the real websites and creates the deceptive web pages. The deceptive web server is responsible for serving the cloned web pages (note that it serves the cloned web pages, not the real ones) and for reporting to the syslog server upon request. So if anybody tries to access the cloned web pages, it is logged in the syslog server. You can install the WebTrap tool by following these commands. The problem with WebTrap is that presently it works only on Ubuntu 18.

The usage is shown in this slide. You need to use the webcloner.py file, then specify the output directory as the next parameter, and then the URL of the website you want to clone. An example is shown here, where we are cloning Wikipedia's login page into the directory Wikipedia login page. Next is the web trap itself, that is, the deceptive web server. To use it, you need to specify trapserver.py.
To the trapserver.py file you need to specify the directory name and the syslog server. Here trapserver.py is serving the login page from the Wikipedia login page folder, which holds the deceptive web pages. When somebody tries to access this folder or the files within it, the access is logged in the syslog server.

The next tool is DemonHunter. DemonHunter is used to create low-interaction honeypot servers. It has agents and a manager to check the logs. It allows you to create your own honeynet, all customized by yourself, from ports to protocol handlers. That means you can have your own port numbers and your own protocols, and different protocols are allowed. In this diagram you can see that there's a central manager component which manages everything: the honeypot devices, the protocols and the port numbers. Notice that these protocols can be of different kinds. They need not be HTTP only; it can be a combination of HTTP, UDP, SMTP, FTP, etc.

Now, why did we develop a deception tool? We know that cyberspace is a national asset, and XML is at the heart of many mainstream technologies nowadays, including web services, service-oriented architecture or microservices, cloud computing, etc. Web services vulnerabilities can be present in the operating system, network, database, web server, application server and so on. When a new technology is introduced, it comes with its own new challenges, and the old challenges are still present. For example, SQL injection exists with respect to relational databases, but when we use XML as an alternative to relational databases, the same kind of injection attack can be performed on XML documents too. So the problem we tried to solve was to secure web resources from XPath injection attacks using modular recurrent neural networks.
For that we propose a solution that uses a modular recurrent neural network architecture to identify and classify atypical behavior in user input. Once atypical user input is identified, the attacker is redirected to fake resources to protect the critical data. For this we developed our own input validation technique, the count-based validation technique. In the next few slides, I will discuss how we developed the count-based validation technique and how it works.

We first need to understand how XPath injection attacks work. In this slide you can see a small piece of XML which stores a username and password. At the bottom there are two lines, one in blue and one in red. The line in blue is the valid query, where a valid username and password are given. The last line, in red, is a malicious query. You can see that no real data is used there; instead an attack vector is used, so you find some Boolean operators and some unwanted characters. CAPEC's entry on XPath injection clearly states that you don't really need any skills to perform an XPath injection attack. Any beginner can create an attack vector and perform these attacks, and the typical likelihood of exploit is very high. That's why it is very important to handle.

We studied a few research papers and found some gaps in the existing work. What we found was that a neural network approach to identify and classify atypical behavior in input had not yet been tried. The study showed different approaches to handling XPath injection attacks, along with the methods applied and their disadvantages.
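To make the valid and malicious queries described above concrete, here is a minimal sketch, assuming a login check that builds its XPath expression by string concatenation. The function name `build_login_query`, the element names `user`, `username` and `password`, and the sample values are all hypothetical; the slide's actual XML and queries are not reproduced in this transcript.

```python
def build_login_query(username, password):
    """Build an XPath login query by naive string concatenation.

    Unsafe on purpose: user input is pasted directly into the query,
    which is what makes XPath injection possible.
    """
    return ("//user[username/text()='" + username +
            "' and password/text()='" + password + "']")


# A legitimate login attempt (hypothetical credentials).
valid_query = build_login_query("alice", "s3cret")

# An attack vector: no real data, only quotes and a Boolean operator.
malicious_query = build_login_query("' or '1'='1", "' or '1'='1")

print(valid_query)
print(malicious_query)
```

In XPath, `and` binds more tightly than `or`, so the predicate in the second query ends with `or '1'='1'` at the top level and is true for every `user` node, whatever credentials are stored.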
We can conclude from the study that neural networks have not been applied to detect XPath injection attacks and that existing results are not promising. The study showed how modularity in neural networks helps to achieve improved performance. Modular neural networks have not been applied to cyber security, particularly to the detection of SQL or XPath injection attacks.

This slide shows the system design, that is, the working of the entire system. There are three tiers: the presentation tier, the business tier and the data tier. The presentation tier has the login form, which is the user interface through which the user or the attacker interacts. Next is the business tier, which holds the data processing or application logic. Then we have the data tier, where we store the real XML document, the fake XML document and the custom error messages.

Some examples of valid inputs are email ID, mobile number, etc. Examples of malicious inputs are also shown. Then there is a third category, invalid inputs: very large input strings, language, special characters, etc. When an attacker uses invalid inputs, the system generates an error message, and the error message also gives away some information about the system, for example which browser the client is using, which operating system the client is using, etc. To avoid that, you can design custom error messages and hide the system information.

This algorithm describes how our count-based validation technique works. The first step is to scan the user input, next determine the length of the user input, then count the frequency of every character. So you need to count the frequency of characters, digits and special characters. In table 4, in the bottom right corner, we have specified each character and its threshold.
That means characters are allowed only up to the threshold. If the threshold is exceeded, appropriate error codes are assigned. So if the frequency of the character is below the threshold value set for that particular character in table 4, then set the error code to 40. Else, if the frequency of the characters listed as special characters is above the threshold value, then set the error code for that particular character to 40.

What we have done is modularize the neural network. We have not used a single neural network because that would result in a lot of samples. To reduce the samples, we have divided it into three neural networks. The first neural network is trained on login attempts and the second on error codes. To build each network we used a recurrent neural network with 15 neurons, an LSTM (long short-term memory) hidden layer and a softmax output layer. We used the resilient propagation trainer to train the network on the training data set, and the test data set is created in real time to validate against the trained data set. In the second neural network, likewise, the test data set is created in real time to validate against the trained data set.

If the training error of both networks is 0.0%, then classify the input as in table 3, which is shown in the next slide. If it is classified as valid, display the message login successful and give access to the real system. If it is classified as malicious, display the content from the fake XML file. Otherwise, if it is invalid, display the custom message. You can see here that we train our third neural network on this data set: we have the output of neural network one, the output of neural network two, and then the final classification.
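The count-based validation steps described earlier (scan the input, take its length, count character frequencies, compare against per-character thresholds) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the experiment: the thresholds, the maximum length and the code values `CODE_OK` and `CODE_TOO_LONG` are placeholders, since table 4 is not reproduced in this transcript, and treating "characters" as alphabetic letters is also an assumption. Only the value 40 echoes the error code mentioned in the talk.

```python
from collections import Counter

# Placeholder limits; the real values come from table 4 on the slide.
MAX_LENGTH = 64
THRESHOLDS = {"'": 0, '"': 0, "=": 0, "/": 0, "[": 0, "]": 0, "@": 1, ".": 3}

CODE_OK = 0               # hypothetical: input is within all thresholds
CODE_TOO_LONG = 10        # hypothetical: input exceeds MAX_LENGTH
CODE_OVER_THRESHOLD = 40  # a character occurs more often than allowed


def count_based_validate(user_input):
    """Return length, character-class counts and an error code."""
    length = len(user_input)
    freq = Counter(user_input)  # frequency of every character

    letters = sum(n for ch, n in freq.items() if ch.isalpha())
    digits = sum(n for ch, n in freq.items() if ch.isdigit())
    specials = length - letters - digits

    if length > MAX_LENGTH:
        code = CODE_TOO_LONG
    elif any(freq[ch] > limit for ch, limit in THRESHOLDS.items()):
        code = CODE_OVER_THRESHOLD
    else:
        code = CODE_OK

    return {"length": length, "letters": letters, "digits": digits,
            "specials": specials, "code": code}


print(count_based_validate("alice@example.com"))
print(count_based_validate("' or '1'='1"))
```

With these placeholder thresholds, an email address passes because it stays within the limits for `@` and `.`, while the quote-heavy attack string is flagged. In the system described in the talk, codes like these feed the second neural network instead of being used as a final decision.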
If one of the outputs is malicious, the input is classified as malicious. If one of them is invalid, it is classified as invalid. The flow is from valid to invalid to malicious.

We used the PyBrain library for the neural networks. For web services, we used the Bottle (bottle.py) micro web framework. For the web server, we used Bottle and Apache. Similarly, for drawing graphs, we used Python with NumPy and matplotlib. PyBrain is a modular machine learning library for Python. Its name is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. To download it, you can follow the URL and the instructions there. There's a very nice tutorial you can follow to install PyBrain and run the examples. Similarly, Bottle is a fast, simple and lightweight micro web framework for Python. It is distributed as a single file and has no dependencies, so there is nothing to install: just include the file in your directory and start using it. It has built-in routing, templates, utilities and server modules. Again, more information can be found at the specified URLs.

These are the results we got. For true positives, you can see that we get more stable results with the modular neural network, whereas the results are unstable with the single neural network. It is the same when we compare false negatives: the results are better with the modular network than with the single neural network. The same is the case with true negatives and false positives. When you compare the response time, the response time is lower when we use the modular neural network; the single neural network takes more time. The ratio is 1 to 1.5. When we summarize the results, the results of the modular neural network are better compared to the single neural network.
That holds both including and excluding outliers: the results of the modular network are better in all cases.

These are the screenshots. In this slide, we have a fake data file and a real data file. If you observe the structure, both look similar, but the data in one file is fake, not real. This is the user interface which we created for our experiment. In the first case, the valid scenario, the user enters the right, legitimate username and password. It is classified as valid, so the system gives the message login successful. Similarly, if the user enters a malicious query, note that the system does not deny access. Instead, it displays the fake data. Also note that while it is displaying the fake data, we are capturing details like the server, the web browser, what query the attacker used, the port number, etc., and we are also capturing the login attempts. Next, the attacker will try to log in with the fake credentials, and this time it shows login successful. But note that this is not access to the real system; it is access to the deceptive system.

So the conclusion is that our solution offers improved security over existing methods by misleading attackers to false resources and custom error pages. Our results also show that the system accepts legitimate input, even though the user input may contain some special characters, and rejects only truly malicious inputs. Our solution combines a modular neural network and a count-based validation approach to filter malicious input. It also resulted in an increased average detection rate of true positives and true negatives and a decreased average detection rate of false positives and false negatives. Secure systems have to be successful every time, but an attacker has to be successful only once.
With deception, I can only say that we can buy extra time to protect our resources, but we may not be able to protect the system entirely. These are the references you can refer to for more information. Thank you. Thank you.