Let us start with today's topic, which is malware. This is just an introduction to malware. We won't get into great detail on any particular kind, but we've all had some experience or other with it, or at least heard something about these different things: worms, viruses, spyware, botnets, Trojans and so on.

So what's the main difference between a worm and a virus? Worms and viruses both refer to malware that replicates itself. This is the most important point: replication, just as a typical virus or bacterium replicates in the human body. The same goes for cyber malware. It replicates, and sometimes at a very fast rate. A virus latches itself onto an executable file or program. It cannot exist without a host, just like a human virus such as the flu virus. A worm, on the other hand, is typically a standalone program. A virus infects a file and uses it as a host from which to infect other files, while a worm typically spreads from one computer to another.

A Trojan horse, another kind of malware, is a program with a malicious component masquerading as a useful piece of software. So you are induced into, for example, downloading it onto your system or your cell phone. But unlike viruses or worms, Trojans, by definition, do not replicate. If it replicated, it would be a worm. A Trojan is typically activated by some action on the part of the victim. Trojans may enter a system in several ways: through email attachments, through file-sharing software, from a website, or through cell phone downloads. So there are many vectors of propagation as far as Trojans are concerned.

Coming specifically to worms (later on we'll talk about botnets too), some of the different classes of worms are these. Internet scanning worms: examples are the Code Red worm and the Slammer worm.
Email worms: Melissa and Sobig. P2P worms, which spread in P2P networks. Web worms, which are very interesting, like Samy and Santy. And then mobile worms like Cabir and Commwarrior.

As we examine some of the case studies, the first question to ask is: what is the vulnerability? Whenever you see any attack, you must go back and ask what it was that made the attack possible. Some of the relevant questions to ask in the context of malware: What vulnerabilities does a worm exploit? How does it select its targets? How is it carried? How does it infect other hosts? How is it activated, by a human or automatically? How fast does it spread? What harm does it cause? Are there any preventive or detective measures against such attacks? And if so, how effective are these measures? So let's try to answer at least some of these questions in the context of one or two worms.

The other development that took place about 15 years ago was the unleashing of polymorphic and metamorphic worms. You may be aware that the earliest antivirus products worked by looking at the payloads of network packets and examining them for a sequence of bits that was the signature of a particular worm. Most worms and viruses have unique and distinct signatures: a pattern of bits, usually assembly language code (not everything is assembly, it could also be JavaScript, PHP, et cetera), which appears in all instances of the worm. That was true of the earliest kinds of worms, and worm and virus signatures are the key to detecting them. However, the worm writers got smart and came up with sophisticated code obfuscation techniques to dodge the antivirus software. One such technique is the use of encryption for disguising worm code.
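To make the signature idea concrete, here is a minimal sketch of a naive byte-pattern scan, and of what happens to it once the body is disguised. Everything in it is made up for illustration: `SIGNATURE` and `BODY` are arbitrary byte strings, not code from any real worm, and a one-byte XOR stands in for "encryption" only because it is the simplest reversible transformation.

```python
# Toy illustration only: SIGNATURE and BODY are invented byte strings.
SIGNATURE = b"\x90\x90\xde\xad\xbe\xef"          # hypothetical pattern a scanner looks for
BODY = b"header" + SIGNATURE + b"rest-of-body"   # hypothetical payload carrying that pattern
KEYS = (0x21, 0x5A, 0xC3)                        # three arbitrary one-byte keys


def matches_signature(payload: bytes, signature: bytes = SIGNATURE) -> bool:
    """Naive signature scan: report a hit if the byte pattern occurs anywhere."""
    return signature in payload


def xor_encode(data: bytes, key: int) -> bytes:
    """Stand-in for encryption: XOR every byte with a one-byte key (self-inverse)."""
    return bytes(b ^ key for b in data)


# The undisguised body is caught by the scan.
plain_hit = matches_signature(BODY)

# The same body, encoded under each key, no longer contains the pattern.
encoded_hits = [matches_signature(xor_encode(BODY, key)) for key in KEYS]

# Each key yields a different byte sequence, so no single new pattern covers them all.
distinct_encodings = len({xor_encode(BODY, key) for key in KEYS})

print(plain_hit, encoded_hits, distinct_encodings)
```

The point of the sketch is only that a fixed pattern stops matching once the bytes are transformed, and that every key gives a different-looking copy of the same content.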
Different instances of the worm may use different keys for encryption, so they might fail to match any existing worm signature. Such worms are said to be polymorphic. So you have polymorphic worms, and you have metamorphic worms. If a worm is just a bunch of assembly language code, you know very well that there are many different ways in which you can write the same program. So if you're just looking for a fixed pattern of machine code in the worm payload, you might fail, because the same program can be written in, say, 10 different ways. Would you then keep 10 different signatures for a given worm? And of course newer and newer mutants are coming up all the time. So it's very difficult to keep up, and very difficult to update the database of worm signatures that often.

The first case study I would like to talk about is Code Red. This happened about 12 years ago. Code Red was an Internet scanning worm carried in HTTP request messages and targeted at a web server, the Microsoft IIS server. Several million of these were in active deployment at that time. So the first thing the worm writer does is ask himself: if I'm trying to target something, is that software very pervasive? Do I have lots of Microsoft IIS servers in deployment around the world, for example? Second, is there a vulnerability? And the answer is yes. A buffer overflow vulnerability (we talked about it yesterday and we're going to have a demo of this today) was discovered in the Microsoft IIS web server, and a patch for it was developed a few days later, on June the 18th. I believe the year was 2001. Then the first version of the worm was unleashed. This version used a random number generator to generate new addresses of machines to infect. So the question is, how do you select your targets? Which IP addresses? What does it do?
It uses a random number generator to generate new IP addresses of machines to infect. Now how does it know that a given machine runs the IIS web server? It doesn't. It has to be lucky. An infected machine sends its payload, the infection, to other machines around, and just hopes that the machine at some arbitrary IP address it has generated also has the IIS web server on it. Not only that, the machine should not have been patched, otherwise it will not be infected.

However, the same seed was used for the random number generator in every instance of the worm, resulting in the same machines being targeted again and again. So you infect one machine, and the worm code over there scans around for other machines to infect. The probability of infecting another machine is not very high to begin with because, as I said, it has to be an IIS web server and it has to be unpatched. On top of that, since every instance used the same seed, the same machines were being hit again and again. This was taking place on July the 12th, roughly a month after the patch was released, and the patch itself had come out about a week after the vulnerability was found and the information disseminated to everybody.

Now the worm writer was quite smart. He noticed that not many machines were being infected. So a second variant of the worm was unleashed about a week later, on July the 19th, in which a random seed was generated in each worm instance. This had a dramatic effect on worm propagation. About 360,000 machines were infected in a span of just 14 hours after the launch of this new version on July the 19th, one week after the first version. The infection phase continued until July the 20th.
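The effect of the seed is easy to see in a small experiment. The sketch below is not Code Red's generator. It uses Python's standard `random` module, a made-up `FIXED_SEED`, and a hypothetical helper `candidate_targets` that only produces numbers formatted as dotted quads and contacts nothing. It shows that two generators started from the same seed walk through exactly the same list, while generators with different seeds do not.

```python
import random


def candidate_targets(seed, count=5):
    """Return `count` pseudo-random 32-bit numbers formatted as dotted-quad strings.

    This only generates numbers; nothing is scanned or contacted.
    """
    rng = random.Random(seed)
    targets = []
    for _ in range(count):
        value = rng.getrandbits(32)
        octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
        targets.append(".".join(str(octet) for octet in octets))
    return targets


# First-version behaviour: every instance starts from the same fixed seed.
FIXED_SEED = 12345  # hypothetical value, chosen only for the demonstration
instance_a = candidate_targets(FIXED_SEED)
instance_b = candidate_targets(FIXED_SEED)

# Second-version behaviour: each instance has its own seed.
instance_c = candidate_targets(seed=1)
instance_d = candidate_targets(seed=2)

print("same seed, same list:       ", instance_a == instance_b)
print("different seeds, same list: ", instance_c == instance_d)
print("distinct targets, same seed:", len(set(instance_a + instance_b)))
print("distinct targets, own seeds:", len(set(instance_c + instance_d)))
```

With a shared seed, adding a second instance adds no new addresses at all, which is the wasted effort the first version suffered from.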
So machines kept getting infected until July the 20th, at which point the worms moved to the attack phase. They sat there for a day and then, on July the 20th, switched over. From that point until July the 28th, they launched a denial of service attack on the White House website, www.whitehouse.gov. The worm also defaced web pages on the servers it infected with the phrase "Hacked by Chinese". It was basically somebody out there (we don't know who the worm writer is) trying to show that he can do a lot of harm. He can deface web pages and he can compromise lots of machines. Hundreds of thousands of machines were compromised, and he deliberately chose the IIS web server because he knew there were many, many deployments of it around the world.

Guess what? Another worm writer comes along about a year and a half later, in January 2003, with the Slammer worm. Let's compare the two. Both exploit a buffer overflow vulnerability. However, the target in the first case was a web server, while the target in the second case was Microsoft SQL Server 2000. A great many of the deployments around the world are Microsoft servers, so that's a very low-hanging target for most worm writers. Interestingly enough, in the first case the protocol was TCP, actually HTTP over TCP, and in the second case it was UDP messages to the SQL server. So you can see the improvements. First, you're using UDP, which everybody knows is a connectionless protocol: there is no handshake to wait for, so the latency is going to be less. Not only that, the payload in the first case was about 4 kilobytes, and it was reduced drastically to 376 bytes in the case of SQL Slammer. The first worm was latency limited: what limited its spread was latency, that is, time or delay. The second one, on the other hand, was limited by bandwidth.
So if you had more bandwidth, you could spread faster, but there are bandwidth limitations. In the case of Code Red, the number of infected machines doubled every 37 minutes, while in the case of Slammer, it doubled every 8.5 seconds. So a huge number of machines were infected in a matter of less than an hour in one case, versus 24 hours in the other.

It's very interesting to see the effect of this and to somehow model it. This is something of a research problem: how many machines are infected at any point in time? We use what's called the simple epidemic model, which is used to model the spread of human diseases like, say, smallpox. These are the parameters. Let N be the total size of the population. This is the population of IIS web servers, for example. Let I(t) be the number of infected individuals at time t. The number of susceptible machines at time t is then N minus I(t). So N is the total number of machines running the IIS web server, I(t) is the number of machines already infected at time t, and the number of machines that are still susceptible, that remain to be infected, is N minus I(t). Let beta be the infection rate, that is, each infected person or machine attempts to pass on the infection to beta other machines in one time unit.

So now the question is, can we write a differential equation to model the spread of this worm and see how fast it propagates? The following differential equation captures the number of infected machines at time t. Call the number of newly infected machines in a little interval of time dt, dI. Then dI = beta * I(t) * (N - I(t)) / N * dt. The reason for this is very straightforward. You must have seen this differential equation in different contexts. There are I(t) infected guys.
Each of these infected guys tries to infect beta times dt others in an interval dt, correct? Because the infection rate is beta. But will all of those targets add to the infected pool? Well, some of them are already infected, so hitting them does not increase the number of infected machines. The count only goes up if the target machine is not already infected, and the probability of that is (N - I(t)) divided by N. So that is the differential equation. If you integrate it, you get I(t) = N / (1 + ((N - I(0)) / I(0)) * e^(-beta * t)), where I(0) is the number of machines infected at time zero. Any guesses as to what this looks like?

Many labs around the world started to take measurements and model these things when they saw such a huge amount of traffic directed at port 80 and targeting IIS servers. They saved this traffic for later analysis, and indeed they came up with a similar kind of equation. The graph, over about a 24 hour period, looks like this: you can see that the number of infected machines increases dramatically. This is what's called the logistic curve or the S curve. You might have seen the logistic curve in neural networks and other places. The model says it will go on increasing and then saturate. But what really happened was that the number of infected machines started to taper off a little bit earlier, and in fact to decrease. What is the reason for this? The model is not perfect. In reality, many of the machines got patched.
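The simple epidemic model above can be tried out numerically. The sketch below is a rough check under stated assumptions, not a fit to measured data: it takes N = 360,000 (borrowing the infection count quoted earlier as the population size), assumes a single infected machine at time zero, and derives beta from the 37-minute doubling time using the fact that, while almost nobody is infected, I(t) grows like e^(beta * t). It then compares the closed-form logistic solution with a step-by-step integration of the same equation. The function names are made up for this example.

```python
import math


def beta_from_doubling_time(doubling_time):
    """Early on I(t) grows like exp(beta * t), so beta = ln(2) / doubling time."""
    return math.log(2) / doubling_time


def infected_closed_form(t, n, i0, beta):
    """Logistic solution of dI/dt = beta * I * (N - I) / N with I(0) = i0."""
    return n / (1 + ((n - i0) / i0) * math.exp(-beta * t))


def infected_euler(t_end, n, i0, beta, dt=0.01):
    """Integrate the same equation numerically with a small fixed time step dt."""
    infected = float(i0)
    for _ in range(int(round(t_end / dt))):
        infected += beta * infected * (n - infected) / n * dt
    return infected


N = 360_000                              # assumed population size
I0 = 1                                   # assumed: one infected machine at t = 0
BETA = beta_from_doubling_time(37.0)     # per minute, from the 37-minute doubling time

for hours in (0, 4, 8, 12, 16, 20, 24):
    minutes = hours * 60
    print(f"{hours:2d} h  {infected_closed_form(minutes, N, I0, BETA):10.0f}")
```

Under these assumptions the printed column traces the S curve: very little for the first several hours, half the population after roughly 11 hours, and saturation well before 24 hours. Swapping in `beta_from_doubling_time(8.5)` with time measured in seconds gives the corresponding curve for a worm that doubles every 8.5 seconds.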
At this point in time, the susceptibles are only the machines that were unpatched. Again, look at the time frame. Sometime in June, the buffer overflow vulnerability is detected. One week later, Microsoft announces the patch. Now, some system administrators are not very cautious, and they didn't patch their machines. People ignore these directives. So a huge number of machines around the world were not patched and were potentially infectable, and all of those machines contributed to this curve. They started getting infected over a 24 hour period once the worm was unleashed. Then system administrators were warned that this worm was moving at a fast pace, so they started patching their machines. So that value N, the size of the susceptible population, itself began to decrease. Since some machines got patched, rebooted and so on, they were not infected in the end. But still the number of infected machines was very large, as you can see. So this is a nice model that has been built by some research groups around the world.

Another kind of worm, besides the Internet scanning worm, is the so-called topological worm. This includes things like email worms and P2P worms. Topological worms are so called because the vulnerable machines can be represented as a graph, with the nodes representing the vulnerable machines. An edge from machine A to machine B means that A knows or stores the address of B and is capable of directly infecting B by sending it a malicious payload. One of the questions I posed earlier was, how do you know which machine to infect? Which is a valid machine to infect? Over here you actually have a topology.
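The graph view can be made concrete with a few lines of code. This is only a sketch of the abstraction: `ADDRESS_BOOK` is a made-up graph in which an edge from A to B means "A stores B's address", and `infection_rounds` is a hypothetical helper that does a breadth-first walk to show in which round each node would be reached from a given starting node.

```python
from collections import deque

# Hypothetical address books: an edge A -> B means A stores B's address.
ADDRESS_BOOK = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
    "F": ["A"],   # F knows A, but nobody stores F's address
}


def infection_rounds(graph, start):
    """Breadth-first walk: map each reachable node to the round in which it is reached."""
    reached = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in reached:
                reached[neighbour] = reached[node] + 1
                queue.append(neighbour)
    return reached


rounds = infection_rounds(ADDRESS_BOOK, "A")
print(rounds)
print("never reached:", sorted(set(ADDRESS_BOOK) - set(rounds)))
```

Starting from A, the neighbours B and C are reached first, then their neighbours D and E. F is never reached, because no node holds its address: unlike a scanning worm, a topological worm only ever tries addresses that some already-reached node knows about.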
For every edge from A to B in this graph, A is supposed to know or store the address of B. So for example, if you have a worm that is spreading through the cell phone network, one ready way of finding targets is to look at the phone book in my cell phone. I know whom to infect just by looking at those addresses. So topological worms have focused targets. Their immediate targets are their neighbors, who in turn spread the infection to their neighbors, and so on. If I'm at this node, my immediate targets in the graph would be all the addresses in my phone book. So a mobile worm, for example, would first spread, through Bluetooth or MMS or whatever the propagation vector is, to all the people in my phone book. Then it would look at the phone books of those victims and spread to all those people, and so on. That's why they're called topological worms. The best examples of topological worms are email worms and P2P worms.

Another kind of worm is something you saw a little bit of yesterday: web worms. Very interesting. Samy and Santy are two examples. Samy is the one that exploited an XSS vulnerability. The man behind it was trying something out and never expected it to be so successful. His name is Samy Kamkar. The XSS worm Samy was unleashed in October of 2005, and it infected the social networking site MySpace. Social networking sites typically allow users to create, edit and save their profiles, and make those profiles accessible to some members of the site, their friends for example. So you can read my profile if you are my friend on some social networking site. So this is what he did. He started thinking, thinking, thinking. He was well aware of XSS.
So he added a bunch of carefully crafted JavaScript to his profile. That's the first thing. It was very clever JavaScript. If you Google around, you will find it and you can try to understand it. When a visitor to Samy's profile page, say V1, downloaded Samy's profile into his browser, the JavaScript in Samy's profile began to execute. This caused Samy to be added as a friend in V1's profile, and also added the message "but most of all, samy is my hero" to it. So this is what the injected JavaScript did. When I look at Samy's profile and download it, say I'm Samy's friend, the JavaScript in Samy's profile will execute, because it's in a web page, correct? And any JavaScript in the page will be executed by my browser. This is a persistent XSS attack. So what does the script do once it executes? The first thing it does is upload the infection to my profile. So I am V1, where V stands for victim. I'm the first victim who reads Samy's profile, my profile gets infected, and Samy is added as my friend. Next, somebody else looks at my profile, and then his profile gets infected, and so on, and this worm started spreading very rapidly. Some of the details are in the textbook, and the actual code you'll find if you Google around. Within 20 hours of the first visit to Samy's profile, Samy had been added as a friend to more than a million user profiles. This rate of spread was even faster than that of Code Red.

Now, how did the worm spread? We understand XSS, the vulnerability, but the difficult part is how this thing actually spread. That's the whole thing about a worm: how does it replicate? Any MySpace member can update his profile after logging in.
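The spreading step can be modelled without any JavaScript at all. The following is a toy simulation under loud assumptions: profiles are plain strings in a dictionary, `MARKER` is a placeholder standing in for the injected script, and `view_profile` is a hypothetical function that models "a logged-in viewer's browser renders the owner's page". It is meant only to show why the infection hops from profile to profile.

```python
MARKER = "<payload>"   # placeholder for the injected script; no real code is involved


def view_profile(profiles, friends, viewer, owner):
    """Model a logged-in `viewer` loading `owner`'s profile page."""
    if MARKER in profiles[owner]:
        # The script runs inside the viewer's logged-in session, so it can
        # edit the viewer's own profile and friend list.
        friends[viewer].add("samy")
        if MARKER not in profiles[viewer]:
            profiles[viewer] += MARKER


profiles = {"samy": "hello " + MARKER, "v1": "hi", "v2": "hey", "v3": "yo"}
friends = {name: set() for name in profiles}

view_profile(profiles, friends, viewer="v1", owner="samy")   # v1 visits Samy's page
view_profile(profiles, friends, viewer="v2", owner="v1")     # v2 visits v1, never Samy
view_profile(profiles, friends, viewer="v3", owner="v3")     # v3 only sees a clean page

infected = sorted(name for name, text in profiles.items() if MARKER in text)
print(infected)
print({name: sorted(f) for name, f in friends.items()})
```

Note that v2 never looks at Samy's page and still ends up infected and with Samy as a friend. The infection travels through v1's profile, which is what turns a stored script into a worm.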
After V1 logged in and viewed Samy's profile, the malicious JavaScript embedded in it began to execute, and as part of the execution, the JavaScript uploaded itself onto V1's profile on the MySpace server, thereby infecting V1's profile.

Okay, so these are just brief introductions to different kinds of worms. We have now looked at Internet scanning worms, and we've said something about topological worms and XSS worms. Another very interesting thing, which could be pretty devastating in the future because so many cell phones exist today, more cell phones than laptops, is mobile malware. Once again, what is the vulnerability? Mobile malware exploits a number of vulnerabilities. Some of these are features of the Bluetooth protocol, if your cell phone is Bluetooth enabled. Others are software vulnerabilities that exist in the implementation of the Bluetooth protocol stack. Most of the vulnerabilities are of the social engineering type: you carelessly accept some download from some place which is not really secure.

On many cell phones, users can download new applications. In the old Symbian operating system, for example, these were packaged in SIS files, Symbian installation source files. Today, one of the most popular OSes is Android, and you still have lots of infections, as everybody knows. Many of our students over here are designing worms for the Android system, and even botnets based on Android. Some of the earliest mobile malware was packaged in well-formed SIS files. The installer was tricked into believing that this was an update of an existing application. It looks like a very nice thing: just update this application, because it's got some new features. So you download it and, lo and behold, the installer replaces the existing application with malicious code. Whenever the application was invoked, it was the malware that ran as well.
So I downloaded this application and I was not careful to see where it came from. That's the vulnerability.

Another common vulnerability as far as mobile malware is concerned is placing the cell phone in discoverable mode. When you do this, you enable the attacker to obtain the Bluetooth device address of the victim's cell phone. This BD address is basically a MAC address, which everybody knows is 48 bits. Knowing the BD address, the attacker can attempt to exchange files. This is part of the protocol. Once you know the BD address, you can exchange files using something called the Object Exchange protocol, OBEX. It is used to transfer images, business cards and other files between two Bluetooth-enabled devices. So here again you've got a vulnerability: besides exchanging images and business cards, you might also exchange malware.

There's a whole list of vulnerabilities as far as mobile malware is concerned. Another one is related to social engineering. User authorization is usually required before a file can be accepted by a smartphone. The smartphone usually prompts the user to enter his or her PIN as a way to confirm that an external file, for example, should be accepted. However, some operating systems accept file transfers without user authorization. They bypass this to make things convenient. Some smartphones try to be nice to their users by allowing them to disable the "authorization required" option for file transfers, because it's a hassle to always say yes or no, type your password and so on. Also, it has been estimated, and these are some reliable estimates, that between 7 and 25% of users indiscriminately accept files or MMS attachments because they bear some very flashy kind of header: "Claim your prize of one crore rupees", or something of that sort.
Now some examples. One of the earliest was the Cabir worm, which attempts to discover other Bluetooth-enabled phones that are set in discoverable mode. When it finds such a phone, it sends the worm payload in an SIS file. So this again relates to the earlier Symbian operating system. The receiver needs to accept and install the file. Cabir's payload was mostly benign. It didn't really cause harm. It was a proof of concept kind of worm, typically just displaying the name "Caribe" on the screen. However, don't forget that one of the main jobs of a worm is to spread. So it's continuously scanning for new victims from an infected phone, and this depletes the battery of the infected phone.

Next came another worm, Commwarrior, which had two vectors of propagation. One was Bluetooth, and the other was the Multimedia Messaging Service. It used MMS to spread to the contacts in the smartphone's address book. So, the earlier question: how does it know who its victims are? Just look at the phone book and you have a whole list of people you can infect. It required user interaction to be installed, and it enticed the user with catchy subject headers such as "Happy Birthday". Once it infected a smartphone, it attempted to discover Bluetooth-enabled smartphones and pass on the infection to them as an SIS file. So notice the two vectors of propagation: MMS, where you find your victims using the phone book or address book, and Bluetooth, where you find phones that are in discoverable mode. If a phone is in discoverable mode, you can discover it and infect it.

Okay, from that we move to one of the most common types of malware, something that's getting increasingly problematic today. Malware design has moved on from just having fun creating a new piece of malware, being a hacker just for fun, to doing it for financial gain. That brings us to botnets.
A botnet is an army of compromised computers, or bots, connected by the Internet and remotely controlled by a botmaster. The new thing about bots is that they're under the control of a botmaster, unlike a worm or a virus, which has no further communication with whoever unleashed it. In this case, the bot is continuously, or at least periodically, in communication with its botmaster. The earliest botnets were collections of zombies that participated in distributed denial of service attacks. Today's botnets, however, are much more sophisticated. They may comprise tens of thousands or even millions of bots. It's a very serious problem, because you've got this whole army just waiting to do what you want it to do. They're all waiting to accept your command.

The emergence of botnets, as I mentioned before, is closely related to the motive of financial gain. So I boast that I've got one million bots, one million infected machines, under my control, and I can command them to do whatever I want at any point in time. Then I go to the market and I say, do you want to buy 100,000 of my bots? You pay me so many thousand dollars, and I will give you, say, 100,000 of my one million bots and allow you to command them to do whatever you want. Botnets are often used to send spam mail on behalf of third parties. Bot programs may contain keyloggers and other forms of spyware that capture sensitive personal information, such as passwords and credit card numbers, and send these to the botmaster. That becomes a very dangerous thing, because many people would like to have a database of credit card numbers to do business with, or even to commit fraud. Botnets have also been used as an extortion tool: pay up, or your website will be bombarded by a DDoS attack. I've got all these guys just waiting to attack you. If you don't pay me, I will command them to attack you, and your web server will be down.
One important difference between a bot and a computer infected by a traditional worm, virus or Trojan is that a bot needs to communicate with specific nodes in the botnet. It communicates to receive fresh commands. Every week, say, it might receive a new command. Early botnets used an Internet Relay Chat (IRC) server as a command and control server. A channel on such a server was used to convey the botmaster's commands. Now the problem with this is that if I've got an IRC server, law enforcement might notice it and arrest me, because I'm the botmaster and I'm the one deploying this IRC server to issue my commands. I can be detected. So what do you think they did? The second generation of botnets used a P2P network. A more recent trend has been distributed and decentralized botnet architectures, which leverage existing, highly scalable and robust P2P networks. So nobody can really figure out who I am, who the brain behind this botnet is. P2P networks are, in a sense, anonymous. You don't know who the botmaster is and who the guys disseminating his commands are. The connectivity of P2P networks makes them very fault tolerant: even if a large number of bots are disabled, the rest of the bots stay connected and continue to do their master's bidding. Moreover, there are no fixed C&C servers, making it hard to detect and incapacitate a P2P-based botnet. If there is a single botmaster sitting somewhere with a single server, you can be almost sure that he will be detected within a few hours or days. So the best thing is to create a P2P network and disseminate commands through some of your chamchas, or cronies. So let's look at the picture over here. This is the botmaster, there are all these nodes in the P2P network, and I choose some of them as cronies, which you can see on top.
So this is the botmaster. These are all the nodes in the P2P network. There are two types of nodes, those with crosses and those without. The ones with crosses are infected, and the ones without are uninfected. The botmaster uses a set of bots to disseminate his commands. The commands spread through the P2P network just as you would exchange music files through a P2P network. These few bots are responsible for passing them on, and he changes them dynamically. So these are his cronies at one point in time, and at another point in time the cronies might be a different set of infected guys. They disseminate commands to the rest of the infected guys. Typically, the infected guys are asked to go to a particular website and retrieve the commands from there. So on Wednesday, for example, they might go to one website. But I don't want to be detected, so I dynamically change the URL. On Thursday I might say, go to this other website. I'm continuously evading law enforcement by changing the website. If you think it's this website, tomorrow it's another one, and then I change it again, and so on.

And finally, one last case study, on the Storm botnet. It was first detected in January of 2007. Its other names are Peacomm, Nuwar and Zhelatin. Storm bots are infected in stages. The most common vectors for propagating the primary infection are email and infected websites. So you get an email, the typical thing, and you click on a link, or you go to an infected website and click on a link out there, and lo and behold, you've got the infection. Email was sent with sensational subject lines, such as "230 dead as storm batters Europe", so you're induced to read it. Likewise, users were lured into downloading free but infected files from websites.
Download this file, it's going to give you the best music of the last year, for example, containing music by various pop artists. So you're lured into downloading it. The primary infection instructed the victim to join the Storm botnet, which is overlaid on top of a P2P network called Overnet. This network is itself based on the well-known Kademlia protocol, which employs distributed hash table based routing to efficiently locate the value corresponding to a given search key. So you're searching for a given song, for example. Suppose a peer X needs to access a file. It would perform a search based on a search key, which could be the hash of the file or the hash of the file name. The result of the search is the IP address and port number of the peer hosting that file, say Y. Once I know who has that song, I, X, go directly to Y to obtain a copy of it. The botnet leverages the same mechanism to obtain a copy of the malware. Both node IDs and search keys are drawn from the same space of 128-bit numbers.

Now this is the interesting part. It's not a static thing, it's very dynamic. Once a machine was part of the botnet, the bot was programmed to receive second and subsequent injections of malicious code. One of the injections instructed the bot to propagate email viruses. Another injection, received some days later, instructed the bot to launch a DDoS attack on a target specified by the botmaster. So you can ask it to do anything you want, and you have so many of these machines under your control.

Now what I would like to do is continue the earlier discussion, because the buffer overflow and XSS are among the most important vulnerabilities. So right now we will continue with the buffer overflow vulnerability, because I didn't manage to cover it completely.
You can actually see how this is done, how this is defended against, and you can try it out both in your college with your students and teach the other teachers also about buffer overflow.
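One footnote on the Overnet lookup mentioned in the Storm case study. The reason node IDs and search keys share the same 128-bit space is that Kademlia measures the distance between two identifiers as their bitwise XOR, and a key is looked after by the nodes whose IDs are closest to it. The sketch below illustrates just that idea. It assumes a global view of all nodes, which a real peer does not have (a real lookup is iterative and uses routing tables), it derives IDs by truncating SHA-256 purely to get 128-bit numbers, and the peer and file names are invented.

```python
import hashlib


def id_128(text: str) -> int:
    """Map a string to a 128-bit integer (truncated SHA-256, an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:16], "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two identifiers: their bitwise XOR."""
    return a ^ b


def closest_nodes(key: int, node_ids, k: int = 2):
    """Return the k node IDs closest to `key` under XOR distance."""
    return sorted(node_ids, key=lambda node: xor_distance(node, key))[:k]


# Hypothetical peers and a hypothetical search key.
nodes = {name: id_128(name) for name in ("peer-1", "peer-2", "peer-3", "peer-4", "peer-5")}
search_key = id_128("some-song.mp3")
responsible = closest_nodes(search_key, nodes.values())

names_by_id = {node_id: name for name, node_id in nodes.items()}
print([names_by_id[node_id] for node_id in responsible])
```

Because every peer applies the same distance rule, any peer searching for the same key converges on the same small set of nodes, whether the value stored there points to a song or, in Storm's case, to the next piece of code to fetch.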