Okay, now in the second part, let's look at some actual vulnerabilities that we have on the web. I chose the OWASP Top 10 list because it's a fairly widely used list of security issues, and also because OWASP, the Open Web Application Security Project, is a non-profit organization. So it does not have a direct financial interest in selling products or anything like that. It's a worldwide organization. I don't think it has a chapter in Iceland, so feel free to start one; I think that would be very good. But it has chapters in most places in the world and organizes, for example, courses and seminars. They also publish material on security, for example the list that we will be looking at, and they have conferences and some other things.
The Top 10 list also exists for mobile applications; we look at the web list. It lists the most critical web application security risks. The list is obtained by, for example, asking security firms and industry in general what kind of issues they have and how severe they are, and then ranking them. So they look at how easy it is to exploit a risk, how common it is, how easy it is to find, and what the impact is. That's a very common kind of risk assessment. For example, the Icelandic Weather Service uses this for their weather warnings: how likely is it to happen, how common is it, and what's the impact. That's basically what they look at.
And the good thing is they have this Juice Shop application, which we look at in the live session for lecture 19. It's an application that is insecure, and you can play with it. You can basically work through the different challenges by exploiting it. They range from a very basic level to extremely, extremely difficult, so this will keep you busy for a while.
The Top 10 list from 2017, which is the most recent one they have, is as follows. So these are kind of the top 10 attacks.
We will be looking at some of them, not all of them. On some of them I spend more time, others I go through very quickly. In addition to that, I discuss one that is called cross-site request forgery, or CSRF. That's another one that's quite common. So we'll just go through them.
If you look into the literature resources for this lecture, you'll find the general information on the project, and you'll find the PDF file that summarizes all of this. It's 25 pages. You have all the different vulnerabilities, and for each of them you have how easy it is to exploit, how common it is, and what the impact is. Then there's a description of what it is actually about, how to prevent it, examples, and references. So it's really a good resource for getting an understanding of what this is about. And by the way, this source is, I think, the third mandatory reading for this course.
Now let's look at them. The first one, which you might all have heard about, is the so-called injection attack. One thing that you hear a lot about is SQL injection, and if you take Anna Sigur's database course, you might cover it there as well. Injection is essentially the case where you send some kind of data, but instead of a plain string you include some kind of command in some language. For example, SQL injection means that instead of a string, say your username, you send an SQL statement. If the server is vulnerable, it does not just treat this as a string; it starts interpreting your SQL code and is basically tricked into executing it.
An example is the following code. I have a JavaScript statement that builds a query. I want to select star from accounts where the customer ID is what the user sent in, and the password is the password that comes from the user.
This way I can construct an SQL statement that gives me all the matching accounts from the database. Then I can check: if there is at least one account with this username and that password, I log the user in. So far so good. That's what we want to do. If the query returns at least one user, log the user in. Great.
Now the question is, what happens if, instead of the username, I send in the following string: a single quotation mark, then or true, a semicolon, and minus minus? What happens is this. The single quotation mark is the character that ends the string in SQL. So if I just use string concatenation here, I get all the user accounts where the customer ID is empty or true. The semicolon then ends the SQL statement, and the minus minus starts an SQL comment.
Let's just do this so that you see what happens. Let's assume I copy the query, and now I just insert whatever the user sent me. So I do select where customer ID is, and now between these quotation marks I put in what the user sent: quotation mark, or true, semicolon, minus minus. And then there's something in the password; it doesn't matter what. Let's say the password is asd. If you look at this closely, you'll see that I select all the accounts where the customer ID is empty or true, and "or true" is always true. So this condition is always true. The rest of the query, the password check and so on, is commented out, because I have sent comment characters.
What happens now is that I actually get all the accounts from the database. Whatever is in there is returned. This means the query will always return at least one user, so I'm being logged in. That's actually what happens. And most likely I'll be logged in as the first user in the list, although that depends on how I write my code.
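The concatenation just walked through can be sketched in JavaScript as follows. This is a minimal sketch, not the code from the slide: the function names, the column names custID and password, and the ? placeholder style are assumptions for illustration. The second function shows the general shape of a parameterized query, where the SQL text and the user's values travel separately; how such a query is actually executed depends on the database library.

```javascript
// Vulnerable: user input is pasted directly into the SQL text.
function buildLoginQueryUnsafe(custId, password) {
  return "SELECT * FROM accounts WHERE custID = '" + custId +
    "' AND password = '" + password + "'";
}

// What the attacker sends instead of a username.
const maliciousId = "' OR true; --";
const injectedQuery = buildLoginQueryUnsafe(maliciousId, "asd");
console.log(injectedQuery);
// SELECT * FROM accounts WHERE custID = '' OR true; --' AND password = 'asd'

// Safer shape (hypothetical helper): the SQL text is fixed and the
// user's values are passed separately, so they are never parsed as SQL.
function buildLoginQueryParameterized(custId, password) {
  return {
    text: "SELECT * FROM accounts WHERE custID = ? AND password = ?",
    values: [custId, password],
  };
}

const safeQuery = buildLoginQueryParameterized(maliciousId, "asd");
console.log(safeQuery.text);
// SELECT * FROM accounts WHERE custID = ? AND password = ?
```

In the unsafe version, everything after the two minus signs is a comment, so the password check never runs. In the parameterized shape, the attacker's string is only ever a value.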
Probably the login code just checks whether any user came back. So that's the so-called injection attack. Instead of my actual username, I send in something that is a command. That's a problem.
How do you get rid of this? There are a lot of APIs, a lot of programming libraries, that automatically escape a string, so that it becomes safe. They check, for example, whether there is a single quotation mark, and if there is, they put some kind of escape character before it. That's very common. If you use a library for SQL, most of them support something called a prepared statement. There the input is handled automatically so that it is treated as data and not interpreted as code.
So that's injection attacks. They are extremely common, and they don't only exist for SQL. They exist for other languages too. There is something called JavaScript injection, for example, where you can do the same thing. Lots of options. Very common, can be very severe.
Next one, broken authentication. This relates to the previous lecture. Anything that handles authentication or session management is implemented in an incorrect way. That allows attackers to get hold of, for example, passwords, session keys, or tokens. Basically, it allows the attacker to pretend to be someone else and use your application as that person. For example, an attacker might guess your password because you use it all over the place or because it's weak. Or session IDs are never invalidated, so they stay valid all the time and I can just use them. So there's a lot of stuff that can be wrong. An important thing here is that authentication is actually implemented. We are using some kind of authentication; it just does not work as it should.
There are some ways of avoiding that. For example, as we discussed last time, delegate it to Google or someone else who knows how to implement this. Don't try to do it yourself if you don't have the expertise.
Another thing that might be good is to use multi-factor authentication. A lot of you know the two-factor authentication that banks use. You log in and then you get an SMS to your phone that says, here is your confirmation code, or something similar. Then it's much harder to get in by, for example, intercepting a token. That's broken authentication.
The next one is called sensitive data exposure. This happens when sensitive data like passwords, credit card information, or health information is not protected. Attackers can then steal it, for example a credit card number, or modify it, and they can use it to pretend that they are, for example, you.
You usually talk about two cases where this happens. One is at rest. Someone, for example, has access to your database, and in the database nothing is encrypted. So they can just read the passwords, the health records, the credit card information. The other one is in transit. When you send a request using clear-text HTTP, someone can just read your request and read the password. So these are the typical things. Or you're using an algorithm that's simply not strong enough. MD5 is an example that has been around; we'll get to that in a moment.
How to avoid it? Well, you should use HTTPS. We have discussed that often enough now. The other thing is to use a strong encryption algorithm that is currently assumed to be safe, and to encrypt the data when you send it, but also in your database. So if someone ever gets access to your database, they cannot directly read the passwords, the credit card information, and so on.
As I said, an example is MD5 hashing. In the past, people followed this advice. They said, well, we should never store a password itself, we should only store a hash. A hash function that was very, very popular is called MD5, and even nowadays a lot of passwords are still stored as MD5 hashes in databases.
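For contrast with a fast hash like MD5, here is a minimal sketch of storing a password with a salted, deliberately slow key-derivation function, using scrypt from Node's built-in crypto module. The function names and the salt:hash storage format are assumptions for illustration; the lecture does not prescribe a particular algorithm.

```javascript
const nodeCrypto = require("node:crypto");

// Hash a password with a fresh random salt; store "salt:hash" as hex.
function hashPassword(password) {
  const salt = nodeCrypto.randomBytes(16);
  const hash = nodeCrypto.scryptSync(password, salt, 64);
  return salt.toString("hex") + ":" + hash.toString("hex");
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = nodeCrypto.scryptSync(
    password,
    Buffer.from(saltHex, "hex"),
    expected.length
  );
  return nodeCrypto.timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", stored)); // true
console.log(verifyPassword("wrong guess", stored)); // false
```

Because every stored value has its own salt, two users with the same password get different hashes, and each guess costs the attacker a full scrypt computation.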
MD5, however, is known to be not very secure. This is just something I googled in 2016, so it has most likely changed. In 2016 someone asked on Quora how long it would take to brute force, so basically to find by trying, a 10-digit password from its MD5 hash using a consumer-market computer. So nothing super fancy. And 10-digit passwords; I think most of you use shorter ones. So how long would that take? Someone calculated this and said, okay, if you take eight machines using some kind of core, you can do this in less than 47 days in the worst case. And this is only eight machines. If someone tried hard, they would get it much quicker. If you use a shorter password, they will get it much quicker, and so on. So this is not a very secure way. There are other algorithms for this.
So in short, make sure you use encrypted communication, and make sure that your data is actually also encrypted, both when you send it and when you store it.
Okay, we jump over the fourth and go to the fifth one, broken access control. This is different from broken authentication. Here we look at what people can do once they are authenticated. An example is that every user can do the same things. We check if someone is logged in, but we do not check whether that person is actually the right person. Let's for instance assume I'm in Gmail, I log into my account, and then I just change the user ID and suddenly have access to someone else's emails. That would be broken access control. They are checking whether I'm authenticated, but they're not checking whether I have the authorization to access someone else's emails.
Typical cases are security by obscurity: if you enter the right URL, then you have access, and people just assume that you don't know the URL. The other one is what I just mentioned: I just change my user ID and suddenly I have access to someone else's emails. How to avoid this?
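The email example can be sketched as two versions of the same check. This is a hypothetical in-memory sketch: the mailbox data, the session object, and both function names are made up for illustration. The first version only asks whether someone is logged in; the second also asks whether the requested mailbox belongs to that person.

```javascript
// Hypothetical data: one mailbox per user ID.
const mailboxes = new Map([
  ["alice", ["Welcome, Alice"]],
  ["bob", ["Bob's bank statement"]],
]);

// Broken: checks authentication only, never whose mailbox is requested.
function readMailBroken(session, requestedUserId) {
  if (!session || !session.userId) {
    throw new Error("401: not logged in");
  }
  return mailboxes.get(requestedUserId);
}

// Checked: authentication first, then authorization for this mailbox.
function readMailChecked(session, requestedUserId) {
  if (!session || !session.userId) {
    throw new Error("401: not logged in");
  }
  if (session.userId !== requestedUserId) {
    throw new Error("403: not allowed");
  }
  return mailboxes.get(requestedUserId);
}

const aliceSession = { userId: "alice" };
console.log(readMailBroken(aliceSession, "bob")); // [ "Bob's bank statement" ]
try {
  readMailChecked(aliceSession, "bob");
} catch (err) {
  console.log(err.message); // 403: not allowed
}
```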
Deny by default. Check whether someone has the rights, and if not, always just deny access. Rate limiting is another one. Make sure that people are only allowed to send a certain number of requests per time interval, and then deny access. That's meant to ensure that people are not able to just try out whatever they want until they hit something. And of course, whenever you have endpoints, whether for reading or writing, make sure that you actually have access control. You check whether the person is authenticated, and you check whether the person has the rights.
And finally, you should limit CORS if you can, and that's something we have not yet talked about much, so I'll talk about it now. CORS is the acronym for cross-origin resource sharing. The idea is that a request that comes from a different origin, from a different host, is by default not allowed access. This is a security concept built into browsers: browsers restrict requests that go to another domain. So for example, if I try to do an Ajax POST request from a page on one host to a different host, the browser might say no, this is not allowed, you're trying to access something else. Say I have a website on abc.com and it makes an Ajax request to google.com. The default is that this is blocked: the request comes from a different origin, from abc.com and not from google.com.
You can enable this on the server. The server can say, I want to allow these requests, and the way to do that is to set specific headers when it responds to a request, in particular the Access-Control-Allow-Origin header. You can allow everything, but you can also allow a specific domain. So if Google, for example, knows that I'm going to write an application hosted on abc.com that uses their service, they could say, yes, we allow requests coming from abc.com. That's the case, for instance, when you use Google Maps: you have to go into the Google console.
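Setting this header by hand can be sketched with Node's built-in http module. The allow-list, the helper name corsHeadersFor, and the https://abc.com origin are assumptions for illustration; a real application would more likely use a ready-made module for this.

```javascript
const http = require("node:http");

// Origins that are allowed to read responses from this backend (assumed list).
const allowedOrigins = new Set(["https://abc.com"]);

// Return the CORS headers for a request's Origin header, or none at all.
function corsHeadersFor(origin) {
  if (origin && allowedOrigins.has(origin)) {
    return { "Access-Control-Allow-Origin": origin, "Vary": "Origin" };
  }
  return {};
}

const server = http.createServer((req, res) => {
  const headers = corsHeadersFor(req.headers.origin);
  res.writeHead(200, { "Content-Type": "application/json", ...headers });
  res.end(JSON.stringify({ ok: true }));
});

// server.listen(3000); // uncomment to try it; the port is arbitrary
```

The check itself is enforced by the browser: if the header is missing, the script on the other origin is not allowed to read the response.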
In the Google console, you have to say, I have a new application that uses Google Maps. You might have to pay for it. But you also have to say which host name you are hosting it on, because then Google actually allows you to use cross-origin resource sharing. So the idea is that one way to avoid broken access control is to limit this, to really only allow domains that you know to use your backend. You can set the specific headers yourself, but in Node.js, for example in assignment three, that's what the cors module does. So cors allows you to do this. Okay, so that was number five, broken access control.
The next one is security misconfiguration. That's sort of on the application side. It's simply that your database, your server application, your operating system, whatever it is, is configured in a way that is insecure. It's very common; as you see, this is the most commonly seen issue. And that's basically because there are too many configuration options and people don't know the details. For example, I use MongoDB as a database, but I don't know how to configure it, so it might just end up being badly configured.
Examples are enabling features you don't need. There might be a debug mode that gives you more rights or shows information that you should not see. Very common in databases, there might be standard accounts and passwords like admin and admin, and if you don't delete them, someone can just try them. The other thing, which I mentioned in lecture 17, is if you do a lot of logging and you actually send the full stack trace back to the user. There has been an error and you send the whole error log. You say, this is the error in line 15, this has happened. This can be a problem because the user might be able to work out security problems from that. For example, if you return database queries, it's often easy to see whether injection attacks are possible.
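The stack-trace point can be sketched like this: the full error goes to the server-side log, and the client only receives a generic message. The helper name toClientError and the response shape are assumptions for illustration.

```javascript
// Log the full error on the server and build a generic response for the client.
function toClientError(err, log = console.error) {
  log(err.stack || String(err)); // details stay on the server
  return { status: 500, body: { error: "Internal server error" } };
}

const failure = new Error("query failed: SELECT * FROM accounts");
const response = toClientError(failure);
console.log(JSON.stringify(response.body));
// {"error":"Internal server error"}
```

The client learns that something went wrong, but not which query, file, or line was involved.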
So you should only send information back to the user that the user should see.
There are a couple of things you can do here. Update your software regularly and on time. A lot of this has to do with applications being outdated: they have security problems that are known, but no one updates them. Sandboxing: make sure your application is running in an environment where access is restricted, so that if something breaks, people don't automatically get access to other parts of your system. And then, on the process level, which is beyond this course, you should have a proper process for quality assurance, testing, checking your environment, and so on. So that's security misconfiguration.
The next one is again a coding problem, and that's cross-site scripting, XSS. Again, this is a common one you might have heard of before. It happens when a web page, so now a front-end page, includes data without escaping or validation. That's similar to an injection attack. Basically, my website includes data that is then executed as code when it is shown to someone else.
For example, imagine you have something like a guestbook, where you can leave a comment on the website. We can go to one of the big news websites. I don't know whether the BBC allows me to comment, but many websites have a comment section where I can leave some kind of comment. This one doesn't. How bad. The interesting thing with comments is that they are not only displayed to me, they are also displayed to all the other users afterwards. So I'm entering some information on a website that is then displayed to other people. Now, assume that instead of a comment, instead of a string, I put in JavaScript code. In the text field I put, for example, an alert. If the website does not check that, the next person would not just see my comment; the JavaScript code would be executed instead. And then we have a problem.
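What escaping means for the comment example can be sketched with a small helper that replaces the characters HTML treats specially. The function names are made up for illustration, and a hand-written escaper like this is only a sketch of the idea, not a complete defense.

```javascript
// Replace the characters that HTML interprets as markup.
// "&" must be handled first so the other replacements are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render a guestbook comment as a list item, escaping the user's text.
function renderComment(comment) {
  return "<li>" + escapeHtml(comment) + "</li>";
}

const comment = "<script>alert('hi')</script>";
console.log(renderComment(comment));
// <li>&lt;script&gt;alert(&#39;hi&#39;)&lt;/script&gt;</li>
```

With the escaping in place, the next visitor sees the text of the script as a comment instead of having it run in their browser.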
So, similar to the injection attack, you should use libraries and frameworks that escape this by default, so that it doesn't happen. This is something that can be used to do a cross-site request forgery, which we'll get to later on. I'll also show this in the in-class session, both in the Juice Shop and in a very basic JavaScript example, just so you see how this works. And together with CSRF, this is why even a very unimportant website can have very severe consequences for other things the user is doing. We'll get there.
Using components with known vulnerabilities is another big issue. Components means any kind of software you're using: libraries, frameworks, modules. They run with the same privileges as the application. If your application, for instance, is running on a server, it quite often has a lot of rights, to access files for example. Now, if you have a component that is insecure, you have a problem, because that component can be attacked, and so your application can be attacked. A very common symptom of this is the attitude: I know I have old applications, I'll just update them later, I'll do my system update later.
Node.js has this problem, because we very quickly do npm install. But npm install installs all the dependencies. You install a library, it uses another library, and somewhere down there, there might be something that is vulnerable. And then you have an issue. And of course, you might not check often enough. npm, for example, includes a vulnerability check. It's only so-so, but it can help you. If you don't run it often enough, you might not be aware of a problem. So the important thing here is: regularly update things, regularly check for problems, remove dependencies when you don't need them, and only use official sources. Don't download stuff from dodgy places; make sure you have the right source.
And last but not least, insufficient logging and monitoring. This is not in itself a vulnerability that can be exploited. You cannot directly attack someone for not logging enough. But the problem is, if you don't log or monitor your system, you might not see problems coming before they hit you. For example, someone might attack your system successfully and you might not be aware of it, and then the attacker can exploit this for a long, long time. If you log unsuccessful login attempts, you might see patterns: people are trying to log in, there are lots of requests from the same IP address, and then you can prevent it. But if you just log "unsuccessful login attempt", you might not see when this happened, who it was, and where the attack came from. So you are not getting enough information.
And then finally, the last problem: if you log only to the console, or just into a text file, you have the issue that if an attacker is successful and gets access to your server, the first thing a good attacker does is delete the logs. They basically erase their traces. So you should make sure that your logs are not just on one machine.
And this is an interesting one that I've copied from the Top 10 list. Most studies that look at successful attacks show that it takes companies over 200 days to detect that something is wrong. And quite often this is detected externally: someone says, okay, we're getting strange requests from your website, maybe something is wrong. With proper logging and proper monitoring, you might be able to detect that much, much quicker. And you can imagine that if it takes 200 days to detect a problem, a lot of things can happen on the way. Passwords can be reused, credit card numbers can be misused, and so on.
How to avoid this? Log a lot on the server.
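A log entry that answers when, who, and from where can be sketched as a structured record. The field names, the helper names, and the sample IP address are assumptions for illustration; forwarding the entries to other machines is not shown here.

```javascript
// Build one log line as JSON so that time, user, and source IP are always present.
function buildLogEntry(event, details, now = new Date()) {
  return JSON.stringify({ timestamp: now.toISOString(), event, ...details });
}

// Count failed logins per IP address to make patterns visible.
function failedLoginsByIp(lines) {
  const counts = new Map();
  for (const line of lines) {
    const { event, ip } = JSON.parse(line);
    if (event === "login_failed") {
      counts.set(ip, (counts.get(ip) ?? 0) + 1);
    }
  }
  return counts;
}

const entry = buildLogEntry(
  "login_failed",
  { username: "alice", ip: "203.0.113.7" },
  new Date("2020-01-01T12:00:00Z")
);
console.log(entry);
// {"timestamp":"2020-01-01T12:00:00.000Z","event":"login_failed","username":"alice","ip":"203.0.113.7"}

console.log(failedLoginsByIp([entry, entry, entry]).get("203.0.113.7")); // 3
```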
Don't send all the logs to the user, but log on the server, and also make sure your logs are not just on one server but are kept on several different servers. You have, for example, two or three central logging servers where you send the information, so that if one server is compromised, you don't lose your log data.
That's the Top 10 list. So those are the important things we look at. All of these require a more detailed look, but they are important. And a lot of them, for example insufficient logging or using components with known vulnerabilities, are really basic and sound easy. But as you have seen, they are on the Top 10 list, so these things happen all the time. So it's really, really important, if you are ever in the position that you run your own application or you are a network administrator, that you are aware of these things.
Now, as I said, there is one additional thing that I want to cover, and that's cross-site request forgery. To take the name apart: we forge, we fake, a request across sites. We are on one website, and it sends a request that goes somewhere else and pretends to come from us. For example, let's assume that you visit the website evil.com. It's very evil. This website then sends a POST request using Ajax, for example with Axios, to your bank. It says, well, I'll try to send mybank.com a request: transfer $100 to my account. So this is a cross-site request. evil.com sends a request to mybank.com and pretends that it's coming from you, from the user. So it's a fake request.
Now the question is, what happens? The bank says no, this is not working, because I require authentication, I require you to send a token. For example, looking at the previous lecture, we might be using OAuth together with JSON Web Tokens. So there has to be a token. It's not there, so the request fails. You cannot transfer money. Good. The problem is, let's say you have another tab open.
A lot of us often have a lot of stuff open. Some people love to have 100 tabs open at the same time. Let's assume that in one of those tabs you are logged into your bank, and then you reload evil.com. What happens then is that the request that evil.com sends comes from your computer. And because we are using tokens that are stored in a cookie, your browser automatically includes the cookies that belong to mybank.com and sends them along. And these cookies include the token. So that happens automatically. If you don't remember why this is the case, you should go back to lecture two and lecture 18 and look at cookies. Because we're logged in to mybank.com in another tab or another window, the request is actually successful. So the amount of $100 is sent to the attacker's bank account. And this is a tricky thing.
Now, quite often you cannot do bank transfers like that; it might not be that easy. But another thing you can do, for example, is try to delete all the emails in Gmail. Many people are logged into Gmail in a different tab, so this might work. You try to send requests to very common applications that people have open all the time. Let's assume someone has Facebook open; let's try to do a post, for example. Those things happen, and that's what's called CSRF.
Now, the big question here is: why would you go to evil.com? Obviously it's evil, so why would you go there? One reason is something called phishing. You all get fake emails saying, hey, we are from Gmail, we are from Facebook, please reset your password. Sometimes they're really ridiculous emails, but sometimes they actually look really good. And especially if you're not a technical person, the risk is not that low that you end up on one of those websites sooner or later. Of course, they're not called evil.com, but they might be called, I don't know, something Facebook admin.com or so.
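The mechanism behind the forged request can be simulated in a few lines. Everything here is a hypothetical model rather than real browser or bank code: the cookie jar, the session value, and the function names are made up for illustration. The point is that cookies are chosen by the domain the request goes to, not by the page that triggered it. The second bank function sketches one possible countermeasure, an extra token that is not stored in a cookie.

```javascript
// Cookies the browser holds, keyed by the domain they belong to.
const cookieJar = { "mybank.com": { session: "secret-session-token" } };

// Model of the browser: attach the target domain's cookies, whoever asks.
function sendRequest(fromPage, toDomain, body) {
  return { fromPage, toDomain, body, cookies: cookieJar[toDomain] ?? {} };
}

// Bank endpoint that trusts the session cookie alone.
function handleTransfer(request) {
  return request.cookies.session === "secret-session-token"
    ? "transferred"
    : "rejected";
}

// Bank endpoint that also requires a token sent in the request body.
function handleTransferWithCsrfToken(request, expectedCsrfToken) {
  const loggedIn = request.cookies.session === "secret-session-token";
  const tokenMatches = request.body.csrfToken === expectedCsrfToken;
  return loggedIn && tokenMatches ? "transferred" : "rejected";
}

// evil.com triggers a request to the bank while the user is logged in there.
const forged = sendRequest("evil.com", "mybank.com", { to: "attacker", amount: 100 });
console.log(handleTransfer(forged)); // transferred
console.log(handleTransferWithCsrfToken(forged, "per-session-random-value")); // rejected
```

The forged request carries the session cookie without evil.com ever seeing it, which is why the first endpoint accepts it. The second endpoint rejects it because evil.com has no way to know the extra token.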
So you might end up on those websites, and then they do something. Cross-site scripting is another route. Even if you are on the proper website, if someone manages to inject JavaScript code, they can just try to send a request from there. That can happen even on a website that looks good. So that's a risk.
How do you avoid this? From an implementation point of view, if you implement your front end, what you can often do is keep your tokens and parameters out of the cookies. Because cookies, again, have the problem that they get sent automatically. As soon as the domain matches, the cookies are sent. If you have a cookie for the domain facebook.com and you send a request to facebook.com, all those cookies are sent automatically. So if you do not use cookies for this, that's a good thing. One way of doing it is to put the sensitive stuff into some HTML field, not into a cookie. The other thing you can do is use things like captchas. You have this "I am not a robot" check, which makes sure the request is not sent automatically. And use multi-factor authentication. If it's a real bank, hopefully they have some way of using SMS confirmation or some kind of external device, and that way the forged request will not be successful. So these are the kinds of things you should be doing.
Okay, so these were common vulnerabilities. As I said, in class I'll show how these things work, or rather how they look when they work. That's a bit more the practical side to all the theoretical stuff we have discussed now.
Now, all of these "how to avoid" points were very high level, very abstract. So the question is, how do you do this in practice? Let's say you have a Node.js application and you're using Express. How do you make this as secure as possible? The reason I don't discuss this a lot is that this advice is usually technology specific.
So you might get very specific recommendations if you're using Node, if you're using Express, if you use a certain library, and so on. The essentials are: if you're using a specific programming language or a framework, look into their recommendations for security. You might want to search for best practices, for example for Node.js or Express. And as I said before, stay up to date, because these things can change very rapidly. If a vulnerability in Express is discovered tomorrow, this might change a lot, and you need to stay up to date on these things.
Anyway, I've opened just one of the two references I have in the literature. Again, it is from July 18, so maybe it's outdated. But it's one of those checklists where you can look at what kind of advice they give you. For example, there are linter rules, which is a topic for the next lecture, a kind of static analysis of your code that tells you about security issues. And there is something we discussed under broken access control: you might want to limit concurrent requests. If someone sends more than 10 requests per minute, you should deny them. There are libraries for that. For example, there's one called express-rate-limit, where you can set in a very fine-grained way how many requests per IP address are allowed per minute, for example. And so on.
So the general advice here was: limit concurrent requests. And if you look at these guides, you get specific advice on how to do this for Express. Essentially, that's what you need to do. You need to look at the specific technology and at specific advice for that technology. Maybe join a user group or something like that.
That's it. The main message is that vulnerabilities, especially on the web, are extremely common and often very basic. So it's really something that should not happen, but sadly it happens a lot.
That's why we hear so much about exposed data, leaked usernames, and so on. There are checklists like the OWASP Top 10 that help you. There is the Juice Shop that helps you get educated. And then finally, there are these technology-specific guides for programming languages, frameworks, and so on. That's what you need to get started, but you also need to stay up to date. And you will get very far by just doing basic things.
That's it for lecture 19. In the next lecture, we'll dive into testing and debugging, now mainly on the server side. We'll look at how we debug and unit test Node applications, how we test our endpoints in, for example, an Express application, how we do code coverage analysis, and how we use mocks, which is something I haven't covered yet. And then we look at linting as well, static analysis of our JavaScript code. Thank you for today and see you in the next lecture.