Welcome to the second part of Lecture 18. Now we dive into authentication and authorization. There are many different ways to do this, so we will just look at some common ones and discuss them briefly. Of course, if you really want to build this into a production website or a production backend, you need to spend quite some time and effort to get it right, since the whole purpose is to keep people from doing what they are not supposed to do.

We start off with HTTP Basic authentication, which is plain user-and-password authentication. Whenever you send a request to the server, you send your user name and password in the HTTP headers: there is a standard HTTP header called Authorization, and you add the information there. So let's assume we make a request to our server. A regular request will not be successful, because the server expects us to use Basic authentication. What you do is add the Authorization header with the value Basic, which names the method, followed by a string that contains our user name and password. This string looks encrypted, but it is important to be clear that it is not. It is just an encoding, a way to pack the user name and password into one string, and it can be reversed. Just because it looks cryptic does not make it safe or secure in any way. If I take this string, find a Base64 decoder and decode it, you'll see that it reads grisha, then a colon, then my super secret password. It is just decoding; there is nothing secret here. That's how the client uses it. On the server side, you check: does this header exist, and if it does, does it use the right authentication method?
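To make the point about encoding concrete, here is a minimal Python sketch (standard library only) that builds a Basic Authorization header value and decodes it again. The user name and password are made-up placeholders, not the ones from the lecture; the point is that anyone who sees the header can recover them without any key.

```python
import base64


def make_basic_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic auth."""
    # Basic auth joins user and password with a colon and Base64-encodes the result.
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_header(header_value: str) -> tuple[str, str]:
    """Reverse the encoding: no secret key is needed, only Base64 decoding."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon only; passwords may themselves contain colons.
    user, _, password = decoded.partition(":")
    return user, password


header = make_basic_header("alice", "not-so-secret")  # hypothetical credentials
print(header)                       # Basic YWxpY2U6bm90LXNvLXNlY3JldA==
print(decode_basic_header(header))  # ('alice', 'not-so-secret')
```

Nothing in this round trip involves a secret, which is why Basic authentication is only acceptable over HTTPS.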
If the header is present and uses the Basic method, the server Base64-decodes the string to get the actual text and checks whether the user name and password are correct. That's really all there is to it. If they are correct, you send a proper response; otherwise you refuse the request, typically with 401 Unauthorized (the client is not authenticated) or 403 Forbidden (the client is authenticated but not allowed to do this). Note that, unless you use HTTPS, all of this is sent in clear text, so nothing here is secret. Another issue is that you have to send your user name and password in every single request, so over time the chance is high that someone intercepts and reads one of your requests. Then they have your credentials and can keep using them until you change your password. This is a real problem, because passwords are very often reused and can also be guessed: an attacker can simply try whether the same password also works for Gmail or any other service. What is often used instead is this: on the first request you send your user name and password, the server checks them and sends you back a token, some kind of string that you use from then on, so you no longer have to send your user name and password. These are token-based authentication methods, and we will introduce them later on. First, I'll quickly show you how Basic authentication is implemented on the server side. If we go into Express (this code is part of the material I'll upload), there is a plugin called Express Basic Auth that builds Basic authentication into an Express server. It is fairly easy to use: you import the module, as I did here, and then tell Express to use Basic Auth as a middleware. This is just a regular middleware that checks every incoming request, and it expects a list of user credentials.
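The server-side check just described can be sketched without any framework. This is not the Express Basic Auth plugin or its API; it is a Python illustration of what such a middleware does, mirroring the Express demo in this lecture: the hypothetical `USERS` dictionary stands in for the hard-coded credential list, `handle` stands in for the router, `/frontend` is public and `/` is protected.

```python
import base64
import hmac
from typing import Optional

# Hypothetical hard-coded credential list, mirroring the demo; never do this in production.
USERS = {"admin": "supersecret"}

PUBLIC_PATHS = {"/frontend"}  # routes registered before the auth middleware


def check_basic_auth(header_value: Optional[str]) -> bool:
    """Return True if the Authorization header carries valid Basic credentials."""
    if not header_value or not header_value.startswith("Basic "):
        return False
    try:
        encoded = header_value[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error is a ValueError
        return False
    user, sep, password = decoded.partition(":")
    if not sep or user not in USERS:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(password.encode(), USERS[user].encode())


def handle(path: str, headers: dict[str, str]) -> tuple[int, str]:
    """Tiny stand-in for a router: public paths skip the auth check."""
    if path in PUBLIC_PATHS:
        return 200, "front end"
    if not check_basic_auth(headers.get("Authorization")):
        # 401: not authenticated. An authenticated client lacking permission gets 403.
        return 401, "Unauthorized"
    return 200, "welcome"


valid = "Basic " + base64.b64encode(b"admin:supersecret").decode("ascii")
print(handle("/frontend", {}))                # (200, 'front end')
print(handle("/", {}))                        # (401, 'Unauthorized')
print(handle("/", {"Authorization": valid}))  # (200, 'welcome')
```

The server checks the credentials themselves, not just whether the header is present, so a wrong password also yields 401.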
In this example the users are hard-coded. First of all, that is not flexible: I cannot add or remove users. The other problem is that I don't want credentials in my code at all, so in a real system they should come from a database. We won't go into that here, but you should not hard-code users like this; it is only to demonstrate that the mechanism works. After this statement, every endpoint I implement requires authentication. So here I have one endpoint, GET /, which only works if I send the credentials. Before the middleware I registered another endpoint, /frontend, and if I access that one, no authentication is required.

So let's try that. I start the server in my terminal (I have the command in my slides), and now it's running. If I go to localhost:3000/frontend, I get this horrible-looking application, so that works. If I go to /, I don't get anything, and if you look at the details, I get a 401: the server says sorry, you are unauthorized, you cannot access this. To change that, we have to send the user name and password, and we can do that with Postman to show you in detail what this looks like. Let's create a new request, GET http://localhost:3000/. If I send it, I again get 401 Unauthorized. If I add the right header, it will work. Postman offers a shortcut: under Authorization I can choose Basic Auth, which is exactly what I want, and enter admin and the super secret password I configured. If I send this now, I get 200 OK and "welcome", so the server has actually checked the credentials. Just to make sure, I can put in a different password, and you'll see that I get 401 Unauthorized. So the server really checks what is in the header, not only whether the header is present. That's Basic authentication. As discussed, it's not ideal, because sending the password with every request makes it much more likely that the password gets stolen.

So instead you typically use tokens, and one way of implementing them is session IDs. It works like this: first you authenticate yourself, so the first request uses, for example, HTTP Basic authentication. The server sends back a response saying yes, you are authenticated, you are logged in, but additionally it sets a cookie: it tells the client to set the cookie SID=5. This is now the token, the ID you will use from now on. (There should be an arrow on the slide here.) The next time, the user does not send a user name or password; the user just sends this cookie back, and that happens automatically, because that is how cookies work. The server then checks whether session number 5 exists. If it does, it can also load all the data it knows about the user and what the user has been doing. That is one way of doing it. Of course, 5 is not a very good session ID. Usually it is some kind of cryptic string that cannot be guessed, because, as we discussed in Lecture 2, the client can change cookies: we could simply change it to 6. So the server should make sure the ID is not guessable.

This is a stateful implementation: if we go back, you'll see that the response differs depending on whether I have set the cookie or not, so it depends on previous requests. That conflicts with the REST principle of statelessness, so this is not RESTful, which is a disadvantage: as we discussed, stateful implementations make debugging and testing much harder. Another problem is that, just like a user name and password, a session ID can be stolen by anyone who reads your request, and they can then send their own requests with your session ID and succeed. That is a problem, although there are usually other ways to mitigate it. And of course, if you use very easy session IDs like 1, 2, 3, 4, 5, they can be guessed very easily, which is also a security risk.

More generally, a session ID is one kind of so-called token: a piece of data, a string, that you use instead of the user name and password, precisely to avoid sending credentials back and forth. The most common kind of token in use is the bearer token, which means that anyone who has the token, anyone who "bears" it, has access to the API. One example is PayPal, which uses bearer tokens, and most of the big applications where you log in, like Facebook or Gmail, use them too, so this is by far the most common approach. It works as before: first there is some kind of authentication. We log in with Basic authentication, or, in PayPal's case, with OAuth 2, which we discuss later. In some way you send your user name and password and ask to be logged in, and you get a token back, some kind of string, which again should not be easy to guess. In all future requests, for example when you want to make a payment, you add the Authorization header, but instead of Basic you write Bearer followed by this token.

We can look at that again in Postman, because I showed PayPal at some point; let's see whether I can still find it. PayPal token: this is the request where I get my token. Somewhere in the body is my authorization information, which I will not show you, but when I send it, I tell PayPal which user I am and get something back: here is my access token, the string I can use. As you see, it is very cryptic and hard to guess. That is the token I now use to prove that I'm allowed to do something. In case someone tries to copy it, I will of course make sure the token is no longer valid by the time you watch this video, so don't bother trying. Now, when I make a request, for example to get payments, I send this token to the server. In Postman this is just a placeholder, but essentially I write Bearer followed by the token string. The server answers and it worked, most likely because I used the token: if I remove it, it says unauthorized, exactly. So this is the authorization information I'm using now. If someone listens to my request and copies it, they get the token but not my password. That's a very good thing, because a couple of safety measures are built into tokens. First, tokens typically expire: they are valid for a limited time, for example a couple of hours, after which the server no longer accepts them. So if someone copies the token from my request, they can only use it for a couple of hours, and then they cannot get a new token unless they intercept my connection again. Second, tokens usually have a scope. You can implement your API so that one token is valid, for example, only for payments and nothing else. If someone steals it, that is bad, but they can only access payments, so they don't get full access to your account. That is the kind of safety built into the token system. In case you're interested, there are lots of different token implementations; a fairly popular one is the JSON Web Token (JWT), in case you've seen that. These are just different ways of generating and representing these strings.

Okay, but we still have the problem of the first authentication: at the beginning you have to send your user name and password, and we discussed that HTTP Basic authentication is not that safe, because the credentials are sent in clear text, and it is easy to get the server implementation wrong. So there is something called OAuth 2, which is the industry standard for this. The principle is that you delegate authentication to someone who has the resources and knows how to implement it well. That is exactly what you see on all the websites that say you can sign in with Google, with GitHub or with someone else: that is basically OAuth 2. Someone else handles the authentication, tells you that the user is who he or she claims to be, and then the user can use your application. OAuth 2 supports many different access scenarios, so if you're interested there is a lot to read up on; I'll just show you the most basic and most common one.

It works like this. You have an application, the one you are writing; in this example we use GitLab, so assume we are the GitLab developers. There is some kind of API, say a RESTful API, that the user wants to access, and you have to handle authentication, making sure that only the right people get access to the right resources. Then we have an authorization server, an industry-standard one run by a company that has implemented this. The flow is as follows: I go to gitlab.com, I want to use the API, and I say please log me in with Google. GitLab redirects me to a special Google URL where I enter my user name and password. You all know this: if I click "Sign in with Google", I get a Gmail-looking page and enter my Gmail account and password. If this is successful, the authorization server connects to GitLab and gives GitLab the kind of token we have discussed. (Strictly speaking, in the most common variant, the authorization code flow, Google redirects the browser back to GitLab with a short-lived code, which GitLab's server then exchanges directly with Google for the token.) Then our original request gets a response with the token: here is your token, this has worked. So Google handles the authentication and GitLab only gets a token. GitLab never sees our user name or password, so they don't need to worry about it or check in their database whether the password is correct; that is all handled by Google. We can then make requests to the GitLab API with that token, and GitLab only checks whether the token we send is the one it originally got from Google.

So that's OAuth. It's a fairly nice thing and very standard; as you know, most websites have something like this nowadays. If you're interested, there are lots of libraries, for example for Node: if you search for "OAuth Node" you'll find all kinds of them, and the official OAuth website has an overview of the available client and server libraries. Just so you're aware, it's not always easy to implement, so even with a library it may take you some time to get this running.

So this is what I would call the industry standard: you start with OAuth 2 authorization, you get a token, and then you use that bearer token to authorize yourself with the API. That is how many applications work nowadays; all the big websites with a login operate this way. There are, however, still some issues. One of them is that all kinds of credentials can be stolen: a user name and password, a token, a session ID, or an OAuth 2 access token, which is just another form of token, so nothing is different here. All of them can be stolen or intercepted and then used. Let's say I'm making a request to PayPal to make a payment and I say: here is my token. Someone, for example on the same network, might manage to read my request and copy my token. The attacker can then send his or her own requests with the stolen token, and they will be accepted: PayPal will say, great, thanks Grisha, you just transferred half a million to some dodgy account. That will always be a problem, because you have to send your information across the internet somehow. So again, definitely use HTTPS so that it is not clear text, but even then things can be stolen.

There is one more way to make this a bit safer, called request signing, which we look at now. Request signing has specific use cases: it is usually not used for front-end, client-side applications, but when one server, one backend, talks to another, and we'll get into why. Here is what happens. I have a request I want to send, for example a PUT request to some URL with content type JSON; it doesn't matter, any kind of request works. The receiving server wants to make sure this request really comes from me. I write Amazon here because Amazon is one of the big users of this technique. I have a so-called secret. It is called a secret and not a password because, unlike a password, the server, Amazon, also knows my secret. With user names and passwords we always say the server should never store the password in clear text: Gmail, for example, should not know my password and should only store a hash of it. For the secret it's different: Amazon and I both know the secret, this text string, and that's important because Amazon needs to use it as well. What happens is that I take my request, the string starting with PUT and the URL and so on, and I create a hash of it using my secret. Then I send the PUT request to the server and include the hash I created. Amazon creates the same hash again, using my secret, and checks whether the two are identical. If they are, the request I sent is exactly the one I hashed, which means I am the real sender.

That might be a bit cryptic, so let's go into detail. Assume we have the following request: a POST to /users with content type JSON and this body: I want to create a new user called Alice, she's 33 and her password is secret. Then I have this string, my secret; in practice this is a very long text string that I'm given. To sign the request I use a hash function, in this particular case a technique called HMAC. There are generators online and libraries to do this. You put in some string, for example this one, you put in a secret key, my secret, and you choose an algorithm, for example SHA-256, which is considered secure. Then I press compute and get a string. What a hash function does is take an input, here the text combined with the secret key, and create a string of a fixed length. It doesn't matter how much text I put in; I've just copied and pasted this a lot, and I always get a string of the same length, as you can see. The important thing is that the same input always gives the same output: if I compute this a hundred times, I always get the same output. So if I take this string, put in my secret and press compute, I get a certain string, and if Amazon does the same thing on their server, they get the same string. If I put anything else in there, for example if I write DELETE, you'll see the string differs; let's see: 0 6 3 7 4 5, a different value came out. That's essentially what our hash function does: it maps the input to a fixed-length string, the same input always maps to the same string, and different inputs map to different strings with overwhelming probability. Collisions must exist in principle, since there are infinitely many possible inputs but only finitely many outputs, but for a cryptographic hash such as SHA-256 it is practically impossible to find two inputs with the same output.

So here is what happens: I take my POST request as I've just shown you, use my secret and the HMAC algorithm, and create a hash. Then I send the request to Amazon with my hash in the header: Authorization: HMAC followed by the string. Amazon receives the request. They know my secret, so they take the request without the Authorization header, exactly the same values I used, put them into the HMAC generator and get a string out. Then they compare what I sent them with what they just generated. If they are the same, they accept the request; if they are different, they decline it. This means only exactly the same request can be sent; otherwise the hash function produces a different result and the server says no. Say I'm an evil attacker and I send POST /users but put my own body in there: I change the user name, the age, the password. Since I don't know the secret, I cannot regenerate the hash, so I just send the old value. When Amazon computes the hash, it gets a different string, the two don't match, and the request is declined.

The consequence is that if anyone intercepts this request, what they can do is a so-called replay attack: they can send exactly the same request again, and it will work. But if they change anything, the method, the URL or the body, the request will be declined because the authorization is wrong. By signing, you have created a request that cannot be modified. An attacker can replay it but never change it, so the attacker can only do exactly what you did and nothing else, and that's really a good thing. To make this even harder, we can also include the current date and time in the hashed data, so that the server can make sure the request can only be replayed within a certain time. For example, we can say each request is only accepted within five minutes, so the attacker has to replay it within five minutes, otherwise the server says no.

Every method has a disadvantage, and the disadvantage here is that both parties need the same secret, the same text string, and it has to be exchanged somehow. That exchange is again a potential point of attack: if Amazon sends you your secret, someone might be able to read it, and then it's just like a stolen password; they can use it. What companies like Amazon do to avoid this is exchange it over another channel: instead of the same HTTP request and response, they give you the secret by email, for example, or on a different website. Then it's much harder for an attacker, who would have to both read your email and listen to your HTTP connection. Another problem, which is not on the slide, is that you have to enter this secret somewhere, because you receive it, for example, by email. That's why we usually don't use this in front-end applications: it's a very long text string, not a password you can remember. So it's typically used when one backend calls the API of another backend, and rarely for end-user applications. As already mentioned, Amazon AWS uses this technique to make sure you are allowed to use their API. There are some libraries for this, but not many, so you will probably have to implement part of it yourself: there are libraries for HMAC generation, that is, for computing the value, but the server-side check of whether the request is valid is something you need to implement. There are some basic implementations on GitHub.

So this is the conclusion of Lecture 18. There are many attack surfaces an attacker can use to mess with us. They can look at the network and try to listen to our requests, they can exploit our applications if we have bugs or vulnerabilities, and finally they can go through the user, for example to get their password. Once again, a reminder that HTTP is completely unencrypted, so you should not use it; always use HTTPS. And do not assume that you are safe just because you use HTTPS: there are always issues, such as governments spying on you or bugs in SSL/TLS implementations. On top of that, you should use an authentication method that is reasonably secure. I've shown you some of them, and they all have trade-offs: HTTP Basic, for example, is very easy and quick, but you send your password in every request, which is maybe not the best idea. One thing you should do is rely on well-tested libraries and on delegated authentication, for example OAuth 2.0 if you can. Use libraries that have a lot of users, for example many stars on GitHub or many downloads on npm, so that you can be sure they are properly tested and used, and not something some dude wrote in his basement over a weekend. That's it for Lecture 18. In the next lecture we go into the problems we might face, security vulnerabilities and attacks, with some remarks on Node.js in particular, but mainly we'll look at the top vulnerabilities we see in applications. Thank you for listening, and see you in Lecture 19.
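Earlier in this lecture we noted that user credentials should come from a database rather than being hard-coded, and that a server should store only a hash of each password, never the password itself. A minimal sketch of that storage step, assuming PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`) with a random per-user salt; the algorithm choice and iteration count are assumptions, and dedicated algorithms such as bcrypt or Argon2 are common alternatives.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # high work factor slows down brute force; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in the user table instead of the password."""
    salt = secrets.token_bytes(16)  # random per user, so equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("my super secret password")
print(verify_password("my super secret password", salt, stored))  # True
print(verify_password("guess", salt, stored))                     # False
```

With this in place, a leaked user table does not directly reveal anyone's password.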
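The session-ID mechanism described earlier in this lecture can be sketched as follows. This is a minimal, in-memory Python illustration, assuming a plain dictionary as the session store and the cookie name SID from the slide; `login` and `lookup_session` are hypothetical helpers, not part of any framework. The key point is that the ID comes from `secrets.token_urlsafe`, so it cannot be guessed the way 5 or 6 can.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical in-memory session store: session ID -> data about the user.
SESSIONS: dict[str, dict] = {}


def login(user: str) -> str:
    """Create a session after successful authentication; return a Set-Cookie value."""
    sid = secrets.token_urlsafe(32)  # 32 random bytes, not guessable like "5"
    SESSIONS[sid] = {"user": user}
    # HttpOnly keeps page scripts from reading the cookie; Secure limits it to HTTPS.
    return f"SID={sid}; HttpOnly; Secure; SameSite=Lax"


def lookup_session(cookie_header: Optional[str]) -> Optional[dict]:
    """Parse the Cookie request header and return the session data, if any."""
    if not cookie_header:
        return None
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SID")
    return SESSIONS.get(morsel.value) if morsel else None


set_cookie = login("grisha")
sent_back = set_cookie.split(";")[0]  # what the browser sends: "SID=<random string>"
print(lookup_session(sent_back))      # {'user': 'grisha'}
print(lookup_session("SID=6"))        # None: guessing an ID does not work
```

Because the server keeps this state, the design is stateful, which is exactly the trade-off against REST discussed in the lecture.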
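Bearer tokens with an expiry time and a scope, as described for the PayPal example in this lecture, can be sketched like this. It is a hypothetical in-memory token store in Python, not PayPal's API: `issue_token` plays the role of the login endpoint, and `authorize` is what a protected endpoint would run on the `Authorization: Bearer ...` header. The two-hour lifetime and the scope names are made-up examples; the optional `now` parameter only exists to make expiry easy to demonstrate.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenInfo:
    user: str
    scopes: frozenset
    expires_at: float  # Unix time after which the token is no longer accepted


# Hypothetical server-side token store and lifetime ("a couple of hours").
TOKENS: dict[str, TokenInfo] = {}
TOKEN_LIFETIME_S = 2 * 60 * 60


def issue_token(user: str, scopes: set, now: Optional[float] = None) -> str:
    """Run once, after the user has authenticated with their real credentials."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # long random string, hard to guess
    TOKENS[token] = TokenInfo(user, frozenset(scopes), now + TOKEN_LIFETIME_S)
    return token


def authorize(header_value: Optional[str], required_scope: str,
              now: Optional[float] = None) -> tuple[int, str]:
    """Check an 'Authorization: Bearer <token>' header; return (status, message)."""
    now = time.time() if now is None else now
    if not header_value or not header_value.startswith("Bearer "):
        return 401, "missing bearer token"
    token = header_value[len("Bearer "):]
    info = TOKENS.get(token)
    if info is None or now >= info.expires_at:
        TOKENS.pop(token, None)  # forget expired tokens
        return 401, "invalid or expired token"
    if required_scope not in info.scopes:
        return 403, "token not valid for this scope"
    return 200, f"hello {info.user}"


token = issue_token("grisha", {"payments"}, now=0)
header = "Bearer " + token
print(authorize(header, "payments", now=60))           # (200, 'hello grisha')
print(authorize(header, "profile", now=60))            # (403, 'token not valid for this scope')
print(authorize(header, "payments", now=3 * 60 * 60))  # (401, 'invalid or expired token')
```

A stolen token in this design only grants the scopes it was issued with, and only until it expires.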
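The first step of the OAuth 2 flow described in this lecture, redirecting the user to the provider's login page and later reading the result, can be sketched as follows for the authorization code variant. The query parameters `response_type`, `client_id`, `redirect_uri`, `scope` and `state` are the ones defined by the OAuth 2 specification (RFC 6749); the endpoint URL, client ID, redirect URI and scope below are hypothetical placeholders, not Google's or GitLab's real values. The random `state` lets the application check that the redirect coming back belongs to a login it actually started.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values; a real provider documents its own endpoint, and the
# client ID and redirect URI come from registering the application with it.
AUTHORIZE_ENDPOINT = "https://auth.example.com/authorize"
CLIENT_ID = "my-app-client-id"
REDIRECT_URI = "https://app.example.com/oauth/callback"


def build_authorization_url(scope: str) -> tuple[str, str]:
    """Return the URL to redirect the browser to, plus the state to remember."""
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


def read_callback(callback_url: str, expected_state: str) -> str:
    """Extract the code from the redirect back to our app, after checking state."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged redirect")
    return query["code"][0]


url, state = build_authorization_url("profile")
print(url)  # send the user's browser here; the provider shows its login page

# Simulate the provider redirecting back to our callback with a code.
callback = f"{REDIRECT_URI}?code=example-code&state={state}"
print(read_callback(callback, state))  # example-code
```

After this, the application's server would send the code, together with its own client credentials, to the provider's token endpoint (with `grant_type=authorization_code`) and receive the access token; that server-to-server step is not shown here. At no point does the application see the user's password.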
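Request signing with HMAC, as described for the Amazon example in this lecture, including the timestamp that limits replays to a time window, can be sketched with Python's standard `hmac` and `hashlib` modules. The exact string that gets signed (method, path, timestamp and body joined by newlines) is an assumption made for this illustration; real schemes such as AWS Signature Version 4 define their own, more elaborate canonical request format. The five-minute window matches the example in the lecture.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_S = 5 * 60  # accept requests signed within the last five minutes


def string_to_sign(method: str, path: str, timestamp: int, body: str) -> bytes:
    # Hypothetical canonical form: every part that must not be altered goes in.
    return f"{method}\n{path}\n{timestamp}\n{body}".encode("utf-8")


def sign(secret: bytes, method: str, path: str, timestamp: int, body: str) -> str:
    """Client side: HMAC-SHA256 over the request, hex-encoded (always 64 characters)."""
    message = string_to_sign(method, path, timestamp, body)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, timestamp: int, body: str,
           signature: str, now: Optional[float] = None) -> bool:
    """Server side: recompute the HMAC with the shared secret and compare."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_AGE_S:
        return False  # outside the replay window
    expected = sign(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)


secret = b"my secret"  # shared out of band, e.g. by email
body = '{"name": "Alice", "age": 33, "password": "secret"}'
ts = int(time.time())
sig = sign(secret, "POST", "/users", ts, body)

print(verify(secret, "POST", "/users", ts, body, sig))                      # True
print(verify(secret, "DELETE", "/users", ts, body, sig))                    # False: method changed
print(verify(secret, "POST", "/users", ts, body.replace("33", "34"), sig))  # False: body changed
print(verify(secret, "POST", "/users", ts, body, sig, now=ts + 600))        # False: too old
```

In practice the client would send `sig` in a header such as `Authorization: HMAC <sig>` and the timestamp in another header, so the server can rebuild the same string. An unchanged request replayed inside the window still verifies, which is exactly the remaining replay risk the lecture describes.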