Everyone, let's get started. We have a lot of stuff to cover. We only get through ends, right? We have to cover a lot of background material so we have enough knowledge to understand the actual attacks that we're talking about, right? I mean, I could just teach the attacks, right? But without knowing x86 and knowing how the stack works, you can learn at a high level what a buffer overflow is, but that doesn't give you enough knowledge to actually go execute one yourself. So that's what we're doing here in the web context too, right? We're going through and understanding how the web works so we can understand the security problems here.

So, we looked at one of the three main technologies in the web. URLs. HTML. HTML is on the screen that really counts. HTTP. HTTP, yes. Right? URLs give us a way to make an HTTP request, which gives us an HTML page, which contains new URLs, which we can use to make HTTP requests to get HTML, right? These are the core building blocks of the web. If you understand these and understand how they work, you've come a long way in your understanding of the web and web applications.

And I guess I'll also emphasize, I didn't talk about it much, but do you do a lot of your web browsing on your computer? Probably. Probably on your laptop. What about your phone, right? Do you actively browse the web in a browser on your phone? What about the apps that you use? We've actually done some research and shown that a lot of apps are really just a browser. They're an embedded browser loading HTML code, either from a remote site or from somewhere on the local device. So this is also why this stuff is increasingly relevant. It's not just that we're talking about browsers and websites; web technologies have proliferated into all types of technology.
Has anybody developed any of the Windows, I want to call them Metro apps, but I know that's the super old term for them, the full-screen style apps that work on Windows and on Windows Phone? Who has? Anyone recognize it? That's not good news for Microsoft. Anyway, you can write that code in C#, but actually the way they recommend you do it is using HTML and JavaScript, right, technologies that they took from the web, to develop a completely native app that has the same functionality as the C# code or even the C code. Actually, it's really cool, you can use all three. So you can have JavaScript code call C# code, you can have that call C code, and it all works, and it'll work across all your devices, which is really cool.

So, the last technology that we need to talk about is the HyperText Markup Language. It's a "simple" data format, and I guess simple is kind of in quotes here. It's simple in the sense that it's human readable. On any web page, you can right-click and say View Source and you'll be able to see the HTML of that page. Whether it's really simple or not, I would agree that's an arguable point. But the key point is that we can have HTML documents. We've defined this language such that these hypertext documents, when they're interpreted by your phone or by your computer or by your laptop or by a giant supercomputer, are still parsed and represented the same way.

So, a little bit of history. HTML is based on SGML, which was a prior standard from 1986. HTML 2.0 was proposed in November 1995. HTML 3.2 was published as a recommendation in 1997. HTML 4.01 was published in 1999. Then they went down this weird track where they were like, well, we have XML and we have HTML. If you know a little bit more about the details of HTML, you know that it's not valid XML. So they were like, well, and this is actually what I'll say.
Part of the reason why I love going over the history of this stuff is that I had no idea about all the differences between XHTML and HTML. I would just code websites and hope that they worked. I think that's how a lot of people do this. So with XHTML, they were like, let's merge these. You have to write HTML, and it has to be valid XML. That was in January of 2000. And then everybody hated it and nobody really adopted it. So then they were like, oh God, okay, let's go back to HTML and work on the HTML5 spec. And you can see the huge gap, 2000 to 2014. There's a 14-year gap without any new standard version of HTML. And then HTML 5.1 is currently under development. And they're actually treating the HTML5 spec now as what they call a living specification, so it's continually being adapted and updated.

So it's maybe interesting to think about how all these specs come about, right? Does Tim Berners-Lee just show up and say, this is HTML 5.0, everyone do this or else? Who is the HTML spec really between, or what does it describe? I mean, who cares about HTML and how to parse HTML and about all these specs? Browsers, yeah, browser vendors care, right? They want to know how to interpret an HTML document and display it. But who else cares? What was it? The client? Yeah, so the person using the browser cares, right? They want to access the same website, whether it's in Firefox or Chrome or on their mobile phone, right? It should still represent the same information. Who else cares? The web developers, right? They care about writing an HTML page that is acceptable to all clients and will render the same everywhere, right? So you have this kind of crazy situation where a browser could develop some new HTML tag or some new extension to HTML.
Some developers can start using it, and then eventually it works its way into a standard, and then other browsers have to support it. So the very first version was kind of Tim Berners-Lee saying, okay, this is how I think HTML should be done. But after he did that, other people created browsers and other people extended the language, and it grew and grew over time organically. And so these standardization efforts are really just trying to capture that. They're working groups that try to come to consensus, so they have representatives from browsers and web developers, all these kinds of people, who talk about what people are doing and what new things they want to try. So anyway, it's an interesting lens when you ask, why is this crazy feature in here? Well, we did that in 1996 and we're stuck with it, so good luck.

Okay. So the basic idea of HTML is that you have raw text, a text file, that's marked up with tags, which add additional meaning to that raw text, right? So this is kind of like hypertext. So, the basic form of tags. Has everybody seen HTML, or is at least vaguely familiar with HTML? Most of you, right? Okay, cool. So this should be review, right? A tag starts with a left angle bracket, the name of the tag, and then a right angle bracket, followed by some content. There could be text, there could be nothing in there, there could be other tags. And then finally we have the end tag, which is a left angle bracket, a slash, that same tag name, and then a right angle bracket. You can have self-closing tags, which have a slash at the end, which means the tag is empty: it's equivalent to an open bar tag followed by a close bar tag with nothing in between. And, this is where HTML differs from XML, you can have tags that have no end tag, like an image tag. So an IMG tag specifies an image, and it has no end tag.
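A minimal sketch of the three tag forms just described, assuming Python's standard `html.parser` module as a stand-in for a browser's parser. The class name `TagLogger`, the helper `tag_events`, and the sample markup are made up for illustration. Note that this only shows how Python's parser reports the syntax; a real browser follows the HTML5 parsing rules and may treat a trailing slash on an unknown tag differently.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records the tag events the parser reports, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for <foo> and also for tags with no end tag, such as <img>.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for </foo>.
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for the self-closing form <bar/>.
        self.events.append(("self-closing", tag))

    def handle_data(self, data):
        # Called for the raw text between tags.
        if data.strip():
            self.events.append(("text", data))


def tag_events(markup):
    """Parse markup and return the list of recorded events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Hypothetical sample: a start/end pair, a self-closing tag, and an img tag.
sample = '<foo>some text</foo><bar/><img src="cat.png">'
events = tag_events(sample)
for event in events:
    print(event)
```

The `img` tag shows up as a start event with no matching end event, which is exactly the case that is legal in HTML but not in XML.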
So this is one of the things that drove the XML people crazy and that they tried to fix with XHTML.

So if you think about the tags in a document, they form a hierarchical structure. We can have nested tags, tags inside other tags, and inside those we can have other tags, and we can have sibling tags that are at the same level. Let's look at an example. In a typical HTML page you'll have an HTML tag, then a head, and then a body tag with the body of the page. If we draw this as a tree, the root of the tree is HTML, and it has two child tags, head and body. Head has a child tag, title, and title has some text, which we think of as a child of that tag. So it'll look something like this.

So this is actually pretty powerful. We're able to just specify this information, and the browser knows how to interpret it: this title is what I want as the title of the HTML page; this piece is a paragraph, so maybe I want to indent that a little bit and separate it from other paragraphs; and maybe I want to do other kinds of styling, or whatever. But just as is, we need a little bit more expressivity. Maybe all body tags aren't the same, all P tags aren't the same. So this is where we have the idea of attributes. Attributes live inside the start tag, after the tag name, separated by a space, up until the closing bracket of the tag.

So there are four different types of syntax for this. You've got foo bar, which just means bar is an attribute on the tag foo. So foo is the tag name, and bar is the attribute. You've got foo bar equals baz: foo is the tag name, and the attribute bar has the value baz. You can also single-quote baz, which has exactly the same meaning, and you can double-quote baz. So each of these last three means the same thing: interpret whatever is between those quotes as the value of the attribute bar.
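The four attribute syntaxes above can be sketched with Python's standard `html.parser`, used here only as a convenient stand-in for a browser. The names `AttrCollector` and `parse_attrs` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag name, attribute dict) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; a bare attribute
        # with no "=value" part is reported with the value None.
        self.seen.append((tag, dict(attrs)))


def parse_attrs(markup):
    """Parse markup and return the collected start tags with attributes."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# The four forms from the lecture: bare, unquoted, single-quoted, double-quoted.
forms = ["<foo bar>", "<foo bar=baz>", "<foo bar='baz'>", '<foo bar="baz">']
for form in forms:
    print(form, "->", parse_attrs(form))
```

The last three forms all produce the same attribute value, `baz`; only the bare form differs, because there the attribute is present but has no value.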
If you have multiple attributes, you separate them by spaces. So tag foo has attribute bar with value baz, it's disabled, and required is set to true, so this shows all three syntaxes in one go. Questions on tags? Tags and attributes? Cool. So this actually gives us a lot of flexibility in how we can describe an HTML document.

But the core of an HTML document, what puts the hypertext in HTML, is the link. We want to link to another document where people can get more information. And that's the anchor tag. If you've ever wondered why you use the a tag in an HTML document to specify a link, it stands for anchor. It's used to create a hyperlink. The href attribute, which stands for hypertext reference, is the URI that you want this person to go to. And the text inside the anchor tag is the text of that hyperlink. You've seen the blue underlined text, right? When you click a link, this is exactly what's happening. You have that a tag, that anchor tag, it has an href attribute, and the value of that href attribute is the URI that will be requested when you click on the text inside this tag. So this renders like this, and you could click on it. I think when we see this we all instinctively go, oh, click, right? We've been trained so well to know that this is something that's clickable. So once I'm on a slide like this, I really want to wait.

So, a basic HTML5 page. Starting with the basic HTML5 standard, you specify the doctype. You have a special tag here that specifies that the doctype is HTML.
This tells the browser that this is an HTML5 page. Part of the tricky thing about the web is that even though there are all these specifications, HTML5, HTML4, think about a browser today. Can the W3C committee force everyone to upgrade to HTML5, all the web developers and all the 8 billion, I don't know how many websites there are, force everyone to upgrade all of their HTML content? No. So browsers have to be crazy backwards compatible, all the way back to probably HTML 2 or something insane like that. So there are all these tricks where the browser has to try to figure out what version of HTML you're using. And this is the way they've all agreed on to tell the browser that this is an HTML5 page.

Then we have to have everything enclosed in HTML tags. We have to have a head tag. We have a meta tag that specifies the character set. So why is this important? What does the browser get this HTML page as? Just bytes from the server, right? There are just bytes coming back. So how does the browser interpret them? Does it interpret them as UTF-8? Are you using UTF-16 or, I don't know, some crazy other type of encoding, right? There are all these types of encodings. So it needs to know how to interpret the rest of the document. Then the title of the page, I should change that. Okay. Then we have the body, and in the body we just have a link to go to example.com.

So let's pause for a moment. Why am I using example.com as the link here? There is nothing called example.com. Go and check. Does somebody want to go to example.com? You're all infected with malware. Does it? It does exist. Yeah. So there's an RFC that specifically says what to use if you want a domain in an example, because if you just put in a domain name like badguy.com, that could be somebody's actual address. Even if it doesn't exist now, it could exist in the future.
So this is actually a standard: whenever you have examples in slides or papers you should use example.com or .org or whatever, one of those domains. They're all owned by the same organization. I think it's the W3C or the IETF or one of those. Who owns that domain? What is it? IANA. Okay. The DNS people, right? Internet Assigned Numbers Authority. Right.

Okay. So this just describes the language, right? What we've looked at describes the basics of HTML. The browser is what's actually responsible for parsing and interpreting that HTML and displaying it to you, the user. Right? So your user agent could be the browser that we're used to, Chrome or Firefox. It could be mobile Safari on your phone. It could be something crazy like links or w3m, a console-based browser, if that's how you roll. But the point is that it's up to the client, the user agent, to interpret those HTML pages and display them somehow to the user. Right? So that page that we saw, we can view it in Chrome. It looks exactly like this. We can see the title is up there, with our old title, which apparently will never be changed. And it has some text here. And we can view the same page in the links browser. So this is the links browser; it has the title and the text here. So you can actually browse the web completely from the command line, which can be cool. It probably does not. Which may be a good thing, depending on how you feel about that.

Okay. So what was one problem we looked at with URLs, with us specifying values in URLs and what makes a valid URL? The parsing. What about the parsing? Right, distinguishing what was a mandatory part of the URL and what was the data in there, given the special characters. In URLs, slashes and question marks are all special characters to the URL. So looking at this example, what could be special characters here? The slash. What else? What? The double quotes? Where's the colon? I don't think there's a colon on this page.
Oh, oh, oh. Yeah, that's good. That's in the URL. So actually everything that applies to URLs is different, right? Yeah. The less-than and greater-than symbols, right? These signify the start and end of tags. Exactly. So, what if I wanted some text to appear in my HTML document, and I wanted it to show up as text: how do I represent the text left bracket, foo, right bracket, right? Because if I use that as is, the browser will say, oh, that's the start of a new tag, right? I'm a browser, I know how to parse HTML. The spec says if there's a less-than symbol then we're starting a new tag, and anything up to the right angle bracket is going to be part of that start tag, right? So, what's the solution to all of this? Encoding. You have to encode it or translate it somehow, right? Just like in URIs we did percent encoding, right, to percent-encode the text. Are we going to use that here? Of course it's going to be something different, right? Two different specs.

So, we need to encode this. Before HTML5 it's called an entity reference or an entity encoding. In HTML5 it's called a character reference. There are three ways to do this. Everything starts with the ampersand symbol and ends with a semicolon. Which is crazy. I don't know why; it's probably a historical thing, actually. That would be interesting to look at. So, there are three types. There's a named character reference, where you say ampersand, some predefined name that the specification says exists, and then a semicolon. So, I believe the less-than symbol is ampersand LT semicolon (&lt;). You can also use a numeric character reference, where you do the ampersand, a hash sign, and then the decimal value of the Unicode code point. For ASCII characters this is also the ASCII value. Right? So if it's ASCII, you can put whatever that value is in here and it will show up as text. You can also do it the same way with hexadecimal. So for this you do ampersand and hashtag, the hash, hashtag.
The pound symbol, something like that, I get confused. Then an X to say hexadecimal. So if you did ampersand, pound sign, X41, semicolon (&#x41;), then it would show the character capital A, because hex 41 is capital A's ASCII value.

Okay. This is going to be the cause of a lot of problems. Lots of problems. So this is something that you should understand. Right? And remember, this is again communication from the application developer to the browser: hey, I want you to parse this thing as text. These are not special HTML characters. Right? I want you to parse this as text.

Okay. Let's look at some examples. So, just as we saw in URIs, right, how do we start a percent encoding in a URL? Wow. It's in the name, I just realized. With what type of character do we start a percent encoding? Percent. Percent, because it's in the name. Percent encoding. Yes. Which means we have to encode the percent itself specially. Right? The exact same thing happens with ampersand, because the ampersand starts a character reference. So we can't have an ampersand by itself in the HTML page. We have to use ampersand amp semicolon (&amp;). That's the named reference for ampersand. Or we can use pound 38 (&#38;), or we can use pound x26 (&#x26;). And we can even do pound x00026 (&#x00026;). You can add as many leading zeros as you want. And this is the silly E in my last name, the E with the accent going up. So this is e-acute (&eacute;) for the named reference. Otherwise, it's one of these for the numeric references.

So, we kind of talked about this a little bit. Why do we need to encode the less-than symbol? Because of its relationship with tags. We want to tell the browser, here I want the less-than symbol, this character, left angle bracket. I'm not trying to start a new tag here. So you would do this as ampersand LT semicolon (&lt;). It could also be pound 60 (&#60;), pound x3c (&#x3c;), or pound x0003c.

Okay. So, with what we've looked at so far, we actually have all of the basics of the web, right? We have pages.
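Going back to the character references for a moment, here is a minimal sketch of the three forms (named, decimal, hexadecimal), assuming Python's standard `html` module as a stand-in for how a browser decodes them. The variable names are made up for illustration.

```python
import html

# Named, decimal, and hexadecimal references for the ampersand all decode
# to the same character; leading zeros in the hex form are allowed.
ampersand_forms = ["&amp;", "&#38;", "&#x26;", "&#x00026;"]
decoded_ampersands = [html.unescape(ref) for ref in ampersand_forms]

# The same three styles for the less-than sign (decimal 60, hex 3c).
less_than_forms = ["&lt;", "&#60;", "&#x3c;"]
decoded_less_thans = [html.unescape(ref) for ref in less_than_forms]

# Capital A via its code point, and e-acute via its named reference.
capital_a = html.unescape("&#x41;")
e_acute = html.unescape("&eacute;")

# Going the other way: the developer tells the browser "this is text,
# not a tag" by escaping the special characters before output.
escaped = html.escape("<foo> & bar")

print(decoded_ampersands, decoded_less_thans, capital_a, e_acute, escaped)
```

The last line is the direction that matters for the problems mentioned above: text that should be displayed literally has to be escaped before it is placed into the page.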
Those pages can have links. We can click on those links to go to new pages. Those pages will have links. We can click on those to go to new pages. So, everything is awesome.

Unfortunately, we need a little bit more expressivity as the web evolves, right? For just a text-document knowledge representation system, links are fine. You don't really need anything more than links. You just have some documents, and each document has links to other documents. But this creates a very read-only style of web. So, to actually provide more input to applications, HTML has a way to describe forms, which I hope we're all very familiar with, right? It has text fields, buttons, checkboxes, range controls, color pickers, any kind of thing you want. But its relationship to those three technologies we talked about is very much the same. It's a way to create a more complicated HTTP request. All anchor tags, when you click on them, will make an HTTP GET request to the server for that URI. And that's it. Clicking an anchor tag will only ever make a GET request for a URL.

But forms allow us to actually use more expressive parts of the HTTP language. The action attribute is just like the href of a link. The action attribute on a form tag tells us where this request goes, what the URI is for that request. The default, if it's not there, is the current URI. So, make a request to the current URI. But with the method attribute, now we can change the request and say, hey, we want to make a POST request, right? Or a PUT request or a DELETE request. And, yeah, actually I don't think you can do DELETE in these, can you? I don't know. We can look into that. I think you can from script, so that's weird. You should be able to. So, typically it's either GET or POST, and the default, if you don't specify anything, is going to be a GET.
So, you have a form tag, and the action attribute of the form tag specifies exactly where the request goes when you submit that form. The child input tags, the values I put into those input tags, are transformed either into query parameters, if it's a GET request, or into the HTTP request body, if it's a POST. Remember we saw, when looking at HTTP requests, that the client can actually send something in the body of the request. And this is when that happens: when we fill out a form and it's set to be a POST. So GET passes the data in the query, and POST passes it in the body. And the data is going to be encoded depending on how you set your encoding type. Not terribly important. If you've ever wondered why you have to set certain things on a form when you're uploading files, it's this encoding.

So the data, just like in URLs, is sent as name-value pairs, using data from the input tags. If I have an input tag with type text, where the name is foo and the value is bar, it's going to look like this when the browser renders it. The value is going to be the default value of this input control, and it's going to be a box that allows me to type input and that has the value bar in it. When I submit this form, the browser has to turn this into key-value pairs. So the key is the name here, which is foo, and the value is bar. The value is either going to be the value attribute, or, if the user typed stuff in there, the current value of whatever is inside that box. It's going to be the empty string if there's no value.

Then we have to encode all the name-value pairs in the form. This is called form URL encoding, because when we're making a GET request we're going to use these name-value pairs inside the URI. So we need to make sure they're properly encoded.
So it's going to use percent encoding, except the incredibly interesting thing is that, of course, it's slightly different. It does percent encoding, except that spaces are translated to plus instead of percent 20. You say, why, Professor? I don't know. This is how it was done. I can probably always say weird historical accident and I'll be correct for, like, 90% of things. So in our case, foo equals bar will be sent as the data. And just like in URLs, multiple pairs are separated with an ampersand. So it's going to be foo equals bar, ampersand, I don't know, other key equals other value.

So, if we look at this, we have a form whose action is example.com slash grade slash submit. We have an input with the name student and a value of bar, an input named class, an input named grade, and then finally a submit button. This type of submit tells the browser, this is the thing I want the user to click on to submit. So this is going to look like this. It's going to be a very simple form. This first input field is going to have the value bar, right? And so if I fill this in with something like Adam Doupé, CSE 591, A+, because that's the grade I definitely want in any class. Man, I've got to stop putting class names in here. When I click submit, the browser is going to generate a request for this URI. The browser automatically does this translation. It's going to make a GET request to example.com, grade, submit, with student equals Adam, and you can see the space here got translated into a plus, then ampersand class, with a plus here, then grade, and the plus here, right? Why did this plus get changed? Yes, because this is an actual plus, not a space.

Maybe for space reasons in the request: translating a space to percent 20 is three characters, but replacing it with just a plus sign is only one character, so maybe that's why they did it. That's a good question. I think the plus was actually a wrong one of that.
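The form URL encoding described above can be sketched with Python's standard `urllib.parse`, which follows the same rules (spaces become plus, everything else special is percent-encoded). The field values mirror the spoken example; the `http://` scheme and the variable names are assumptions made for illustration, not something taken from the slide.

```python
from urllib.parse import parse_qs, quote, urlencode

# Name-value pairs as the browser would collect them from the input tags.
fields = {"student": "Adam Doupé", "class": "CSE 591", "grade": "A+"}

# Form URL encoding: spaces become "+", a literal "+" becomes "%2B",
# and the non-ASCII character is percent-encoded as its UTF-8 bytes.
query = urlencode(fields)

# For a GET form, the encoded pairs go after the "?" in the request URI.
# The http:// scheme here is an assumption.
url = "http://example.com/grade/submit?" + query

# Plain percent encoding, by contrast, turns a space into %20.
plain_percent = quote("CSE 591")

# Decoding reverses the translation, including "+" back to a space.
decoded = parse_qs(query)

print(url)
print(plain_percent)
print(decoded)
```

Both `CSE+591` and `CSE%20591` decode to the same value on the server side, which matches the point that a browser sending percent 20 instead of plus is also valid.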
I know IE used to not use plus automatically; it would change spaces to percent 20. Right, which is also valid, that's the problem. Yeah, I don't know, it's interesting. I know part of the reason is they explicitly wanted URLs to have no spaces in them, so that you can write down a URL and give it to somebody. Right, if you have spaces, well, how many spaces, and all that. So at least that part I know, why they didn't want spaces, but I don't know which came first.

Okay, so let's look at a POST now. Here we have a POST method, and we have the exact same values for our form and our input fields; the only thing that changes is the method. Now when I fill in these input fields and click submit query, it's going to generate an HTTP request, and this HTTP request is going to be a POST request to grade submit. It's going to set the host name to example.com, and then it's going to set all these headers. It's going to say that it encoded the body with form URL encoding, just like before. But then why does it have to specify the content length here? After the headers there's a blank line, the end of the headers, and then comes the same data that we just sent in the query. So why does it send the content length, right? Well, under the hood, what transport protocol is HTTP using? TCP, right. TCP is stream based; all you do is send bytes. The only way to really tell the other side that you're done sending bytes is to kill the connection, right? So that could be one strategy: you just read until you get a connection close, right? Or, the other way, you read until you see a CRLF, but then you can't ever have a CRLF in any of your HTTP response or request bodies, right? So by specifying the content length, the server knows, okay, I need to parse exactly 68 bytes and that's going to be the body. And the side effect here is that this actually allows us to keep the connection open to
the server so we can make other requests using that same TCP connection. So we can say, okay, now I want to do this, now I want to do this. And I guess we can talk about TCP slow start, right? A TCP connection starts out with the number of packets you can send to the other side at a very low value and then increases that over time, depending on whether any packets get dropped. This is why reusing sockets that are already open is much, much faster: you don't have to go through this slow start period. So reusing connections also improves the performance of the server.

But anyway, now we've learned basically all of the basic building block technologies that are used in the web. And actually, very quickly in the life of the web, it was expanded beyond just viewing documents. People realized that, hey, the way the web is structured, we can return dynamic responses, right? We can return a response that changes based on the user's input. So this is actually a very powerful idea, right? Because we can run code depending on what their request is, and as long as we output an HTML page, their browser, or any browser, will be able to use that software. When you develop a C application, right, or a Linux binary, it only works on the operating system and the hardware architecture that you compiled it for. But if I can access whatever program you write through HTTP requests, and I get HTML responses back, then as the person running the browser I don't care what language it's written in, right? You could write it in C or in assembly, write it in Erlang or Python or whatever you want, and it will still work. And actually the early web was designed specifically to allow this. One of the case studies of the early web was to allow people access to a database via the
web, right? So you want to be able to interact with the database, and to allow a web application to use the web protocols to do that. And actually, when you look at the early specification for what GET and POST mean: if you're just building document storage, all you need is GET, because the spec says a GET request should not have the significance of taking an action other than retrieval, right? This means the server should not do anything except return some results, right? So this means it should be safe. What does idempotent mean? No, multiple GET requests will also return the same... yeah, so it means a similar thing. I believe it means that making the identical request multiple times is the same as running that request once, right? In effect that means repeating it shouldn't change anything further. So if I make one GET request to a URI, or if I make 100 of those identical requests, that should always give me the same result. Whereas for a POST request, they specifically said it should be used for annotating existing resources; posting a message to a bulletin board, newsgroup, or mailing list; providing a block of data, such as the result of submitting a form, to a data-handling process, right, and this is already getting at the fact that we want applications to handle and process this data; and extending a database through an append operation.

So, I mean, this is really what I did in my PhD, on web applications and web application security, so I've thought a lot about what the core of a web application is. And really the core, to me, is server-side code dynamically creating an HTML response, right? There has to be some kind of code, something that's happening, to dynamically change and create this HTML response. So how does this differ from what you think of when you hear the term website? Yes, a static page. Kind of like, I think the original Yahoo was just a collection of
links to various places, right? I mean, it was updated, they would update it. But from here on out we're going to try to be very precise when we talk about these things. This is why I specifically always use the term web application, right, because it conveys that this is a dynamic application, just like any program you're running on your computer, like Outlook or whatever, right? These are applications that expect input and produce output; they just happen to use the web to provide those services. Whereas to me a website is just plain, dumb, static HTML.

So, we've looked at the HTTP protocol, right? When I make a request to the server, I say, hey, I want this resource, I want this method, and I'm speaking this version of HTTP, and then maybe I tell you my user agent or something. And then when I want to make another request, I don't go, hey, you remember me from two minutes ago? I just requested this thing, now I want this image file. No, we make a new request and we say, hey, I want to make a GET request, I want this image, I'm speaking HTTP 1.1, and here are my headers, and maybe here's my user agent, right? The server sees all the HTTP requests coming in, so it knows the client's IP address, at least the external one, and it basically knows the user agent. When you use an application on your computer, say you boot up Outlook, do you have to tell it who you are every time you interact with it? No. Why not?
Yeah, embedded where? In the session, or in Outlook? I don't know how, but it's stored somewhere. So if you think about it, the application, Outlook, is storing some state. The very first time you open it up, it's brand new, and I think the first window it shows is: hey, you want to use a mail client, so you need to tell me your credentials so I can go check your mail. And then it stores that permanently, so the next time you close and open it, it looks and it knows: okay, you're this person, you have these mail accounts. Imagine how horrible it would be if every time you opened Outlook you had to type in your email information, your username and password, and it had to go re-download and sync your email, and then you close it down and everything goes away and you start all over.
And so this is the key problem with the web. From the server's perspective, all it sees is a new request. Maybe it came from that same IP address, maybe not; maybe you moved your browser from Wi-Fi to your own hotspot. It sees a brand new request. It's kind of like, has anybody ever seen that movie Memento? It's an older movie, about a guy who loses his ability to form long-term memories. This is a lot like a web server. Every time you make a request it goes: hey, how are you, how are you doing? And you make another request five seconds later and it's like: hey, how are you, how are you doing? This is the key problem with the basic protocols of the web: there's no state attached to any of these requests. And so maintaining state is actually one of the core enabling technologies for building these web applications. Fundamentally, HTTP is a stateless protocol. The requests that your browser makes are not linked in any way to each other, and because of that the server has to treat you like you're a brand new person every time it sees a request.
So do we need this? Can we write applications without it, just using HTTP as is? We haven't looked at anything about how to do that. I
mean, it would be a terrible user experience, if nothing else. Why? What do you mean? Slow? A terrible user experience. Well, just using the protocol you can't even do that, because with plain HTTP there's no way to link those two requests. Even the client IP address doesn't work: a lot of us are on the wireless, so you're all coming from the same IP address, and remote servers see the ASU NAT IP address. Some unique identifier in each URL that you request? Actually, that is one of the techniques that people did use at the start: the first time you interact with my application, I'm going to embed in every link on my page some secret information or some random number. That way, when you click, I see that random number in the request, I can look it up, and I can see that yes, this is that same person. So yeah, you can actually completely change the links on the page.
Do we want stateless applications? What are some examples of applications that don't have state? Are they useful? REST API. What is it? REST API. Yeah, but that's just a type of application; what's an example? Video streaming. Yeah, YouTube, you could use YouTube without logging in. What about Skype?
Skype is tricky; I don't know that I'd call it a web application. Right, even Google: you used to be able to use Google without logging in. You just say: hey, I want to search for this thing, and it gives you results. It doesn't need to know that you were the person who searched for that thing five weeks ago. It does know that, and so it can give you better search results, but there's actually a lot you can do without state. Not every application has to have state. But if you want to have user accounts, or any kind of linking of users, to build these kinds of rich applications, you need some notion of state. And really, when we think about state, we need to link requests together and say: okay, what requests did you make in the past? We want to see, are you a brand new person, or did you make some requests in the past? And the goal, as somebody said, is that we really want to make some kind of session, so that the server can say: oh, you're this person, and you made these requests, and maybe I created an account for you, and so I can store your preferences, just like Outlook does. This allows us to do authentication, so we can have usernames and passwords. It actually allows us to have rich, fully interactive applications that aren't just requesting data; they can be personalized to us.
So there are kind of three ways that people have tried to achieve this. One way, which we talked about, is embedding information in URLs. Another way is to have hidden fields in forms. I didn't talk about it, but the type of an input field on a form can be hidden, so it's not shown to the user, and that way you can put the session value in there. And the third way, the one that actually is the most common and that we're going to talk about, is using cookies. Unfortunately, we're not talking about cookies you can eat. I'm sorry, I know it's before lunch; I just got kind of hungry talking about it. So the idea is, cookies were one of those
things that was developed by Netscape because they thought it would be cool. It wasn't a committee that decided we needed this feature; it was Netscape realizing this was a problem for people developing e-commerce applications, and so they developed it in their browser technology, and this is why we have it today. The idea is that cookies are state information that's passed between the web server and the user agent. So basically the server initiates a session by asking the user agent: hey, store this value, some opaque value that means nothing to the user agent; store this, and the next time you make a request, give it back to me. That way I know who you are and I can link those requests. And either the server or the user agent can terminate the session. This is part of cookies. Anybody ever clear the cookies on their computer or something? Yeah, it's kind of crazy, but I still have to speed that and that actually doesn't work.
So 1997 was the first standardization attempt for cookies. In 2000 they tried to standardize Cookie 2.0 by consolidating all these different implementations. There's a lot of weirdness here. There's actually a great RFC from April 2011 (RFC 6265), if you're in any way interested in the web and its evolution and want some insight into cookies. It describes how they're actually used in the modern web, how different browsers implement them, and what kinds of things they mean, because they're actually incredibly complicated, surprisingly, or maybe not. And so cookies are name-value pairs, right?
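To make the "store this opaque value and give it back to me" idea concrete, here is a minimal sketch in Python. It is not how any real server is implemented; `ToyServer`, `handle`, and `end_session` are hypothetical names invented for illustration, and the "network" is just a function call. The only point is that the token means nothing to the client, while the server uses it as a lookup key to link requests.

```python
import secrets


class ToyServer:
    """Hypothetical in-memory server that links requests via an opaque token."""

    def __init__(self):
        self.sessions = {}  # opaque token -> per-session state

    def handle(self, token=None):
        # No token, or one we do not recognize: treat the client as brand new.
        if token not in self.sessions:
            token = secrets.token_hex(16)  # unguessable, meaningless to the client
            self.sessions[token] = {"requests": 0}
        state = self.sessions[token]
        state["requests"] += 1
        return token, f"request #{state['requests']} in this session"

    def end_session(self, token):
        # The server side of terminating a session: forget the token.
        self.sessions.pop(token, None)


server = ToyServer()
token, body1 = server.handle()        # first contact: the server issues a token
_, body2 = server.handle(token)       # the client hands the token back
other_token, body3 = server.handle()  # a different client gets a fresh session

print(body1)
print(body2)
print(body3)
```

If the client drops the token (clearing its cookies, in the real setting) or the server forgets it via `end_session`, the next request is treated as a new session.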
Name-value pairs are all over the place. And so the server includes in an HTTP response a Set-Cookie header to ask the user agent: hey, please set this cookie, and when you contact me again, send me back this cookie. So it can set a cookie like user=foo. Then the user agent, when it talks to that server again, sends a Cookie header and puts this value, user=foo, in it. So it will send Cookie: user=foo. And servers can ask for multiple cookies to be set, so I can say: hey, set these things; maybe I want your language preference to be in there.
And this is where it gets confusing. The server can set attributes on cookies. It can specify the path, so it can say: on this domain that you're talking to me on, store the cookie, but only use this specific cookie when you're talking to this path of the application. The domain: cookies can be valid for domains or subdomains, so I may want to set a cookie on google.com that you then also send to drive.google.com when you talk to it. Expiration: how long do I want you to keep that cookie around for?
HttpOnly: we're not going to get to JavaScript yet, but this says JavaScript should not be able to access this cookie value; it should only be used in the HTTP request and response. Secure: only ever send this cookie over secure connections, so only over HTTPS, never over HTTP.
So let's look at an example, and then we'll go back and talk about this. Back to this one: using curl, the command-line tool, to access google.com. It gave me Set-Cookie headers: PREF=ID=this thing, expires at this time, path, domain, and then another Set-Cookie. And so we can see that this is the actual value, and we can actually see they're embedding more values inside the value of that cookie, in this PREF=. So this is the key, PREF, and this is some opaque value that means something to Google. You can see they're actually parsing this again by separating it with colons, it looks like. I don't know if any of you have worked at Google and want to tell me exactly what's going on here; that would be cool. We have the expires attribute, we have the path, the domain, and yeah, this way it includes google.com, drive.google.com, all Google subdomains. This one has an HttpOnly flag.
And the server can ask the client to delete the cookie by setting an expires date in the past; then the user agent should delete that cookie. Proxies are not supposed to cache cookie headers. So why wouldn't we want proxies to cache cookie headers in responses? Unauthorized logins. What do you mean, unauthorized logins? You can use a cookie to get logged in as some other login. Yeah, we're not talking about logins or anything right now, just about sessions. Why would that be? People using the same proxy. Yeah, exactly. If you have multiple people using the same proxy, it goes back to the problem when we talk about using cookies for authentication: if we have multiple people using one proxy and it caches the Set-Cookie header, they're all going to get the same cookie, which then defeats the purpose of cookies to
identify a session, so they'll all have the same session. Cool. Alright, let's stop here. On Monday we will continue.
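As a rough recap of the cookie exchange from this lecture, the sketch below uses Python's standard `http.cookies` module to build a Set-Cookie header with the attributes discussed, form the Cookie header a user agent would send back, and ask for deletion with an expiry date in the past. The domain `example.com`, the path `/app`, and the colon-separated sub-field layout are assumptions made up for illustration; the last one only mirrors the guess made in class about the Google value and is not Google's documented format.

```python
from http.cookies import SimpleCookie

# --- Server side: ask the user agent to store a cookie -------------------
jar = SimpleCookie()
jar["user"] = "foo"
jar["user"]["path"] = "/app"            # only send on this path (hypothetical)
jar["user"]["domain"] = "example.com"   # hypothetical domain
jar["user"]["httponly"] = True          # keep the value away from JavaScript
jar["user"]["secure"] = True            # only send over HTTPS
set_cookie_line = jar.output()          # one "Set-Cookie: ..." header line

# --- User agent side: send only the name=value pair back -----------------
cookie_line = "Cookie: " + "; ".join(f"{m.key}={m.value}" for m in jar.values())

# --- Server side again: parse the Cookie header it received --------------
received = SimpleCookie()
received.load(cookie_line.split(": ", 1)[1])
user = received["user"].value

# --- Deleting: re-send the cookie with an expiry date in the past --------
jar["user"]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
delete_line = jar.output()


def split_subfields(value):
    """Split a made-up 'K=V:K=V' cookie value into a dict of sub-fields."""
    return dict(part.split("=", 1) for part in value.split(":"))


print(set_cookie_line)
print(cookie_line)
print(delete_line)
print(split_subfields("ID=abc123:TM=1400000000"))
```

Note that the attributes travel only in the Set-Cookie direction: the user agent keeps them to decide when to send the cookie, and sends back just the name and value.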