Let's get some housekeeping out of the way before we go back to the lecture. Okay, so, three things. Let's go. It's going to be fun. First, the assignment. I will probably have to turn a knob and reduce the difficulty of assignment four slightly, because the truth is we have about four weeks to cram in all this web stuff, and assignment four is on web vulnerabilities. So there's a time gap, and we have to cover a lot of material. For some parts of assignment four you may have to read ahead and find other sources to learn from. So I'm probably going to significantly reduce the number of levels you have to solve to get 100%. It should be easier in that respect, but it will be just as fun as everything else, and it will be insane. Okay, that's assignment four. The next one is the project. I slightly tweaked the project from what's in the syllabus, but I think it'll be fun. The goal of your project is to create an awesome automated tool that will help you and your team own and destroy everyone else in a project CTF. So the tool has to do something to help you in the game. Obviously not play cat videos all the time; something technical. I'm thinking of something like a network defense tool, say, something that listens on all your sockets to see if any flags are going out, or a binary protection tool, something that can actually modify binaries to give them additional protections. Or really anything else you can imagine: an automatic attack tool, or an automatic exploitation tool for any type of vulnerability we've talked about so far. Pretty much everything's up for grabs.
Because the project is tied to the CTF, I'm expanding the project teams: a minimum of three and a maximum of seven people in your group, inclusive on both ends. It's up to you to form your own group; you have a week to figure it out. So in a week, you're going to register your group. The TA will send a link to the mailing list where you can register your group and who's on it. If you don't know people, look around. There are people here; talk to them, or post on the mailing list. Start with an awesome group name; that's what you need first. If you have a funny group name, people will come and join your group. To demonstrate the awesomeness of your projects, and to find out who the best hackers in CSE 545 are, we'll be having a project capture-the-flag competition. The top three teams will receive extra credit on their project, exact amounts to be determined. The project CTF will take place on Monday, May 1st, from 4:15 to 6:45 p.m. We will still have a final, because studying for it will help you learn the web vulnerabilities, which will help you in the project CTF. The final exam will be in class on Wednesday, the very last class. If you can't make that, we'll reschedule something; don't worry, it will be fine. So on the very last day of class we'll have an in-class final, and then during the final exam slot we'll have the project CTF, where you'll all be exploiting each other. I'll figure out the room situation, because this is a really bad room for a CTF: you can't sit with your team. So we'll try to find another room, and depending on numbers we may have to split across multiple locations.
It depends on how many teams there are. If we have a lot of teams of seven, we'll have around 20 teams; if we have a lot of teams of three, around 40. It will be a lot of fun, I promise. Question: what's the format of the CTF? Good question. Show up on Wednesday. On Wednesday we're going to do two things. First, I'm going to be out of town; I have to be in New York this week. So I'm going to record a lecture so we keep up our pace on the web material, and there will be a recorded lecture for you to watch. Second, to get you ready for the project CTF, come to class. Yegana is going to set up a game network with exactly the same style of capture the flag as the project CTF. So bring your laptop; it will be an interactive session. There'll be practice teams: you'll join one, connect to the game network, and try to hack in and exploit things, so you understand how the game network actually works. That will help you plan your project and get an idea of what you want to build. It'll also be a good time to find people for your project team. Question: for assignment four, you said the levels would be in parallel, but I don't know where you said that. I haven't said that in class yet, so I'll double-check, but I did not mention it. But yes, assignment four, when you get to it, is similar to assignment three: you have to break levels. Here, though, they're all in parallel, so you can work on them in any order, which is a little bit nicer. But they're also web, so it's different: easier in some ways, harder in others. Cool. Any other project, final, or assignment related questions? Question: when will the assignment be released? It depends on how much we cover. I could release it this week; I don't really know yet.
But if you don't have enough knowledge to finish it yet, I don't want to release it. Then again, I could release it just so you can see what's out there. I'm not sure. We'll see; I'll try to get it done soon. All right. What's up? This isn't posted yet either; we'll post it tonight. You're here in class, so you'll already be thinking of ideas; everybody at home isn't. So think of cool stuff. Okay. Back to HTML. You'll notice we're going to pick up the pace. I went through and cut everything that was absolutely non-essential. Unfortunately, a lot of things are essential. So, hyperlinks. What's the role of HTML? To do what? To an extent. (Hey, folks, you can figure out your project names later.) [Student: It also links to other pages, so it's what makes the web.] Yeah, it's hypermedia: we have links to other pages, and the whole beautiful web document structure boils down to the hyperlink. Without the hyperlink, you have no HTML; there's no H in HTML, no hypermedia. The anchor tag, the a element on the slide, is used to create a hyperlink. href is an attribute of the anchor tag that provides the URI your browser should go to when you click the link. The text inside the anchor tag is what gets underlined and constitutes the hyperlink. So in this sample, we have an anchor tag with an href attribute whose value is the URI, plus the link text, and the browser will parse this and display that text as a link. I assume you've all seen a link like this before. Okay, so the basic HTML5 page really pulls everything together. Links are the main way we provide hypertext to other pages.
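To make the anchor-tag description concrete, here is a minimal sketch that uses Python's standard-library `html.parser` to pull out what a browser needs from each `a` element: the `href` value (where to go) and the inner text (what gets underlined). The `extract_links` helper and the `http://example.com/` URL are illustrative assumptions, not from the slides, and a real browser's parser is far more involved.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; dict() makes lookup easy
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(html_text):
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


print(extract_links('<p>See <a href="http://example.com/">example</a>.</p>'))
# [('http://example.com/', 'example')]
```

Text outside the anchor ("See ", ".") is ignored, which mirrors the point that only the text inside the tag constitutes the hyperlink.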
A basic HTML5 page starts with a doctype: <!DOCTYPE html>. There are other kinds of doctype declarations that tell the browser how to parse the document; this one says "I'm HTML5, a standard web page, so use modern technologies." You need an html tag. You need a head tag, and inside that a meta tag that specifies the character set. Again, we talked about this a little: the tricky thing is, how do you know what character set this document is written in? You need a title, and you need a body. You don't actually need an h1 or an anchor with an href, but this shows you a nice example. When your browser, your user agent, gets this, it can parse it, interpret it, and display it to the user. So if you look at it in the browser, all that HTML is reduced to something that looks like this. And the very cool thing is that somebody using a completely different browser, like Links, a text-mode browser with no GUI, will see the same content. Let's actually pull it up in Links. So, question: just like in URIs, where we might want to use the slash character, how would we use the slash character as data? Percent encoding, exactly, because slash has special meaning. In HTML, what has special meaning? Yeah, angle brackets. Angle brackets, equal signs, single quotes, double quotes, and so on. So the question is, how can I create text that is itself an HTML tag? For instance, on the slide I just showed, if I want to say "here's how you create a basic HTML5 page", how do I actually do that? If I include the tags directly, the HTML parsing engine will interpret them as new tags. So again we come to this issue of encoding: we need some way of encoding these characters. And of course it can't be the same as for URIs, because that would be far too consistent and useful.
There has to be a completely different way of doing things. In HTML5 these are called character references; I usually use the term entity encoding. There are three types, and they all start with an ampersand and end with a semicolon. Again, an interesting design decision. First, named character references, where there's a list of predefined names. Second, decimal numeric character references. Note this is different from URIs, which use the hex value; this is decimal. So if we write ampersand, hash, and then some digits, that's the Unicode code point of the character we want. Third, we can use hex instead of decimal by adding an x: ampersand, hash, x, then hex digits, then a semicolon. This is actually the root of a lot of critical vulnerabilities on the web, and we'll get into why later. But it's very important to understand how and why characters need to be encoded. So here's an example. Just like in percent encoding, where the percent symbol is special and so itself needs to be encoded, the ampersand starts a character reference, so it must be encoded as well. Again, there are three ways to do it: &amp;, &#38;, or &#x26;. Questions? Three different ways to do it, which makes it even more annoying. Actually, the first way is insane to me. Why have a predefined list? I guess it was originally created to be easy to write, but you probably shouldn't be writing HTML by hand. You can also add a bunch of leading zeros, because in Unicode you may need a lot of digits to specify exactly which code point you're referring to. The é in my last name, for example, is apparently an e with an acute accent.
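The two halves of entity encoding can be sketched with Python's standard `html` module, used here as a stand-in for a browser's parser. The first part checks that the three spellings of the ampersand (plus a leading-zero variant) all decode to the same character. The second part shows the problem raised above: unencoded angle brackets are parsed as a tag, while the same characters encoded with `html.escape` arrive as plain text. The `TagRecorder` class and `parse` helper are hypothetical names for this illustration.

```python
import html
from html.parser import HTMLParser

# 1. Decoding: named, decimal, hex, and zero-padded forms all mean "&".
AMPERSAND_FORMS = ["&amp;", "&#38;", "&#x26;", "&#0038;"]
for ref in AMPERSAND_FORMS:
    print(f"{ref:8} -> {html.unescape(ref)}")


# 2. Encoding: why "<" must become "&lt;" when the author means a literal "<".
class TagRecorder(HTMLParser):
    """Records which start tags the parser saw and what text it saw."""

    def __init__(self):
        # convert_charrefs defaults to True, so "&lt;" reaches handle_data as "<"
        super().__init__()
        self.tags, self.text = [], []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(fragment):
    recorder = TagRecorder()
    recorder.feed(fragment)
    recorder.close()
    return recorder.tags, "".join(recorder.text)


snippet = "1 <b> 2"
print(parse(snippet))               # (['b'], '1  2')   "<b>" became a tag
print(html.escape(snippet))         # 1 &lt;b&gt; 2
print(parse(html.escape(snippet)))  # ([], '1 <b> 2')   stays as text
```

The parser never "knows" what the author meant; it only sees a less-than character or a character reference, which is why the encoding decision has to be made when the HTML is generated.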
As for that é: if you've ever wondered, it's 233 in decimal, or E9 in hex. Super fun. So, key point: why do we need the less-than symbol to be encoded as a character reference? [Student: It's used to open tags.] Who decides that it opens a tag? Yes, the browser. Remember, and I know some of you took my 340 class, the browser is essentially a parsing engine. It reads characters, and if it sees a less-than symbol it starts parsing a tag. Fundamentally, a browser gets raw bytes from the server and has to parse them to understand the HTML content. So if the website author means "at this location I want a less-than symbol," they need to encode it as &lt; or one of the other forms. If not, it could be interpreted as the start of a tag, which, as we'll see later, has drastic consequences and is really bad. For now, just realize this is the core problem. Okay. So, the other way to think about the web: you have your document with a bunch of hyperlinks to a bunch of other pages. Is that the only way you interact with the web? Do you just go around clicking links all day? What else do you do? Watch videos, sure. You fill out a form: you type text into form fields and then hit enter or click a submit button. Forms are the other important and critical way you interact with a website. But here's the important thing, which a lot of people get confused about: all a form does is tell your browser how to make a new request to some page. So it's essentially also a hyperlink, one that includes some of your content. It's a way to create a more complicated HTTP request. And it has all kinds of cool controls: check boxes, buttons, range controls, color pickers, and more.
Specifically, the form element says "I want a form here." Its action attribute specifies which URI should receive the HTTP request; as with any URI, it can be absolute or relative. To make this more confusing, the action attribute is optional, and if it's missing it defaults to the current URI. The method attribute is where it gets cool: it specifies the HTTP method to use in the request. This is how you can tell the browser to make a POST request, not just a GET request. We haven't talked about this yet, but when you click a link, the browser always sends a GET request to that URI, whereas a form gives you more control. Again, method is optional. If it's not present, what would you think the default is? GET. So the default is GET. Each form then has several child elements that specify what shows up on the page, and also how to build the HTTP request the browser will generate. Depending on the method, the values of the input elements (here "input" literally means the element called input) are transferred either into the query parameters of the URL or into the body of the HTTP request. As we saw, in a GET request we can pass data using the query part of the URI. Every input field should have a name attribute; that name ends up as the key in the query string, and the value is whatever you typed in. POST works similarly, but instead of changing the URI, it puts the data in the body of the HTTP request. Either way, the data is sent as name-value pairs, which looks just like you'd expect. For instance, take an input element: its type attribute lets the author specify what it looks like, for example a text field.
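The rules for action and method can be summarized as a small function that, given the page the form is on and the field values, returns the request a browser would make. This is a simplified sketch: `build_form_request` is a hypothetical helper, it handles only plain text fields with the default URL-encoded format, and it leaves out details such as including the submit button's own name and value.

```python
from urllib.parse import urlencode, urljoin, urlsplit


def build_form_request(page_url, fields, action=None, method=None):
    """Approximate what a browser sends when a simple form is submitted.

    Hypothetical helper for illustration. Returns (method, url, body).
    """
    # No action attribute: the form submits to the current page's URI.
    # A relative action is resolved against the current page, like a link.
    target = urljoin(page_url, action) if action else page_url

    # No method attribute (or an unrecognized one): the browser uses GET.
    method = (method or "GET").upper()
    if method not in ("GET", "POST"):
        method = "GET"

    encoded = urlencode(fields)  # name=value pairs joined by "&"
    if method == "GET":
        # GET: the pairs replace the query part of the URI; there is no body.
        return method, urlsplit(target)._replace(query=encoded).geturl(), ""
    # POST: the URI is used as-is and the pairs travel in the request body.
    return method, target, encoded


print(build_form_request("http://example.com/", {"student": "bar"}, action="/grade/submit"))
# ('GET', 'http://example.com/grade/submit?student=bar', '')
print(build_form_request("http://example.com/", {"student": "bar"}, action="/grade/submit", method="post"))
# ('POST', 'http://example.com/grade/submit', 'student=bar')
```

The only difference between the two calls is where the same encoded string ends up, which is the whole GET versus POST distinction for forms.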
Another type is password, for a password field. There are also hidden fields that aren't shown on the page at all; all kinds of cool stuff here. What's cool is that this input has a name, so when I make this request, the parameter with that name will be set to whatever value I put in. And here I can hard-code a value with the value attribute; that's what the user sees as the default value of the text box. So the name comes from the name attribute, and the value is either the input tag's value attribute, if the user didn't supply any input, or the user's input: whatever you type in there will be sent. If neither is present, it's an empty string. If you submit a form without filling anything in, you'll see that for each input field it sends name=, with nothing after it, then an ampersand, then name2=, name3=, and so on. So all the name-value pairs in the form are encoded. Now, you would think that since this goes in the URI it has to be percent-encoded, and that's true: the name-value pairs are encoded using percent encoding, except (because nothing makes sense) spaces are translated into plus signs instead of %20. Presumably that's because it's shorter: one character instead of three. This is also why plus is a reserved symbol in URI encoding, since here a plus means a space. And multiple name-value pairs are separated by ampersands. So, for instance, say I make a form with an action of example.com/grade/submit and no method. What method will it use? GET. Perfect. It has an input named student with a default value of bar, an input named class, an input named grade, and an input of type submit.
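The encoding rules just described (spaces become `+`, a literal `+` becomes `%2B`, pairs are joined with `&`, and an empty field still appears as `name=`) can be checked with Python's `urllib.parse`, whose `urlencode` closely mirrors the browser's form encoding. The field values below, including the extra empty `comment` field, are hypothetical examples shaped like the student/class/grade form.

```python
from urllib.parse import parse_qsl, quote, urlencode

# Hypothetical values for a student/class/grade form (plus one empty field).
fields = {"student": "Jane Doe", "class": "CSE 545", "grade": "A+", "comment": ""}

# Browser side: spaces become "+", a literal "+" becomes "%2B", pairs joined by "&".
query = urlencode(fields)
print(query)  # student=Jane+Doe&class=CSE+545&grade=A%2B&comment=

# Compare with ordinary percent encoding, where a space is "%20".
print(quote("Jane Doe"))  # Jane%20Doe

# Server side: decoding reverses both rules; keep_blank_values keeps "comment=".
print(parse_qsl(query, keep_blank_values=True))
# [('student', 'Jane Doe'), ('class', 'CSE 545'), ('grade', 'A+'), ('comment', '')]
```

Note that the `+` in "A+" has to be encoded as `%2B`; otherwise the server would decode it as a space and record the grade as "A ".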
An input of type submit renders as a button. If we look at this in a browser, we'll see three really ugly fields next to each other for student, class, and grade. You can see I'm not giving them labels, so nothing looks pretty. It looks ugly, but it works exactly the same as if it were pretty. So where does the bar come from? [Student: The value.] The value attribute. So I can put in my name, my class, and an A+, and when I click submit (this is the key thing) you have all the information you need to say exactly what HTTP request is made. When I click submit, it makes a request to example.com/grade/submit, which comes from the action, and it passes in all these values: student, class, and grade, each as name=value. Notice the important thing here: why is the plus in A+ percent-encoded as %2B? [Student: Because it's an actual plus sign, not a space.] Yes; otherwise it would be interpreted as a space. The other key thing is that the submit button itself gets submitted just the same. Now if I change the form to method post, with everything else the same, and click "Submit Query", what URI does the browser send the HTTP request to? example.com/grade/submit, with nothing afterwards, because it uses whatever the action is, and since it's a POST it puts everything in the body. So it makes a POST to /grade/submit, and in my HTTP request I have a Content-Length of 68, and the body contains all of my name-value pairs, URL-encoded. Any questions? [Student: Shouldn't the é be percent-encoded here too?] I'm not sure. I think it's probably a browser-specific issue.
Let's see: I took this from the browser. I clicked submit, looked at the URL it went to, and copied and pasted that in here. So the browser may be doing a Unicode translation between what you see in the address bar and the actual request it sends. But yes, I think it should definitely be encoded, and in the POST body here it is encoded. So, you've now learned everything you need to understand what's coming next. This was a crash course in the web and web technologies. There's obviously a vast ocean of knowledge about web technologies that you can get lost in. We didn't even talk about Cascading Style Sheets or any of those other technologies. Lots of interesting stuff out there, but we have to stick to the basics. So what's the difference between a website and a web application? [Student: Websites have a GUI; web applications do not.] [Student: Websites are just a bunch of web pages combined together, but web applications have much more complex structures, architectures, and services.] [Student: Are they the same?] Really? [Student: Maybe a web application has to perform some sort of function, as opposed to a website, which might just display text.] That's actually a pretty good way to put it. You're all touching on different aspects of it. First, I will argue strenuously that yes, there is a difference. The problem is that in colloquial, non-technical speech they're basically the same thing; nobody makes this distinction. But as computer scientists, you need to use precise terms for the right things. To me, a website presents information: here is this page, and it never changes.
Maybe it changes when it's updated, but it doesn't change in response to me. For a web application, think of traditional applications moving to the web. The GUI is one aspect: websites also have an interface, but it's not meant for you to interact with as an application. A web application changes in response to your input; that's another good way to talk about it. What's really cool, and it seems counterintuitive because so far we've described the web as a collection of documents with all these links forming a web you can crawl, is that with the architecture of the web, just these URIs, HTTP, and HTML, you can return dynamic content based on the request you receive. People discovered this early on, and the web was intentionally designed to allow it, partly as a way to give organizations access to their databases. [Student: So not only read-only; it interacts with the users?] Yes, I would say so, if it takes your input. You could have a read-only application that is still a web application; it depends on what you're reading. If you're using Twitter in read-only mode, the tweets are all coming from other users of the site, so other users are interacting with it and changing the content. Hold that thought. Actually, if you read the original spec for GET and POST, it says GET should not have the significance of taking an action other than retrieval. So even the basic protocol says that a GET request means you're not asking for anything to change; you just want something. That makes sense for a link: when you click a link, you don't want some action to happen, you just want to get a new page. So GET is safe and idempotent.
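The idea of server-side code that builds a fresh HTML response for every request, with GET only retrieving and POST changing state, can be sketched as a toy request handler. Everything here is hypothetical: `handle` and the in-memory `messages` list stand in for a real server and database, and request parsing and networking are skipped entirely.

```python
import html

# Hypothetical server-side state: a tiny message board kept in memory.
messages = []


def handle(method, params):
    """Generate an HTML page for each request (a toy web application)."""
    if method == "POST":
        # POST changes the application's state, like posting to a bulletin board.
        messages.append(params.get("message", ""))
    # GET only retrieves: repeating it leaves the state unchanged, so it is
    # safe and idempotent. User text is escaped so it is not parsed as tags.
    items = "".join(f"<li>{html.escape(m)}</li>" for m in messages)
    return f"<!DOCTYPE html><html><body><ul>{items}</ul></body></html>"


first = handle("GET", {})
assert handle("GET", {}) == first       # same request, same page, no side effect
after_post = handle("POST", {"message": "hello"})
assert after_post != first              # the POST changed what later GETs return
```

A purely static website would be the same page stored on disk; the interesting behavior, and later the interesting vulnerabilities, come from this dynamic generation step.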
What does idempotent mean? Yes: you can send it once or many times and the result will be the same. For POST, on the other hand, the spec lists things like annotation of existing resources or posting a message to a bulletin board, basically anything that changes the application in some way. That's what POST is for. I think of a web application as meaning there is some server-side code running that dynamically creates an HTML response. One example of website versus web application: think of the original Yahoo. Do you know what Yahoo was originally? [Student: A portal.] What does that mean? Yeah, Yahoo was just a collection of links. It was literally a hand-curated collection: here are different categories, like gaming, sports, whatever, and you click to that page and get a list of other pages and links. To me that's a quintessential website. It's literally just presenting information; you go there and see whatever they want you to see, but it doesn't change based on my interaction with it. Whereas something like Google is very much an application: I type in a query and get a dynamically generated results page just for me. We need to be precise about web applications versus websites because, for example, it doesn't make sense to talk about a cross-site scripting vulnerability in a website. A website doesn't do anything; it's fundamentally uninteresting because it just serves static HTML content. We'll see that the vulnerabilities all come from the server-side code executing in a web application. [Student: So if there's a blog page with a comment section, then it's a web application?] Yes, I would say that most blogs are definitely web applications. My blog, though, is actually a website.
It's statically generated HTML. There are comments on some posts, but those come from Disqus, a third-party commenting service. So it's kind of a blurry line. It's like the Internet versus the web: technically there is a distinction. When you're talking with your friends or family, you can use the terms interchangeably; when you're talking with a fellow computer scientist, keep them distinct. Same here. [Student question.] It depends on the action you're talking about. If you're just clicking a link, I would say no, that doesn't really constitute an action: your browser made an HTTP GET request to the URI specified in the link. Real interaction usually comes from filling out forms, where you're making POST requests. That all changes with AJAX; we'll talk about that in a second. So I think of a web application as server-side code dynamically creating an HTML response. Part of the question is how to do that with the technologies we've studied. It seems clear: say you're Google, you get the search term, you look things up, and you return some HTML response. But in the HTTP protocol we've looked at so far, every single request is distinct. What does that mean? I think I made this analogy before: it's like somebody with amnesia. For every single request they say, "Oh, new person, great, here's your response. Oh, new person, great, here's your response." All the server has is the client's IP address, where the request is coming from. But if you used just the IP address, that would be bad for us right now on ASU's network. Why? [Student: Externally, all the requests come from one IP address.] Yeah, at ASU we're all NATed behind one IP address, so we literally appear to the world as one IP address.
So if you try to say, "Everybody's IP address is unique, and I can use it to fingerprint them and know who they are," it's not going to work very well; you'll have a lot of problems. You can go to ipchicken.com right now and you'll all see the same IP address. What about the User-Agent? Can we trust the User-Agent? No. It comes from the user, so it's essentially untrusted. So we need some way for these web applications to maintain state. When you make a request, I don't want to say, "Hey, brand new user, great to see you." I want to say, "Oh, hey, Bob, great to see you. I saw you ordered all this stuff last week." Let's say I'm Amazon: "I saw you ordered these books; here are some other books I think you'll like." So we want an application that maintains state between requests. This is hard because HTTP itself is a stateless protocol. Unlike, say, TCP: TCP has connection state. We know what data we've sent, we know who we're talking to, and because we've done the three-way handshake, we know we have this communication channel. To write a web application, we want to maintain state and link requests together, so I know you're the person who sent requests X, Y, and Z, and not requests A, B, and C. The high-level goal is to create some kind of session so that the web application can say, "Aha, I know who you are." That enables authentication, so we know which user is which, and it allows rich, full-featured applications. You have to put yourself back in time, which is hard because you've probably been using web applications your whole life. Think of desktop applications. The key question is how we can support those kinds of applications, like an email client. How can we build an email client as a web application?
That is, without a stateful protocol. There are three ways to do this. The first is embedding the information in URLs. The idea is that you make a request to me, and remember, I'm executing code on the server. So when I generate the HTML response, for every single link on the page I add a unique parameter that identifies you as the user who made this request, and then you click one of those links. The second is using hidden fields in forms to do something similar, linking you through your form requests. The final one is cookies. We're going to skip the first two because, honestly, they're used so rarely now. But let's talk about what the problem would be with embedding your session information in a URI. [Student: Someone could iterate through the URLs and use someone else's.] If it's poorly generated, like if I just put your user ID in there, then maybe you could iterate through all the user IDs. But say it's randomly generated. Is it still safe? Why not? Who could get it? Give me a malicious scenario, or just a bad scenario. [Student: You can sniff the network.] Anybody on the network (and we've talked about all kinds of ways to do that) can sniff those links, click them, and now they're the same as you. [Student: Denial of service?] How so? [Student: If I write code that hits the server with a different token or session ID each time, the server will think the requests come from that many different sources, and we might deny service.] Maybe, but in practice that's hard to do, because you don't have much leverage in that attack. With each of your requests you're basically forcing the server to store something in its local state, and it's usually not very much. [Student: If we can get man-in-the-middle, we can completely control all the content.] Yes, then we control everything, but that's a general attack scenario.
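The URL-embedding approach, and why a random ID does not save it, can be sketched in a few lines: the server keeps a table from session ID to user, rewrites each link to carry the ID, and treats whoever requests such a URL as that user. The `sessions` table, the `sid` parameter name, and the helper functions are all hypothetical, and link rewriting is simplified to paths without an existing query string.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side session table: session ID -> per-user state.
sessions = {}


def start_session(username):
    # A random, unguessable ID (not a sequential user ID that could be iterated).
    sid = secrets.token_urlsafe(16)
    sessions[sid] = {"user": username}
    return sid


def link_with_session(path, sid):
    # Server side: every link on the generated page carries the session ID.
    # Simplification: assumes the path has no query string of its own.
    return path + "?" + urlencode({"sid": sid})


def user_for_url(url):
    # Whoever requests this URL is treated as the session's owner, including
    # someone who sniffed it or was simply sent the link.
    sid = parse_qs(urlsplit(url).query).get("sid", [None])[0]
    session = sessions.get(sid)
    return session["user"] if session else None


sid = start_session("bob")
shared_link = "http://example.com" + link_with_session("/photos/42", sid)
print(user_for_url(shared_link))  # bob, no matter who actually clicks it
```

The randomness defeats guessing, but the credential is the URL itself, so anything that copies the URL (sniffing, logs, or just sharing it) copies the session.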
Have you ever sent somebody a link? Can you imagine if you did that and then they were logged in as you? You'd be very careful with the Facebook photo links you sent to friends. So this is why those two approaches died out: essentially, you can no longer safely share links. It breaks the idea that a URI is just something you can send to somebody. So, cookies. You've all heard of browser cookies. Do you actually know what they are? [Student: Something you should clear every now and then.] Exactly. We'll talk about exactly what they are; it won't take long. Browser makers came up with this; I think Netscape was the first to introduce the idea of a cookie. A cookie is essentially state information passed between a web server and a user agent. When you go to a website, the server initiates the session and says, "Hey, user agent, here's a cookie. The next time you make an HTTP request to me, send me back that cookie. That way I know it's you." Either the server or the user agent can terminate the session: the server can just forget the cookie it sent, or the user can delete their cookies. It's interesting to note that this came about while people were trying to develop an e-commerce application and realized the lack of state was insane and had to be fixed. There's a super interesting history of RFCs about cookies, where they tried to standardize them and then realized that's not how things really work, so they published an RFC (RFC 6265) that describes how cookies are actually used in browsers today. The idea of cookies is very simple: they're just name-value pairs separated by an equals sign.
And remember, this is all in HTTP, so it's actually just a header. The server includes a Set-Cookie header in an HTTP response, which asks the user agent to please store some data for the server. So you get Set-Cookie: user=foo. The next time this browser makes an HTTP request to that server, it will include an HTTP header line that says Cookie: user=foo, just like this. So the user agent's job is to remember this cookie. What are some potential problems here? You can disable cookies, but then you'll have a bad time on the web. You can delete the cookies. Anyone who can read the cookies can steal someone's cookies and log in with them; yeah, we haven't gotten to that yet. Can you modify the cookies? Yes, you can modify the cookies. Something we didn't talk about: essentially every website you go to can put some data on your computer. It has to be stored on your local machine, so there have to be some limits, and the size and number of cookies each website can store is severely limited. You wouldn't want to go to a website and have it set, I don't know, gigabytes' worth of cookies; you'd be like, "No, I'm trying to install this, I need this space."
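A toy, browser-side sketch of that round trip using Python's standard http.cookies module: store what arrives in Set-Cookie headers and build the Cookie header for the next request. The helper name cookie_header is hypothetical, and a real user agent also checks Domain, Path, expiry, and the other attributes, all of which this ignores.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values: list[str]) -> str:
    """Build the value of a Cookie request header from Set-Cookie values.

    A toy user agent: it ignores Domain, Path, Expires and every other
    attribute a real browser would check before sending a cookie back.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parse "name=value"; a later cookie with the same name replaces it
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Response from the server:   Set-Cookie: user=foo
# Next request from browser:  Cookie: user=foo
header_line = "Cookie: " + cookie_header(["user=foo"])
```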
It's a very interesting way to offload state from the server onto the client. The server can ask for multiple cookies to be set by sending multiple Set-Cookie headers, so I can set all kinds of cookies. Not only that, the server can also set several attributes on a cookie. They go on the Set-Cookie line after the name-value pair, separated by semicolons. Some interesting ones: as a server, I may want to set a cookie but have it be valid only for a subsection of my application, so I may want to restrict the cookie to a certain path. That's the Path attribute. Domain: do I want this cookie to be valid on all subdomains of my domain, or just a specific subdomain? Why might that be useful? Each subdomain could belong to a different entity or could be untrusted, so you only want certain subdomains to get it. Expires or Max-Age: a way for the server to tell the client, "Hey, I'm not going to accept this cookie after this amount of time." HttpOnly, which is important: we haven't gotten to it yet, but this specifies that the cookie is not available to JavaScript; it's only sent along with HTTP requests. And Secure means this cookie should only be sent over an HTTPS connection, never over a plain HTTP connection, which obviously makes sense because that goes back to all of the man-in-the-middle attacks, sniffing and stealing these cookies. When I made a request to google.com back in the day, I got these two Set-Cookie headers. One was a PREF cookie, and remember, the cookie is everything from PREF all the way to the first semicolon. You can see Google is doing something interesting here: they're using their own encoding scheme to put multiple key-value pairs inside this one value, separated by colons. We can also see the Expires, the Path, the Domain, and other interesting things. They set one cookie, this NID, as HttpOnly, but this other one, PREF, is not. That's kind of interesting.
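To see how those attributes hang off a single Set-Cookie line, here is a small Python sketch that splits one header value into its name, value, and attributes with the standard http.cookies module. The helper name parse_set_cookie is hypothetical, the cookie value abc123 is made up, and Max-Age is used instead of an Expires date only to keep the example short.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value: str) -> dict[str, object]:
    """Split one Set-Cookie header value into its name, value and attributes."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    # Attributes that were not sent come back as empty strings.
    return {
        "name": name,
        "value": morsel.value,
        "path": morsel["path"],
        "domain": morsel["domain"],
        "max-age": morsel["max-age"],
        "httponly": bool(morsel["httponly"]),  # flag: present or absent
        "secure": bool(morsel["secure"]),      # flag: present or absent
    }


# An NID-style cookie like the one described above (the value is made up).
nid = parse_set_cookie("NID=abc123; Path=/; Domain=.google.com; Max-Age=3600; HttpOnly")
```

Note that HttpOnly and Secure are bare flags with no "=value"; their mere presence turns them on.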
The Expires date is when the cookie is no longer good. The Path is slash, and the Domain is .google.com, which means send it to all subdomains of google.com, so this includes www.google.com and everything else. And it's HttpOnly. Someone points out that we can delete cookies; yes, and the server can also ask the browser to delete cookies. Ah, good point: proxies. We talked about how on the web there are lots of proxies involved, proxying traffic. Why should proxies not cache cookies? Think of all of us here. Say ASU is using a proxy, and I go to, let's say, foo.com first. foo.com says, "Hey, Set-Cookie: some random value," and I start interacting with foo.com. The proxy cached that first response. So now when you go to foo.com, the proxy says, "Oh, I have a page for foo.com, great," and sends it back to you, including the Set-Cookie header with my random value. Now when you make actual requests to foo.com, you have my cookie, and the server has no way to distinguish between the two of us. OK, so remember we have these policies like "I want you to expire this cookie," and this is all kind of a best-effort thing. Remember, the server fundamentally cannot trust the client to do anything. Even though the server can set an expiration date on its cookies, the user agent does not have to respect that at all. It's free to delete them at any time, or to keep them, so the server has to check too. Someone asks whether you could just change the expiration yourself. Sure, you just change it to 2030; you have complete control over your browser. With curl you can make arbitrary HTTP requests; with netcat you can make completely arbitrary HTTP requests. The user agent can do absolutely whatever it wants. Question: is there an associated cookie ID with your cookie so that you can't do that? What do you mean?
Like, if you can extend the expiration date of your cookie, the server probably doesn't want that. So if they assign IDs to the cookies, they'd know what to expect for that specific cookie. Yes: fundamentally, the server must keep track of this itself. We haven't talked yet about what makes a good cookie value, but fundamentally it's the same thing: the server needs to keep track of which cookies it gives to you. So you want to establish a session with somebody. What's the best way to do it? How do you generate the cookie? Someone says: create a cryptographically secure random number, make sure it's not already being used, and then send it to them. If it's a good cryptographically secure random number in a huge enough space, you don't even have to check, but yes. You generate some long random number, you say "this is your cookie," and then you store in the database that you created this cookie, and you have to associate it with the user: this cookie is associated with this user as of this date. Every single time a request comes in with that cookie, you look it up in your database and check: has this cookie expired yet? Is it still valid? Then you look up which user the cookie is associated with. So fundamentally, the server has to implement this idea of sessions, and we just talked about the best way to do it. The worst way is to put your user ID in the cookie, because then I can just change my user ID to whatever user I want and I'm good. So it should be a random, unguessable session ID, sent to the user agent as a cookie, and on subsequent requests, just like we said, the server uses that session ID to index into a database and look up exactly what it needs. It needs to be random: the session value should not be an auto-incremented ID. It's very similar to how you think about TCP sequence and acknowledgement numbers.
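A minimal in-memory sketch of that server-side bookkeeping in Python. Assumptions: a dict stands in for the database, and the class and method names are hypothetical. The session ID comes from the standard secrets module, and expiry is enforced on the server no matter what the browser did with its copy of the cookie.

```python
from __future__ import annotations

import secrets
import time


class SessionStore:
    """In-memory stand-in for the server-side session table (hypothetical names)."""

    def __init__(self, lifetime_seconds: float = 3600.0) -> None:
        self.lifetime_seconds = lifetime_seconds
        # session ID -> (user, absolute expiry time)
        self._sessions: dict[str, tuple[str, float]] = {}

    def create(self, user: str, now: float | None = None) -> str:
        """Issue a fresh, unguessable session ID for user."""
        now = time.time() if now is None else now
        # 32 random bytes from the OS CSPRNG, URL-safe base64 encoded.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = (user, now + self.lifetime_seconds)
        return session_id

    def lookup(self, session_id: str | None, now: float | None = None) -> str | None:
        """Return the user for a session ID, or None if unknown or expired."""
        if session_id is None:  # the request carried no session cookie at all
            return None
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        user, expires_at = entry
        # Expiry is checked here, regardless of any Expires the client kept or edited.
        if now >= expires_at:
            del self._sessions[session_id]
            return None
        return user
```

Compare this with a cookie like user=42: there the server trusts whatever ID the client sends, whereas here the client only holds an opaque key into state the server controls.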
If you delete your cookie, you'll send nothing to the server, no cookie at all, so the server can't look anything up. Then the server would likely deny you access to the resource you're accessing, usually by redirecting you to the login page and saying, "Sorry, you're not logged in, you need to log in to access this." Depending on how the web app works, it may set a new cookie at that point but associate it with an unauthenticated user, or it may wait until you log in, authenticate you, and then set a cookie associated with your account. Either way, on a secure application it would say, "Hey, you're not logged in, I don't know who you are, get out of here." So now we're going to talk about how we actually write web applications, because you need to know how to write web applications in order to understand the vulnerabilities associated with them. You used to basically have to write your own web server. Just like you did for assignment one, you'd write a web server that listens for HTTP requests, parses those requests, and then, depending on what the request was, runs some special code and sends back an HTTP response with different HTML content. What's the problem here? Someone says it's hard to standardize. Standardize what? As long as you're speaking HTTP, you don't need to standardize. Someone else says not everyone is going to get the syntax right, so there can be bugs. Right: every single person writing a website would have to write their own web server. I think more what your point was is that you're reinventing the wheel. Everybody has to do this HTTP parsing, which could lead to insane compatibility issues if certain people didn't do it the right way. And as you know, the HTTP/1.1 spec is fairly long.
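To get a feel for what everyone had to reimplement, here is a deliberately minimal Python sketch of only the first step a hand-rolled server performs: splitting a raw request into its request line and headers. The function name is hypothetical, and it skips most of what the HTTP/1.1 spec actually requires (bodies, chunked encoding, repeated headers, proper error responses).

```python
def parse_request_head(raw: bytes) -> tuple[str, str, str, dict[str, str]]:
    """Parse the request line and headers of a raw HTTP/1.1 request.

    Deliberately incomplete: no request bodies, no chunked encoding,
    no folded or repeated headers, minimal error handling.
    """
    head, _, _body = raw.partition(b"\r\n\r\n")
    # Decode as ISO-8859-1 so every byte maps to exactly one character.
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive; a repeated header overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


request = (
    b"GET /welcome.asp?Name=Bob HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Cookie: user=foo\r\n"
    b"\r\n"
)
method, target, version, headers = parse_request_head(request)
```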
So it was generally considered good engineering practice to separate these things out and say: OK, we'll have a web server that's only concerned with reading and parsing HTTP requests, and if a request needs it, we forward it to a web application. Now you can develop a web application without ever knowing or worrying about HTTP, which is cool. So to update our model: we have a client who makes an HTTP request to a web server, and there's some web application code, usually running on that same server, although it doesn't have to be, which is also very cool. The web server, depending on the exact mechanism, forwards the request to that web application and some code executes. This is the critical thing that makes it different from a static web site: there is code running on the server that determines the HTTP response, which usually includes HTML. So, a brief overview of some of the technologies. We're going to skip over CGI, even though I think CGI is super cool. Active Server Pages is the first one we're going to look at. It was first released in '96, and the idea was to mix HTML tags with a way to execute script. It also had server-side includes, which were similar to pound include in C, so they were a way to modularize your application. It's a scripting language in essence: everything was interpreted and executed at runtime, so there was no compilation step involved. And it's actually still supported; I'd guess at least until 2019 or something like that, since it's tied to Windows support. Oh, there we go, proof of that: October 26th, 2022, the full long-term support for ASP. So let's look at an example. In ASP, everything between <% and %> is ASP code, and remember, this is running on the server: it's code whose job is to generate HTML output.
To be completely honest, I'm not an ASP expert by any stretch of the imagination, but this code is not crazy; you can reason through it and understand it. strName = Request.QueryString("Name"). So what does this likely do? It looks up anything in the query part of the URI that has the name Name, with a capital N, and sets strName to that value. Then, if strName does not equal the empty string (I forget which other language does it this way, but <> means not-equal here), here comes the important thing: everything outside of these special ASP tags, the less-than-percent and percent-greater-than, is just interpreted as raw text output. So in this code, if strName is not equal to the empty string, we output this Welcome, and then we execute some code again, Response.Write(strName). So if the query string value is not empty, it writes out, in a bold tag, Welcome followed by whatever strName was. Otherwise it says something like "Hey, you didn't provide your name," and because that's outside of the special tags, it's just output as is, and then we end the if. Questions about ASP and programming ASP? Not ASP.NET, actual classic ASP. I think it's related to Visual Basic; can anybody confirm? It looks like Visual Basic-ish syntax. I think so. They moved ASP over to the .NET framework, so if you want to do ASP today you can code modern ASP.NET, and they even have cool frameworks now that are a bunch more modern, so you can write a nice web application in ASP.NET. But it's interesting to see the evolution here. So now we get to basically the juggernaut of web programming. In web programming, if you don't know PHP, you basically don't know the web; that's just fundamental, and we'll see why. This is the language I fully expect you to know and understand for this class.
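Roughly the same logic as the ASP example, sketched in Python as a WSGI application. It also illustrates the server/application split from earlier: the app never parses HTTP itself, it just receives the already-parsed query string in environ["QUERY_STRING"] from whatever WSGI server hosts it. The function names and exact output strings are approximations, not the slide's code, and unlike the ASP version, this sketch HTML-escapes the name before writing it out.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def welcome_app(environ, start_response):
    """WSGI app mirroring the ASP snippet: greet ?Name=..., or complain."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("Name", [""])[0]  # like Request.QueryString("Name")
    if name != "":
        body = f"<b>Welcome {escape(name)}</b>"
    else:
        body = "<b>You didn't provide your name</b>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]


def call_app(query_string: str) -> tuple[str, str]:
    """Invoke the app directly, standing in for a real WSGI server."""
    environ: dict = {}
    setup_testing_defaults(environ)
    environ["QUERY_STRING"] = query_string
    status_holder = []

    def start_response(status, headers):
        status_holder.append(status)

    body = b"".join(welcome_app(environ, start_response))
    return status_holder[0], body.decode("utf-8")
```

For local experiments, wsgiref.simple_server.make_server("", 8000, welcome_app).serve_forever() from the standard library is enough to serve it over real HTTP.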
PHP is a recursive acronym: it stands for PHP: Hypertext Preprocessor. It operates on the same principles as the ASP we saw: the idea is to embed a scripting language into our HTML pages. It was originally released in 1995, just to give you an idea of how old PHP is, and it's insane. It was first released by one person (I don't remember his name) to generate his own home page; it was custom-built software just for his own website, and he released it basically as open source, and it has exploded since then. PHP 3 came out in 1998, and at its peak it was on approximately 10% of the web servers on the internet; that's pretty incredible. PHP 4 came in 2000. PHP 5 came in 2004 and added a real object model, kind of cool, a little more modern, but then you have to think, wait a minute, that means it didn't have proper objects all the way up to then. PHP 5.6 came in August 2014, so you see the gaps between releases getting longer and longer. You'll also notice there's no PHP 6, is there? Did it actually come out yet? 7.1? 7.1, is that the one where they completely changed it?
Yeah. So, I doubt it will. Anyway, we're going to focus on 5.6; on the web you always have to deal with the old, crappy technology. PHP popularity: this is a very cool graph. The crazy and important thing about it is that the y-axis is on a logarithmic scale. So even in 2004, almost 10 million hosts were using PHP, and as you keep going out, these numbers are just insane: almost 100 million active sites using PHP. My theory is that PHP made it so easy to create websites (and I'm guilty of this too; I started programming PHP when I was first learning how to create websites), and that was actually great for the popularity of the web. It helped the web grow. But we'll see that from a security perspective, this is a nightmare. I have a theory that PHP makes it so easy to write such crappy, crappy code that has security vulnerabilities, and ultimately it's oftentimes not the developer's fault. They don't realize they're doing it; they're kind of ignorant, not in a negative way, just unaware of these problems. Someone asks what percentage is ASP.NET versus PHP. It's a significant amount, but compared to PHP, ASP doesn't even come close. And then you've got to factor in the popularity of WordPress and Joomla and all the others. Is it Drupal?
All these PHP applications have frighteningly enormous numbers of installs across the web. That's something I've noticed doing some research, too: there's a lot less open-source ASP.NET code compared to PHP, so there's so much PHP out there. OK, so the fundamentals of PHP. Each page is parsed and interpreted on each page request, and to make this faster, the PHP interpreter is actually embedded into your web server, using mod_php for Apache. It is a completely new language: whereas ASP was basically Visual Basic, PHP is its own language. It has a C-like syntax, so it tricks you into thinking you're coding in the C style of things, but it was fundamentally custom designed to build web applications, and we'll see that it has a lot of features that make this easier. Also, I like to rag on PHP, but it's more like ragging on your sibling: they've been around your whole life, you love them, you know all their flaws, you know they can be a terrible person, but often it's not their fault. They grew organically over time; features were added on. I mean, that analogy breaks down: I guess my brother did technically grow organically, but not like that. As an example, here's some super simple PHP code: Hello World. Just like ASP, everything that's not inside PHP's custom tags is straight text output, so all of this stuff is output to the browser as is, with no parsing. Less-than, question mark, php (<?php) is the start of a PHP tag, and ?> is the end of a PHP tag. So essentially this PHP program says: when this page is accessed, or this program is executed, output all of this, including spaces, right up until this point; then execute the PHP code echo '<p>Hello World</p>'; and after that, output all of that whitespace all the way to the end. So fundamentally it will produce an HTML page that is all of this, with <p>Hello World</p> in the middle.
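A toy Python model of the execution rule just described: text outside <?php ... ?> is copied through unchanged, and the code inside "runs" and contributes its output. The only PHP this hypothetical renderer understands is echo of a single-quoted string literal, and it ignores details such as real PHP swallowing a single newline right after ?>; it is meant only to illustrate the model, not to be a PHP implementation.

```python
import re

# One <?php ... ?> block; the code inside is captured and may span lines.
PHP_BLOCK = re.compile(r"<\?php(.*?)\?>", re.DOTALL)
# The only statement this toy knows: echo '<single-quoted literal>';
ECHO_LITERAL = re.compile(r"\s*echo\s+'([^']*)'\s*;?\s*")


def render(template: str) -> str:
    out: list[str] = []
    pos = 0
    for block in PHP_BLOCK.finditer(template):
        # Everything before the tag is raw output, copied verbatim.
        out.append(template[pos:block.start()])
        stmt = ECHO_LITERAL.fullmatch(block.group(1))
        if stmt is None:
            raise ValueError(f"toy renderer cannot execute: {block.group(1)!r}")
        # "Executing" the echo appends its string to the output.
        out.append(stmt.group(1))
        pos = block.end()
    out.append(template[pos:])  # trailing text after the last tag
    return "".join(out)


page = "<html>\n  <body>\n    <?php echo '<p>Hello World</p>'; ?>\n  </body>\n</html>\n"
```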
Questions? So it's not really dynamic, in the sense that it changes based on the request, but there's still PHP code executing every time we make this request. PHP is dynamically typed, which can actually be awesome (don't tell the static-language people), and dynamic typing can definitely help with development time. It has string variable substitution, dynamic includes and requires, superglobals, variable variables, and insane features like register_globals. I'm not going to get deep into dynamic typing; somebody give a brief explanation of what a dynamically typed language is. What does that mean? Exactly: the types of variables are not statically defined; they're determined at runtime while the program executes. This means you don't know whether something holds a string or an int or an object or what. OK, string variable substitution in PHP. PHP has some built-in language constructs: echo just outputs whatever you give it to the HTML output. This is a simple string in single quotes, since you can put strings in single quotes, so this echoes the string 'Variables do not $expand $either', which is output to the browser to be interpreted as HTML. Now I can set variables. This is a dramatic shift from C, but it's more like Perl, I think; Perl is this way too, right? Variables all start with a dollar sign, so this sets up a variable called $juice that's set to the string "apple". Now this is where string variable substitution comes in. What is this going to output? He drinks some apple juice. So $juice will be interpreted only within what type of quote?
Double quotes. Only within double quotes will $juice be substituted with the actual value of the juice variable. Then there are arrays. Arrays are basically dictionaries, the way you think about them in other languages: they can use zero-based indexing, so you can use them like an array, and you can also use them as an arbitrary dictionary. So this is an array where, I believe, element zero is apple, element one is orange, and element koolaid1 is purple. So you can say something like "He drinks some $juices[0] juice," and yeah, exactly, $juices[0] is interpreted as: use the $juices variable and access index zero of that array, so he drinks some apple juice. Then he drinks some orange juice. Finally, you can say "He drinks some $juices[koolaid1] juice," and that koolaid1 element is purple. You can also put all of that inside curly braces, and then I think you can have arbitrarily complicated expressions in there; in that case you have to put single quotes around 'koolaid1' to specify that you want that specific element of the $juices array. Cool. Why is this difficult?
Yeah: A, there are a lot of different ways to do the same thing, and B, if you look at an output, or at any string in double quotes, variables can be substituted anywhere in the string. So if you want to know, "For this output, I'm outputting some HTML content; did any of it come from a user?", that becomes a very difficult question to answer because of this. Someone says it's made a lot easier with syntax coloring, though. Possibly, but it still depends on how complex your program is. To answer that question you have to look, from that output, at all program paths: is it possible that any of the strings that make up this string could come from a user? But yes, syntax coloring can definitely make these things easier to see. PHP has dynamic includes and requires. Unlike C, where you say pound include and include a static, fixed location, PHP's include and require can take an arbitrary string that was computed at runtime. So, for instance, what's the difference... how many people have programmed PHP? You should use something better. I mean, it's good that you started there, but continue your education and learn other web languages. What's the difference between include and require? Lots of hands. Yeah: if you require a script that doesn't exist, it'll just stop, whereas include will say, "Oh, it's not there," but continue. So include gives a warning, while require says "I absolutely need this" and stops with a fatal error. For instance, here's something from the WordPress source. It defines a global constant, WP_USE_THEMES, and we can see it's using require with dirname(__FILE__). __FILE__ is a magic constant (not technically a superglobal) holding the path of the current file, though I don't recall right now where this particular PHP file lives. And the other thing that's super annoying about PHP, which trips me up all the time: string concatenation is the dot operator, not plus like in other languages. So here you have string concatenation.
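A toy Python model of the quoting rules discussed above, to make them concrete: in a double-quoted string, $name and $name[key] are replaced by values; in a single-quoted string nothing is replaced. The helper names are hypothetical, a Python dict stands in for a PHP array, and PHP's curly-brace syntax, escape sequences, and other details are not modeled.

```python
import re

# Only the "simple" forms: $var and $var[key], where key is a bare word or integer.
SIMPLE_VAR = re.compile(r"\$([A-Za-z_]\w*)(?:\[(\w+)\])?")


def php_double_quoted(text: str, variables: dict) -> str:
    def substitute(m):
        value = variables[m.group(1)]
        key = m.group(2)
        if key is not None:
            # Integer-looking keys index by number; anything else is a string key.
            value = value[int(key)] if key.isdigit() else value[key]
        return str(value)

    return SIMPLE_VAR.sub(substitute, text)


def php_single_quoted(text: str, variables: dict) -> str:
    return text  # single quotes: no substitution at all


scope = {
    "juice": "apple",
    # A PHP array mixes integer and string keys; a dict can model that.
    "juices": {0: "apple", 1: "orange", "koolaid1": "purple"},
}
```

Every match of that pattern is a spot where, if the variable came from the request, user input lands in the output, which is why tracing data flow through PHP strings is hard.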
So, let me ask: how do pound includes work in C? Yeah: the preprocessor takes that file, copies it, and pastes it right into your code. Essentially the same thing happens in PHP for include and require, except at runtime. The problem is that, looking at this code, it's actually hard to tell exactly what file will be required here, so even knowing what files are going to be included or required is difficult. And you can see here, this is crazy stuff: depending on whether $wp_did_header is set, it will do other stuff and require other files, and it'll require files based on constant values. So to understand this, you have to look up what those values are and figure out what they should be. I understand WordPress is a super complicated program; I think this has a lot to do with templating, and templating can make things a lot easier. So you have these language features that make it easier to create complex applications, but make it much more difficult to understand what the heck is going on.
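A small Python sketch of the same idea: the included file's path is assembled from pieces at runtime, so you only know which file it is by evaluating those pieces. The function name, the paths, and the containment check are assumptions for illustration; the check is a common defensive pattern, not something taken from the WordPress code discussed above, and it works on strings only (no symlink resolution).

```python
import posixpath


def resolve_include(base_dir: str, *parts: str) -> str:
    """Build an include path at runtime and refuse to leave base_dir.

    String-only normalization (no symlink resolution); names and paths
    here are hypothetical.
    """
    base = posixpath.normpath(base_dir)
    # An absolute part such as "/etc/passwd" replaces the base entirely in join,
    # and ".." segments are collapsed by normpath; the check below catches both.
    candidate = posixpath.normpath(posixpath.join(base, *parts))
    if candidate != base and not candidate.startswith(base + "/"):
        raise ValueError(f"include path escapes {base}: {candidate}")
    return candidate


# Which file gets loaded depends on values only known at runtime.
ROOT = "/var/www/wordpress"
template_loader = resolve_include(ROOT, "wp-includes", "template-loader.php")
```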