Morning, everyone. Let's get started. All right, so we are going to continue on with the next area of security that we're going to look at, and specifically we're going to look at the web and web applications. Part of what I want you to be thinking about as we start talking about this is: how are web applications different from the traditional binary applications that we've been looking at? What makes them unique? What makes them different? Why are they interesting? I have my own reasons, but I'm curious to know what your thoughts are. So basically the plan is: first we're going to look at the evolution of the web. We're going to look at some core technologies, we're going to understand how web applications work, and then we're going to look at vulnerabilities and vulnerability classes in those web applications. We'll also try to draw some parallels between the web application vulnerabilities we look at and the binary vulnerabilities that we saw, and also see that some of the things that can happen here actually also affect the network security problems that we talked about.
Okay, so, believe it or not, this is an image of the very first web browser. In the center it says "Welcome to the Universe of HyperText", right, a very grandiose name. This was in 1990, 1991. Tim Berners-Lee was the author of the first World Wide Web browser and server. He worked at CERN. So what does CERN do? Yeah, the LHC, right, the thing that's going to destroy the world and create a super black hole where the Earth is, and we're all going to blink out of existence. So he was working there, he had a tech background, and he realized, man,
it's really annoying, because they had scientists coming and going to work on the project. And he's like, man, it would be really nice to have some central directory for who people are, what their phone numbers are, and how to get in contact with them, so that we could connect to them. So this was the very first website on the World Wide Web: "a wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents." It's pretty cool. We should all try to write like Tim. And so this is Tim. Tim is a Sir now, knighted by, I assume, the Queen of England; that's how those things go. He was working at CERN and he proposed this idea, at the time called hypermedia, and we'll talk about what exactly that is. These ideas were kind of bubbling up at the time; there were previous hypertext systems. In 1989 his proposal was accepted; the people at CERN said, yes, we think this is a good investment of your time and resources, work on that. He created the first website at the end of 1990. And I'll say he wrote a book, Weaving the Web. If you're interested in the web and the origins of the web, I highly recommend this book. It's a really good take on his experience, what he was trying to do, and why he thinks it took off.
So what's the difference between the internet and the web? What is the internet? Let's start there. It's a network of networks, right? It is the network of networks, where you have independently owned networks that are all connected to each other. So using the internet technologies that we saw, TCP, IP, UDP, we can send a packet of information from one computer to any other computer on the internet. So how does that differ from the web? Is the web the internet? The web is associated with HTML and browsers. So is email the web? And is email the internet?
Yeah. So what layer? We talked about a seven-layer model, so what layer is email? The application layer, right. It's an application-level protocol. It usually uses port 25 to send SMTP messages between hosts. There's a whole protocol designed to do that, right? So what's the web? It's also just another application. It's just another application that uses the internet to do things: it typically uses port 80 for HTTP requests, and it uses port 443 for HTTPS requests, which we won't really get into here. One of the things I want you to keep in mind, especially when you read popular news articles, is that people often conflate these terms. They talk about the web when really they're talking about the internet, or they talk about the internet when they're talking about the web. That comes up a lot, and it makes sense in popular culture, because people use the web. They don't really think about the fact that they're using all these other protocols on all these layers. They don't care. They're just firing up a web browser and going places.
Okay, so how do we put the "web" in the web? The design was originally envisioned as a way to share research results and information at CERN, which, when you think about it, is kind of crazy: it became this worldwide phenomenon that every single one of our phones, computers, everything is using. At the time, Sir Berners-Lee (I don't know if he likes to be called that) combined multiple emerging technologies: hypertext, the internet, and TCP/IP. Hypertext is text with, essentially, the links that you think about on a web page. Or it's just like when you read a research paper: there are references in that research paper, and those references tell you where to find more information or where that idea came from, right?
So you can think of that as a link to some other piece of information. So this idea of hypertext was around, and he combined it with: wow, the internet is actually pretty big. I actually don't exactly know what you would do on the internet in '89, probably Usenet and email and other things. And so this idea grew into "universal access to a large universe of documents." He said, hey, wouldn't it be great if anybody could put a document out there, and at any place in that document we could have a link to another document that could live on that machine or on a different machine? And that document could itself have other links that could be clicked on and followed, and that document would have other links, right? So you have this interconnected web of information, with all these documents in this universe easily accessible to everyone.
So there are three central questions behind the architecture of the web, and this really drives everything. Even the freaking crazy Gmail, Google Maps, all these fancy web applications now: these three design questions influenced the whole design of this technology. First, how do we name a resource? We're just talking about documents, so how do I name a document such that my cv.pdf is different from your cv.pdf? We just can't have the same name. Second, once I have a name, how do I request that resource, and how does that other machine know how to serve me that resource? And finally, how do we create hypertext? How do we actually represent these links to other documents? The three key technologies of the web answer these three questions. So how do we name resources on the web?
URLs, yeah, exactly. Technically, I'll be completely honest, I don't know that I remember the ultra-fine distinction between URIs and URLs. I believe that URIs are more general and can be applied to different types of protocols and different types of resources, whereas URLs are more specific, like HTTP URLs. And we've already done a little bit of a homework assignment on this next one: how do we request and serve a resource? What's the protocol? HTTP, right, the Hypertext Transfer Protocol. And we can see how these are all related, because really, can you use HTTP to fetch something that is not hypertext? Yeah: PDF documents. A PDF document doesn't necessarily have hypertext. Or a plain text file, or an image, a PNG file. So at this point it's kind of a misnomer, but you can see the historical context. We need some way to name things, which is URIs and URLs; we need some way to fetch those things; and then we need some way to represent that hypertext document, which is where HTML comes in. And these actually form three beautiful connections. When you have a URI or a URL, it tells you all the information you need in order to make an HTTP request. You can look at any URI or URL and you'll know exactly what HTTP request to make if you want to fetch that resource. So the URI describes the resource and how to get it, HTTP describes how to actually make that request, and when you make that request you get back an HTML document. And what are the links in that HTML document? Yeah, more URIs and URLs, right? And those could be on that same server, or they could be on other servers, right?
So it's this cycle. When you're browsing the web, this is exactly what you're doing: you start on one page, you click a link on that page, you get a whole new page with new links, you click one of those links, you get a whole new page with new links. Your browser is continually doing this cycle. So this is how these three technologies are related to each other: URLs create HTTP requests, HTTP requests typically result in HTML content, and the HTML content gives you more URLs.
Okay, so now we're going to look at each of these in turn. The URI, as we said, is essentially metadata that describes how to reach and find the resource. It tries to answer the following questions: which server has the resource that I want, how do I ask for it, and how can the server locate that resource? So it needs to have enough information that, if I want to fetch some HTML page on your server, I know how to talk to your server, I know what protocol to talk to your server, and there's enough information for the server to locate what I'm talking about. RFC 3986, from January 2005, has the latest definition of what exactly a URI is, and we'll see that it's incredibly general. This should be very familiar from what you've been looking at with URLs. The syntax here is: we have a scheme, a colon, an authority, a slash, a path, followed by a question mark and a query, and a hash followed by a fragment. So does this look kind of familiar? Yes. Does it describe all the URLs you've seen? No. Why not? Yeah, there's stuff without question marks, right? There's stuff that has no query and no fragment, right?
So, yeah, some of these fields can be optional. Okay, so the scheme. This is what makes URIs general. What's the scheme in the normal URLs that you're used to? HTTP, right. That tells the browser: hey, the way you fetch this resource is over HTTP. Have you seen other URIs that use a different scheme, one that isn't HTTP? FTP, right, you could send somebody an FTP URI. Well, HTTPS, right, it could be HTTPS. What else? What was that? Yeah, I think LDAP is definitely a scheme. Any others? rsync, yeah, I think rsync uses this URI syntax as well. Gopher! Does anybody actually use Gopher? It was basically, what, a precursor or competitor to the web, kind of. I don't know if it came first; I think maybe it did, a little bit. Okay, and it was like, select 1 to go here, select 2 to go here, and that's how you browsed. Interesting.
I wasn't going to talk about this, but since we brought it up: if you have a web application that will fetch URLs that you give it, like bit.ly or whatever, which may go out and fetch that URL, or like Facebook, which will go fetch a URL to make sure there are no viruses or whatever, and the code is not written correctly, you can change the scheme to be a gopher scheme, and you can actually control a lot of what that server will send. So you can use these weird schemes to get the machine to scan other parts of the network, like an internal network, and give you the results. I guess that's the only reason I know about Gopher, besides the historical context.
Okay, the authority. The authority is the entity that controls how to interpret the rest of the URI. This is usually the server name, the server that this resource lives at. So it could be google.com, as in http://google.com.
And in the syntax here, you can do username@host:port. This is why you can write google.com, colon, and a port number: that's how you try to access ports other than the default HTTP port, 80. But the important thing here is that you, the user, don't care what the path, query, and fragment mean. You just know that this defines some resource, and it's up to the authority, the server, to understand from the URI what resource you're talking about. The path is usually a path name separated by slashes, just like on a Unix system. The query is used to pass data. Non-hierarchical means not like the path; it's usually key-value pairs in HTTP, but it doesn't have to be. And what is this fragment? Maybe you've seen fragment stuff. What is it for? In what context have you seen it? Packet fragments, yes, but that's completely different; this is in URIs. Yeah, so it actually has nothing to do with the server at all. The fragment usually identifies a subsection or a sub-resource of this resource. You ask the server for everything up to the fragment, and then when your client gets that back, it decides where to go, or what part of that document you meant, based on the fragment. So when I send you a link with a fragment in it, your browser will usually take you directly to the element that has that specific ID. That's the standard behavior in HTML.
Okay, so some examples. We can have something like foo://example.com:8042/over/there?test=bar#nose. So which would be the scheme, authority, path, query, and fragment of this? What's the scheme? foo. The authority? example.com, yeah, and :8042; the port is part of that, exactly. The path would be /over/there, the query would be test=bar, and the fragment would be nose.
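The breakdown of that example URI can be checked mechanically. The following is a minimal sketch that uses Python's standard `urllib.parse.urlsplit`; the helper name `split_uri` and the dictionary keys are illustrative choices, not anything from the lecture slides.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> dict:
    """Break a URI into the components named in the lecture (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # may carry user@host:port
        "host": parts.hostname,
        "port": parts.port,         # None when the URI gives no port
        "path": parts.path,
        "query": parts.query,       # sent to the server
        "fragment": parts.fragment, # kept by the client, never sent to the server
    }


if __name__ == "__main__":
    example = "foo://example.com:8042/over/there?test=bar#nose"
    for name, value in split_uri(example).items():
        print(f"{name:10} {value}")
```

Running it on a URI without a query or fragment leaves those fields as empty strings, which matches the point that some components are optional.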
So this is an FTP URI, right? Same thing. Also, you've seen mailto links. The scheme is mailto, and the rest is a username at asu.edu. And I can have https://example.com/test/example:1.html?/adam. So is this a valid URI? What's the problem here? Yeah, the problem is how to parse this. Is this whole thing, up to this slash, the authority? Is the host example.com/test/example, then :1 is the port, and then /adam is the path? And even if we did that, where does the path end and the query begin? Does the query begin at the question mark? Everything after the question mark is the query, but I have a slash here, so maybe that's part of the path. Maybe my path is /test/example:1.html?/adam; maybe this whole thing is the path. There are multiple ways to interpret this, so we have a problem. This leads us to the reserved characters. These are the whole set of characters that you can't have as plain data in a URI. So what do we do, just never use these in our URIs? Yeah, we encode them somehow. We need some way to tell the server: hey, I meant a colon character in the path, not colon as a port separator; or I meant slash as part of a query parameter, not slash as part of the path. So that's where we get to one of the big issues with URIs, which is percent-encoding. Anything that's not alphanumeric, a dash, a dot, an underscore, or a tilde, you have to percent-encode. And percent-encoding is pretty straightforward: you have a percent sign followed by the hexadecimal representation of that byte. For a slash, as some of you have seen from the binaries, the hex ASCII value is 2F. I only know that because I also spend time looking at shellcode and that kind of stuff. So the URI encoding of the slash would be %2F. Cool.
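The rule just described (leave letters, digits, dash, dot, underscore, and tilde alone; turn every other byte into a percent sign plus two hex digits) fits in a few lines. This is a sketch for illustration, assuming UTF-8 input; the function name `percent_encode` is made up here, and in practice the standard library's `urllib.parse.quote` does the same job.

```python
import string
from urllib.parse import unquote

# Characters that never need encoding, per the rule stated above.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(text: str) -> str:
    """Percent-encode every byte that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # e.g. "/" is byte 0x2F, so "%2F"
    return "".join(out)


if __name__ == "__main__":
    print(percent_encode("/"))               # the slash from the lecture
    print(percent_encode("example:1.html"))  # colon inside a path segment
    print(unquote(percent_encode("/adam")))  # decoding restores the original
```

Decoding is the reverse: wherever a percent sign appears, read the next two hex digits as one byte, which is what `unquote` does.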
So this means an ampersand gets transformed into %26. And the percent sign: how do we encode that? What's the ASCII value of the percent sign? Yeah, it'd be %25. That's how we represent an actual percent, because why is the percent character special? Yeah, because it's used to start the URI encoding, so we have to actually encode it. A space gets encoded as %20, and so on and so forth. So now we can fix this example. We can say example.com/test/example%3A1.html and then a question mark and %2Fadam. Now I know that this URI is going to be parsed correctly, and I know exactly how to parse the path, the query, and everything else from this URI. Sweet.
Okay, so we've seen URLs on a page. They can either be absolute, where they say, hey, this is exactly the resource I'm talking about: use HTTPS, talk to example.com, fetch this path. Or they can specify a location relative to the current resource. So one can say //example.com/example/demo.html. That's relative to the current network path, meaning whatever the current scheme is; that's what the slash-slash means. Wherever you see this link, the browser knows to reuse the current scheme. If it's HTTP, it'll be http://example.com; if it's HTTPS, it'll be https://example.com. Same thing for something like /test/help.html: this is relative to the current authority, so reuse the authority. And ../../people.html is relative to the current authority and path, so this is going to move us up, right? So what did we learn about dot-dot-slash?
Yeah. Right, so this is one of those things: whenever you see things like this that remind you of vulnerabilities we've talked about, it's probably also a problem here. In this case context is important, so it depends on what context we're in: what page are we on when we see one of these relative URLs? The important thing is that your browser, or whatever user agent you're using, has to be able to take these relative URIs and turn them into something absolute, because it needs to know where to go to fetch that new resource. Questions on URIs? Sweet. Hopefully this is stuff you've already seen before, since you use the web a lot.
Cool. Okay, so HTTP is the protocol that describes how a web client can request a resource from a web server. It's based on TCP. It doesn't have to be, I guess, but it is. The default is TCP port 80. Version 1.0 is defined by an RFC from 1996, and version 1.1 got updated in 1999. And version 2.0: have people heard of HTTP 2.0? I believe it's based on SPDY, which was a protocol developed by Google to make a whole lot of additional improvements to HTTP. But I believe it's still under discussion; the HTTP 2.0 spec hasn't actually been finalized. Does anybody know? Is that incorrect information? Can it be refuted by anybody with a phone? Okay.
So in HTTP we have clients and we have servers. As we saw, we've written a little HTTP web server, right? The server listens for incoming TCP connections on whatever port. The client opens a TCP connection to the server and then sends the request to the server through that TCP connection.
Finally, the server reads that request, figures out what resource the client wants, and sends a response containing that resource. Actually, thinking about all the things we've talked about until now, you should have a very good understanding of every single one of these steps: from listening for TCP connections to opening TCP connections. You know exactly what packets the client is going to send. You even have an understanding of how those packets make it from one host to the other, traversing the local network and then hopping across IP. So this is powerful stuff; you're learning exactly how everything works.
Okay, so we have a client who's running some web browser and we have some server. The client makes an HTTP request and gets an HTTP response. That's how we can think about things abstractly: there's one client and one server, you make a request, you get a response. That's bare-bones HTTP. In reality, things are often more complicated. There may be a firewall in between the client and the server. ASU actually has not just a firewall but an intrusion prevention system that looks at traffic and will drop traffic if it doesn't like it. You could be talking not to an actual server but to a proxy that forwards that connection back to the server. This could be for load-balancing purposes, or it could be because the server lives somewhere else and you have no idea where. That proxy could have a cache, right?
So it can store and save the data from the server, and that way it reduces bandwidth and load on the server. The client also has a cache, so sometimes when you request things it doesn't actually fetch them. This is why, if you've ever had somebody tell you to hard refresh a page, like Ctrl+Shift+R, that forces your browser to not use the cache and to request the resources again. So what happens is your client's request usually goes to the firewall, which goes to a proxy, which goes to a server, and then the response finally gets sent back from the server to the proxy to the firewall and back to you. There are a lot of moving parts. This is part of the reason why web applications are interesting: they are very complicated, and you can have problems at any place along here.
So, for a request, we've seen that a request consists of a method (we'll get to GET and POST); the resource, which is derived from the URI, so what resource are they trying to request; and the protocol version. Why is the protocol version important? Compatibility, what's that? Yeah, so that we know what protocol we're talking. If I'm trying to talk to you in HTTP 2.0 but the server only supports 1.0, we should detect that by seeing what protocol version we're using. That way the server can tell you: hey, actually, I don't support HTTP 1.1, I only support HTTP 1.0. We also have information about the client. This isn't strictly necessary, but it happens very frequently: this is the User-Agent header that the client sends to tell the server, hey, this is the software that's accessing you. And occasionally there's a body. On POST requests, we'll see that the data gets encoded into the body that the client sends.
Okay, so the syntax for a request: we have the start line, followed by any number of headers, followed by an empty line, followed by the body. Each line is separated by CRLF.
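That layout (start line, headers, an empty line, then the body, all joined with CRLF) can be sketched as a pair of small functions. This is an illustrative sketch, not the homework solution: the function names, the `demo-client` user agent, and the `example.com` host are assumptions, and a real parser would also need to handle malformed input, repeated headers, and body length.

```python
CRLF = "\r\n"


def build_request(method, target, headers, body="", version="HTTP/1.1"):
    """Assemble a request: start line, header lines, empty line, body."""
    lines = [f"{method} {target} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The empty line (a bare CRLF) marks the end of the headers.
    return CRLF.join(lines) + CRLF + CRLF + body


def parse_request(raw):
    """Split a well-formed request back into its parts."""
    head, _, body = raw.partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    method, target, version = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


if __name__ == "__main__":
    raw = build_request("GET", "/", {"Host": "example.com",
                                     "User-Agent": "demo-client"})
    print(repr(raw))
    print(parse_request(raw))
```

Splitting each header on the first colon only matters because values such as `example.com:8080` contain colons themselves.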
Hopefully you learned that in the homework project, right? CRLF is very important. A bare newline is not quite the same, and it will mess some things up. The headers are separated from the body by an empty line. So you always have a start line, and then you keep trying to parse headers: every time you see a line, you parse it as a header, until you see a line of just CRLF. Then you know you're done with the headers, and you know that everything that follows is the body.
So, specifying the method. This does kind of get into REST, although REST came a lot later, but the idea is that HTTP is general enough to allow different actions to be taken on a resource. Common methods include GET: give me whatever entity is referred to by this URI. One of the important things here: should the server make any changes to that resource based on this request? Does this affect the state of the server in any way? No, right? It should be just a GET: give me that resource, don't do anything else, don't change the state of your server. Whereas POST basically says: hey, I'm going to send you some data in my body, and you're going to figure out what to do with it in relation to this specific resource I'm talking about. So this specifically says, make some changes or do something. What that something is, is application dependent, but POST means do something. PUT is actually a historical thing that's been used to say: replace that resource with the content I'm sending you in the body. So maybe you can see a little bit of the historical evolution here. In Tim Berners-Lee's original idea, right?
It's kind of a Wikipedia-style thing, where people could just change things: you would just have a server, and then you could use your browser to edit that page. You could get that page, make changes, and then put those changes back, and they would be reflected on the server. And that was actually how that first browser was built and programmed: it was not only a browser, it was an editor as well. But nowadays we've kind of realized that giving everyone in the entire world the ability to edit our resources is probably not a great idea. That's why there's a whole bunch of access control that has to happen here before you can actually do this.
HEAD is a method that's identical to a GET, except that the server is not supposed to return a body. So why would this be useful? Yeah, that's a good one: to see if your copy has been updated. Maybe there's header information that tells you when it last changed. That's a good one. What else? Some metadata. Why did IP include things like ICMP messages, like ping? Yeah, for debugging purposes. So this is kind of where HEAD came from originally: let's just say, okay, just give me the headers. Maybe I need to check whether things exist or don't exist. I can do some sanity checking. I can use this as a debugging tool to make sure that the firewalls or proxies along the way are not messing with any of my headers. So yeah, all of these reasons.
Okay, there are also a lot more methods. There's the OPTIONS method. There's DELETE: just like we can get things, post things, and put things, we want to be able to delete things from our server. Oh, TRACE, right. Yes, TRACE is really interesting.
This is another debugging method, which asks the server: send me back, in the body, everything that I sent you. So it's kind of like a ping, but the response you get is everything that you sent. This also lets you debug: hey, I want to see which firewalls or proxies are changing things on the way from me to you. Just from the client you can't debug that, right? You can't see what the server received. This method allows you to do that. CONNECT is another method. If you know you're using a proxy, CONNECT will actually tell the proxy: hey, I want to connect to this URI. So if your browser is proxy-aware, it'll be able to do this. And the dot-dot-dot here is not just because there are more that I didn't want to list, although for sure there are. It's actually that a web server can define arbitrary extension methods. These are the ones defined in the spec, but a server is free to extend this and do what it wants.
Okay, so let's look at an example request. On the first line, this is the request line; then we have a series of headers, and then we have an empty body. On the request line, the method is GET, the resource it's looking for is slash, and the protocol is HTTP 1.1. And then we have each of our headers. Does the order of the headers matter? No, the headers can be in any order. So we say this came from curl, and I'm trying to access google.com. And Accept is a header that tells the server which media types I will accept, so you're actually doing some negotiation between the client and the server. Content encodings, like whether we support gzip or any other kind of encoding, go in the related Accept-Encoding header. So why do we need to specify the host here? Doesn't the server know who we're talking to? We clicked on a link for google.com.
We resolved google.com to an IP address. We made a request to that IP address on this specific port. We're telling them we want slash. So why do we also send a header that says who we're trying to talk to? So what would that mean? Right, exactly: we can have multiple different authorities, or domain names, that all have the same server, the same IP address, hosting them. Why would we want that, though? Shouldn't they all be unique? I mean, yeah, we could give each of those virtual hosts its own IP address, right? Not originally. I mean, back in, what was that, '99, virtual hosts weren't quite as common. Right, exactly, yes. So it goes back to this problem: how many IP addresses are there in IPv4? Two to the 32, theoretically, and then you start blocking off the local addresses and all that kind of stuff, and you have the classes, and then you consider the fact that they gave out allocations to various companies and organizations. Not a lot, but some companies have a whole /8 IP range, so they have a lot of IP addresses that they're not using. Right now we've technically run out of IPv4 addresses, or we're very close to running out. So if every single domain name had to have a different IP address, that's a problem. But this Host header was actually added in HTTP 1.1.
HTTP 1.0 did not have a Host header, which is why you had to have a different IP address for every website. Virtual hosting didn't even make sense there; you couldn't do it, because you needed something like this. Or maybe you'd do something really weird where you put it in the URL itself, but then it's not really a virtual host. So the web originally didn't have this; the protocol evolved when people realized, hey, this would be a good idea so that we can support all of these things.
Now, modern requests look much more like this. We still have the same features that we're used to. We have the request line. We have a Host. We have Accept-Encoding, which says I can accept responses in deflate or gzip. We have an Accept header that says, hey, I'll accept text/html, application/whatever, all this stuff. This q gives their preference for what they would like to have; the preferences are supposed to be applied in this order, and I believe if there's a tie or something it will try to take the one that has the higher q. I think the q stands for quality. Then the User-Agent specifies the exact model and version number of Chrome that made this request. It's actually kind of crazy if you think about it. We don't really touch privacy in this class, but think about all this information you're sending to every single website that you click on when you're browsing. I don't know how many websites you think you visit in a day; it probably depends on whether you're doing a project or looking at a bunch of Stack Overflow questions. I constantly have like 20 or 30 tabs open; I think sometimes you guys can see my browser. So you're sending all this information, and the IP address that you're coming from, to every one of those servers.
So that's how we do requests. How do we do responses? A response once again has the protocol version, to make sure we're talking the same protocol.
It has a status code, right; we've seen 200, 404. There's a whole bunch of different status codes. There's a short reason phrase that describes that status code, and then a similar thing as before: headers, a new line, and then the body. So the syntax: we have a status line, headers, body. Each line is separated by CRLF, the headers are separated from the body by an empty line, and it's almost the same overall structure as the request. The big difference is the status line.
So, the status codes. They're grouped into classes. There are the 100-class responses, which are actually very interesting. These are just a way to tell the client, hey, I got your request, but I'm still processing it. So this is a response, but it's not exactly a response to... That was crazy. I dropped it with one hand and tried to catch it. All right, I'm awake.
So 200: these are maybe the response codes that hopefully we're more used to, because this is the normal case, when everything's going well. The whole 200 class is successful: we received the request, we understood the request, we accepted the request.
The 300-level response codes are interesting. They're telling the client, hey, I got your request, but that resource you wanted is somewhere else. It's kind of like the original Mario: the princess you're looking for is in another castle. So you have to go somewhere else. It's a pointer that says, yes, I understood your request, but really you should go look here for the resource you wanted. And "here" is again a URI, so it could be on that machine or on a completely different machine.
400 means you made a mistake: I didn't understand your request. That's where 404 comes from; it means you tried to request something that doesn't exist. 500 means I made a mistake. The server says, hey, something broke. I don't know what happened, but something had a problem. So it wasn't your fault.
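The grouping of status codes into classes described above can be sketched in a few lines of Python. This is only an illustration: the function name `status_class` and the wording of the descriptions are made up here, paraphrasing the lecture rather than quoting any specification.

```python
def status_class(code: int) -> str:
    """Return a short description of the class an HTTP status code falls in.

    The descriptions paraphrase the lecture; they are not official text.
    """
    classes = {
        1: "informational: got your request, still processing it",
        2: "success: request received, understood, and accepted",
        3: "redirection: the resource you wanted is somewhere else",
        4: "client error: you made a mistake",
        5: "server error: I made a mistake",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The first digit of the code selects the class.
    return classes[code // 100]


if __name__ == "__main__":
    for code in (100, 200, 301, 307, 404, 500):
        print(code, "->", status_class(code))
```

The point of the sketch is that a client can decide how to react from the first digit alone, even when it does not recognize the specific code.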
It's my fault, so probably try again later.
So, response code 200: it's OK. There are also 201, 202, and 204 response codes. The 300 response codes are interesting. 301 means moved permanently, so the client can actually store the fact that whenever it tries to get this resource it should always go somewhere else. It just tells the client it can store that in its cache. 307 is a temporary redirect, which means, hey, go get the resource over here, but next time you want this thing, contact me. Other ones: 400 for a bad request; 401 if you're unauthorized, so maybe you've seen this if you tried to access a page and weren't logged in; 403 means you're forbidden; 404 is not found. 500 means there's an internal server error; this is kind of the general one when there's been a problem. And 501, 502, 503, these are all different types of errors.
So let's look at a request. Now we have our example here of this GET / HTTP/1.1, User-Agent: curl. This is what we just saw. Now let's look at the response. The response that Google sends us is actually this huge response. But if we break it down, at the top we have the status line, and the very first thing on that status line tells us the protocol: this is an HTTP 1.1 response. The response code is 200, which means everything was good, and that short little reason phrase from Google tells us that everything is OK, which calms us and makes us feel okay.
OK, so some other things. This header, the Expires header, I believe, and this Cache-Control header. What does Expires of negative one mean? Does that mean it never expires, or it's always expired? It's one of the two. Cache-Control: private, max-age=0. So what would this do to the cache? So how would you find this out, if you wanted to understand more about this header? The RFC?
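To make the request and response walk-through above concrete, here is a minimal sketch that builds a request like the curl one and splits a raw response into status line, headers, and body. The `SAMPLE` bytes are hand-written for illustration, modeled on the headers mentioned in the lecture; they are not Google's actual output, and the charset value is an assumption. The function names are hypothetical.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request; Host names the virtual host."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: curl",
        "",  # empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line separates the body
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)  # the status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


def parse_cache_control(value: str) -> dict:
    """Turn 'private, max-age=0' into {'private': True, 'max-age': '0'}."""
    directives = {}
    for part in value.split(","):
        key, _, val = part.strip().partition("=")
        if key:
            directives[key.lower()] = val or True
    return directives


# Hand-written sample response (an assumption, not captured traffic).
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Expires: -1\r\n"
    b"Cache-Control: private, max-age=0\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html>...</html>"
)

if __name__ == "__main__":
    print(build_request("google.com").decode("ascii"))
    version, code, reason, headers, body = parse_response(SAMPLE)
    print(version, code, reason)
    print(parse_cache_control(headers["cache-control"]))
```

This parser is deliberately naive: it ignores repeated headers, chunked bodies, and compression, all of which a real client has to handle.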
Yeah, searching the man page or the RFC, that's close. I don't think the man page will tell you much; the RFC will definitely tell you what this means. So yeah, I don't know a hundred percent exactly, but it's basically telling the caches along the way, as we saw, the proxy, the browser cache, who can cache it and for how long they can cache things, if they can.
We have the Content-Type, so this tells our browser how to interpret the response. HTTP is the Hypertext Transfer Protocol, so this says, hey, what I'm sending you is text/html, and it even tells us the character set encoding of our response, so that our browser can properly interpret it. There's a bunch of other things in here: there's some cross-site scripting prevention stuff, some frame clickjacking stuff, and then finally the body, with the HTML of the page.
Another interesting thing that we'll look at really quickly is that HTTP has an authentication mechanism built in. It's based on a challenge-response scheme. The basic idea is: if you request some resource, the server responds with a 401 that says, hey, you're unauthorized, and it tells you, as part of a header, the scheme that it wants you to use. This is when the browser pops up that window that says, hey, fill in your username and password, and then you fill it in and it sets this Authorization header. So when you get the reply, there's going to be a header in the 401 message that says WWW-Authenticate: Basic, so basic authentication, for this realm of reserve docs. Then the client retries the access, including in the header a field containing a cookie composed of the Base64-encoded username and password.
So how secure is Base64 encoding? What is Base64 encoding? Right, so it represents the data in the form of... I actually can't remember the entire character set, but yes, it represents it in characters. Is it a cryptographic
encryption or hash or anything? No. You can take this string and go back and figure out exactly what that password is. So you can recover this username and password exactly from this. And if you're using HTTP, what do we know about the data sent over a TCP connection? It's clear text. TCP isn't performing any encryption or anything, and HTTP, as we're talking about it, is just using regular TCP. It's not encrypting anything. So anybody on that path, anybody on your router or on your open wireless network, can see that packet that you send with this data in it and can immediately crack it. Well, it's not cracking; they just decode your username and password from there. So it's kind of funny that HTTP authentication is quote-unquote built into HTTP, but (a) it's terrible and (b) it provides a really bad user experience, because you know that pop-up.
HTTP 1.1 has a slightly better authentication scheme. It actually uses some cryptography, where we get a nonce and we basically hash the username, the password, the nonce value, the HTTP method, and the requested URL. But the problem here is the server has to have access to your plaintext username and password, which is not always great.
OK, I want to finish up here. You can use most of the techniques and tools that we talked about in Internet insecurity to analyze and look at your HTTP traffic. You can use tcpdump to look at the traffic that's going through your system. You can use Wireshark to do the same thing. We can use sniffers on the network to collect all the port 80 traffic. We can configure servers to create logs. Honestly, when I'm doing web security stuff, eight times out of ten
I just use a straight browser. A lot of browsers, especially now with all of the developer tools that are built into them, have a lot of the information that you need, so really most of what you need is a browser.
Proxies are really great for analyzing the traffic without having to modify anything, so this is another technique that I use. Client-side proxies are also a really effective means of intercepting and understanding what's going on. Firefox has extensions like Live HTTP Headers and Tamper Data. If you just open Chrome or Firefox, open the developer tools, and make a request, you can see all the headers that were sent and all the headers that were received.
The one tool I will mention is Burp Proxy. It's a professional-grade tool, a commercial tool. I think it costs about $300 for a license, but it has a free version that just has some modules disabled, and even the free version is super awesome and really cool to use. You set up Burp Proxy, you set up your browser to use Burp Proxy, and then it will pause and let you inspect every request your browser makes and every response that comes back. It keeps track of all the requests and responses, and then you can say, OK, I want to make this request again, but manually change one of these values. So you can easily see exactly what's going on there. If you're interested in web security stuff, I highly recommend checking out Burp Proxy. And then we'll stop here. When we come back, let's talk about HTML.
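Going back to the HTTP authentication discussion earlier in this lecture, the following sketch shows why Base64 is an encoding and not encryption: anyone who sees the Authorization header can reverse it. It also sketches the hashed, nonce-based scheme in its simplest form (MD5 over the username, realm, and password, combined with the nonce, the method, and the URL). The function names and the sample credentials are made up for illustration, and the digest function leaves out the optional fields real clients usually send.

```python
import base64
import hashlib


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an 'Authorization: Basic ...' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value: str):
    """Recover (username, password) from a Basic header; no key is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri) -> str:
    """Simplest digest form: hash of credentials, nonce, method, and URL."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)
    # An eavesdropper on the path can run exactly this:
    print(decode_basic_auth(header))
    # The digest value changes with every server nonce, so it cannot
    # simply be decoded back into the password.
    print(digest_response("Aladdin", "reserve docs", "open sesame",
                          "example-nonce", "GET", "/"))
```

Note that the digest value still travels over plain TCP; it only avoids sending the password itself in a reversible form.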