Or somebody say no. All right. Okay. Hello, everyone. I assume everyone missed me terribly. I'm not looking, so I'm assuming you're all nodding along. Is anybody on the Twitch who can confirm that we are, in fact, live? There we go. Okay, thank you. People are in here. Okay. So you, the chosen few who have decided to attend, in class and online, can give input on the plan for the two remaining modules. We have web security, and then we're going to do binary security, so we're going to end with a bunch of hacking and exploitation. I had this thought; I'll throw it out there, and you tell me if you like it or hate it. My idea was that sometime tonight, after class, I'll just launch both modules at once, web security and binary security. They'll both be due at the end of the late policy window, which I think is the very last day it could possibly be: the 10th. December 10th. Let's double-check that so I don't straight-up lie to people; I try to avoid that whenever possible as an educator. Oh, that's not helpful. Yeah, so December 10th. That way you'd have until the 10th to finish them. Obviously we won't have had any in-person lectures on binary security yet, but I actually think we'll cover everything we need for web security in today's lecture, so you should be well prepared for that. The remaining classes will cover binary security, and in any classes after that we can cover some cool topics like cybercrime. And if you want to get started on binary security early, you can watch the lectures from prior semesters that will be in the module to get caught up on that material. Yeah? Well, in terms of letting both be due on the 10th, you decide now what you do by the 10th. Yeah. It would allow people more time for the later parts.
But I've got a feeling it'll end up being the last few days before the 10th for a lot of people. Yeah, I don't know. Do you want to be adults with your time or not, I guess, is the question. I can make it due earlier, but I feel like as late as possible is better. Yeah. Okay, so you're all willing to shoot yourselves in the foot. Well, maybe shooting yourself in the foot isn't good. Okay. I like that. Very adult mindset. Yeah, we've all seen it the last few years, but we're trying to manage it. Cool. At least I'm going to wait until study time, maybe, for part of it, I don't know. But you can do it all now. You can do it all before Thanksgiving, and then you'll know exactly what your grade is. I won't make any promises, because I don't know how fast this process is, but I'll try to load all the thanks and meme extra credit into the gradebook so you'll be able to see your current grades up to this point. Yeah? The weighting: they're all equally weighted. I didn't know how many modules we'd have, so I didn't want to mess with the weights, but it shouldn't change anything. Correct. Late submission policy: until the 10th you can solve challenges late, and those will be worth 50%. For sure, you can go to recitation and get help on anything, and the TAs will happily help you. We obviously don't have any recitations during finals week, and I think a lot of the TAs will actually be physically gone then. So you're kind of on your own as you get out of the semester, but you'll have the Discord to ask questions online. To get a grade? No. To get a grade, the only thing you need is your ASU ID linked. If we can't link it, we don't know about any of your activity on the Discord.
So whatever you're doing there, we can't see it, link it, or give you credit for it. Also, about this date, how long does that go to? Good question. The last day for any credit-related things is the 10th. Cool. Any other questions? You guys know there are spiders and lizards out there? It's like a whole zoo along the corridor. And snakes, yeah, it's absolutely crazy. You don't even have to pay to go to the zoo; as a student here you can just go look at them. Creepy. Okay, so let's go back to the web. That's what I was thinking of: spiders and the web. So what aspects of the web have we looked at so far? Somebody remind me. Yes. Yes, but now you have to answer my question: what did we learn about the web this semester? Even further back? Somebody else. To talk over the web we used what protocol? HTTP, yes. We use it literally every single day when we go to a website. Anybody go to a website today? Yeah? Cool. So now we're going to look at how the web can break. But to do that, we first need to introduce some more concepts and peel back the layers a little. When we looked at it before, you had your web browser, your client, which makes an HTTP request and knows which server to talk to from the URL. Remember we talked about how to parse a URL and turn it into an HTTP request. The web server does something and sends you back some HTML; that's the normal process. However, this results in very boring web pages: static websites that don't actually do anything. That was the original idea of the web, but what we actually want is dynamic functionality that parses our request and responds to it.
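As a refresher on the URL-to-request step, here is a minimal Python sketch, assuming a plain http:// URL and only the standard library. It is not how any particular browser does it; it only shows which pieces of the URL end up where. The example URL is made up.

```python
from urllib.parse import urlsplit

def url_to_request(url: str):
    """Split an http:// URL into (host, port, raw GET request text)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # HTTP's default port when none is given
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query  # the query string rides along in the request line
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"                       # blank line ends the headers
    )
    return host, port, request

host, port, request = url_to_request("http://example.com/search?q=web")
print(host, port)     # example.com 80
print(request)
```

The host and port decide which server to connect to; everything else is sent as text over that connection.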
So usually, behind the web server, there's some web application running custom server-side code that interprets your request and can parse your GET parameters, which are passed in where? Oh no, tests. No, I haven't forgotten it already, I'm just staring at you all. Yeah, I don't know if that's true. My question was, let's broaden it a little: what are the ways a client can pass data to a web server or web application through an HTTP request? Those are the methods, GET and POST, but technically you can use any method. JSON? Yeah, JSON, but where in the HTTP request? That's more of a data format thing. In the body. Right, we can pass data in the body, usually with a POST request. What else? Somebody said headers on Twitch; true, but not a common one, and not one of the two I'm thinking of. Exactly, part of the URL. Those are part of the URL syntax: after the question mark come all the query parameters, written as key=value pairs. That's another way of passing data; we could pass my name, for example. Awesome. So the web application runs some code, inspects those values, sees what you're trying to query, and generates a custom HTTP response. That response is usually in the form of the part we haven't talked about yet: the H. What's the H in HTTP? Hypertext, yeah, awesome. So now we're going to look at hypertext. HTML, the Hypertext Markup Language, is a very simple data format used to create documents that have links to other documents. That's fundamentally what it was at the base level when it was created. It was based on an earlier technology from 1986 (SGML). What's super interesting is seeing how the HTML standard has evolved. You can look up all these standards to see exactly what HTML looks like. We'll go through the basics; this is important.
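A small Python sketch of the channels just described, using the standard library's parse_qs and json. The URL, the field names name and lang, and the bodies are illustrative assumptions; a real web application would read these through its framework's request object.

```python
import json
from urllib.parse import urlsplit, parse_qs

# 1) Query parameters: after "?" in the URL, key=value pairs joined by "&"
url = "http://example.com/greet?name=alice&lang=en"
query = parse_qs(urlsplit(url).query)
print(query)   # {'name': ['alice'], 'lang': ['en']}

# 2) Request body: a classic HTML form POST uses the same key=value syntax,
#    but the bytes travel in the body instead of the URL
form_body = b"name=alice&lang=en"
form = parse_qs(form_body.decode("ascii"))
print(form)    # {'name': ['alice'], 'lang': ['en']}

# JSON is just another data format for the body (Content-Type: application/json)
json_body = b'{"name": "alice", "lang": "en"}'
print(json.loads(json_body))
```

parse_qs returns lists because a key may legally appear more than once in a query string.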
Part of what's so interesting to me about the web is that you have all these different technologies. Way back when we talked about web basics, we talked about URLs, HTTP, and HTML, but we only actually covered the first two. Now we're completing that. Looking at this picture: you have your browser running on somebody's machine; the user clicks a URL, which makes an HTTP request, which gets processed by some backend web application running who knows what code in whatever language they want (it doesn't actually matter what language the server is written in); then the HTTP response comes back with an HTML payload, and that HTML tells the browser how to render the page so the user can click on different things to go to different HTML pages and make different HTTP requests. And this process repeats. This is part of the reason the web is so complicated, and the picture actually gets even worse: we'll talk later about how web applications use databases to store information. So you add more technologies, and each of these has its own syntax. The web is honestly a complicated mess of different tech. That's part of why I love it, because you have to learn all these different things, and we'll see that a lot of the breakdowns and security problems come from the interactions between these layers. Okay, cool. A brief history: HTML went through several revisions, and HTML5 is the current standard. Along the way it went through this thing: anybody familiar with XML? Anybody played with XML? Yeah, what's XML for? Yeah, it's another kind of markup language. What have you used it for, let's say? There's also one for, like, describing resources on the machine. Ah, yes, it can be used for almost anything. It's just a data storage format. You could technically do that in JSON too.
It's all just about different ways of storing data. The interesting thing is that XML came after HTML: HTML was created first, then they standardized XML, and then they tried to make HTML like XML. That's where XHTML came from. And everybody said, that's really stupid; we're used to HTML and we don't like these pointless differences, which I'll allude to later, but they don't really matter. You can see the super interesting thing here: a lot of innovation in 1995, 1997, 1999, and 2000, and then the next version was in 2014. After that there was 5.1, as they kept developing revisions of 5.0, but now HTML is considered a living standard, so there are no specific version numbers. Part of the reason is that the web application generates an HTML page that has to be parsed by a browser and shown to a user. That requires a complex dance between the people who build web applications and the people who build browsers, because if the HTML standard says something but no browser supports it, who cares? It's a useless feature. So the basic idea is really the M in HTML: markup. We're adding what a lot of people call tags (technically not the proper term, but that's okay) to add meaning to text. We mark up the text and add meaning to it. A couple of key concepts. A start tag looks like this: <foo>. This is the tag foo. It starts with an open angle bracket and ends with a close angle bracket, and the whole thing is considered the start tag. Then there can be as much text as we want. Then there can be an end tag, which looks just like the start tag but with a slash: open angle bracket, forward slash, foo, close angle bracket, </foo>. This is said to close the start tag.
Everything between the start tag and the end tag is contained in that element, and as we'll see, this creates a document tree. Sometimes you don't have any text between the tags, so you can write a self-closing tag by adding a slash at the end: <bar/>. In XML and XHTML this is equivalent to a start tag bar and an end tag bar with nothing in between (an HTML5 parser actually ignores that slash on ordinary elements, which is part of why void tags matter). Void tags are tags that have no end. Before, we've been using foo and bar; img is an image tag, and there's no closing tag for an image, so it's called a void tag. This was actually the key distinction between HTML and XML. If you've ever done XML, anything you open with a start tag has to have a matching end tag. That's what they tried to add to HTML, and everyone revolted and said this is stupid, because images are always void like this. They have no end; there's an implicit end. Anyway, cool. So tags form a hierarchy, a tree structure. Here's an HTML document. We have html tags, and again, what these mean you can look up in the standard; the standard says exactly how the browser should interpret and display them. This is a typical HTML5 page. We have the html tags and the head tags. Inside the head tags we have a title tag, and inside that we have the text "Example". Then we also have the body tags, and inside those we have p tags containing "I am the example text". Questions on this? We can represent this as a tree; this is like a sideways tree. The root is the html tag. It has two children, head and body. Head has one child, title. Title has one child, the text "Example". Body has one child, the p tag. The p tag has one child, "I am the example text". Cool? Now, with just tags we could maybe get by, but if we want to include an image in our HTML page, for instance, we need to say where that image comes from. So we need another concept, and that's attributes.
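To make the sideways tree concrete, here is a rough Python sketch that feeds the example page to the standard library's html.parser.HTMLParser and builds a nested (tag, children) structure. The TreeBuilder class and the VOID set are our own simplification for illustration, not the full HTML5 tree-construction algorithm that browsers implement.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not stay "open" on the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds (tag, children) tuples; text nodes are strings."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)   # attach to the currently open element
        if tag not in VOID:
            self.stack.append(node)      # only non-void elements can contain children

    def handle_endtag(self, tag):
        # Pop back to the matching open element; stray end tags are ignored
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():                 # skip whitespace-only text between tags
            self.stack[-1][1].append(data.strip())

def dump(node, depth=0):
    if isinstance(node, str):
        print("  " * depth + repr(node))
        return
    print("  " * depth + node[0])
    for child in node[1]:
        dump(child, depth + 1)

doc = """<html>
  <head><title>Example</title></head>
  <body><p>I am the example text</p></body>
</html>"""

builder = TreeBuilder()
builder.feed(doc)
builder.close()
dump(builder.root)
```

The indentation of the printed output mirrors the sideways tree: html at the root, head and body as its two children.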
So tags can have attributes that provide metadata about the tag. They live inside the start tag, after the tag name and before the closing angle bracket. As we'll talk about, we're going to get to vulnerabilities in web applications that allow people to inject arbitrary HTML content into a page, which is why this syntax is important. You can write attributes four different ways. <foo bar> is the tag foo with an attribute bar that has no value. <foo bar=baz> is a tag named foo with an attribute bar whose value is baz. Same thing in single quotes, <foo bar='baz'>, and same thing in double quotes, <foo bar="baz">. These last three are exactly the same, and multiple attributes are separated by spaces. Cool. Okay, so we can put all this together to make hyperlinks. These are the things you've seen all the time, right? The blue underlined thing you click on. The a, for anchor, tag is used to create a hyperlink. In href, which I believe stands for hypertext reference, you provide the URI, and the text inside the anchor tag is the text of the link. So if you had an HTML tag that looks like this, an a tag with href equals google.com and the link text Example, the browser will render it into this thing you've probably seen many times before and have maybe been trained to think, aha, I can click on that. What does it mean that this one is purple and not blue? I visited it before, right? Browsers track that, and this used to be a flaw in browsers: a website could learn which websites you had visited by creating a bunch of links to sites like google.com, or bad sites you don't want people to know you visit, then asking the browser what color those links were and using the color to tell whether you had visited them. Pretty interesting privacy and web security implications there.
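The four attribute spellings can be checked with Python's html.parser. In this sketch, AttrPrinter is a hypothetical helper; the parser reports an attribute with no value as None and reports the unquoted, single-quoted, and double-quoted forms as the same string.

```python
from html.parser import HTMLParser

class AttrPrinter(HTMLParser):
    """Hypothetical helper that records every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent
        self.seen.append((tag, attrs))

p = AttrPrinter()
p.feed('<foo bar><foo bar=baz><foo bar=\'baz\'><foo bar="baz">'
       '<a href="https://example.com" title="x">Example</a>')
for tag, attrs in p.seen:
    print(tag, attrs)
```

The last tag shows two attributes separated by a space, and the href value is what the browser would follow when the link text Example is clicked.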
And so we can throw this together and build a basic HTML5 page that you can look at online. The key problem when you have all this tech built up over time is that browsers still want to run websites that were built in 1999 and expected to run on Internet Explorer 6. So by default, if they don't know how to parse a page, browsers will try to parse it in a mode that supports older styles of HTML (quirks mode). Setting this doctype is a special way to tell the browser: no, no, I'm using the HTML5 standard; don't worry, we're doing good stuff here. I also use this meta tag to say my text is in UTF-8, so the browser knows how to decode everything. And then I have a body with an anchor tag. The cool thing is that this exact text gets sent to the browser. Again, the important thing to remember is that the browser only gets this stream of bytes, and it needs to parse it and understand: what type of HTML document is this? What does the page want to do? Just like you did in the access control modules with levels 17, 18, and 19, you're parsing some input file and trying to derive meaning from it. It's different, because this is way more structured and easier to parse than the random output I gave you there, but the ideas are the same. You'll also get into this in 340, where you'll talk about parsing these kinds of things. Cool, so that's the browser's responsibility. You can take the same website, open it in a browser, and it'll show something like this. You can also use many different types of browsers: there's the browser on your phone, which is different, but there are also browsers like this one. What is the name of this? Links, I think? I usually use it in Emacs; I don't remember the exact name.
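A minimal sketch of the page described above as a Python string, plus a hypothetical classify helper that looks only at the doctype. This is a deliberate simplification: the real HTML spec decides between quirks, limited-quirks, and standards mode from a longer table of legacy doctypes, and this sketch only distinguishes "no doctype", "<!DOCTYPE html>", and "something else".

```python
from html.parser import HTMLParser

page = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example</title>
  </head>
  <body>
    <a href="https://example.com">Example</a>
  </body>
</html>
"""

class DoctypeSniffer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # Receives the text inside <! ... >, e.g. "DOCTYPE html"
        self.doctype = decl

def classify(html_text: str) -> str:
    """Hypothetical, simplified mode check (the real rules have many more cases)."""
    sniffer = DoctypeSniffer()
    sniffer.feed(html_text)
    sniffer.close()
    if sniffer.doctype is None:
        return "quirks"                      # no doctype at all: legacy rendering
    if sniffer.doctype.lower() == "doctype html":
        return "standards"                   # the HTML5 doctype
    return "depends on legacy doctype table"

# What travels over the wire is bytes; the browser must pick an encoding to decode them
wire_bytes = page.encode("utf-8")
print(classify(wire_bytes.decode("utf-8")))   # standards
print(classify("<p>no doctype here</p>"))      # quirks
```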
But this is a text-based web browser, so it renders the HTML document as text, and you can navigate through it. Now I'm worried I'm going to sound super old, but I actually do use this sometimes in Emacs. I'll have a text editor in one window, and inside Emacs I can open a browsing window to some Python documentation, so I can search it and do nice things there. No ads. That's true, no ads. Yeah, so this is Links; thank you for confirming. So you can do all that there. Now, there's a very specific thing, like we talked about with URLs. This is how an HTML page looks. What are the special characters in the HTML language? Yeah, the angle brackets. I kept saying square brackets; I meant angle brackets. I don't know what I was saying, but I was doing this. So, yeah. It looks like those would potentially be the only special characters, because everything HTML-specific is already within them, so the parser only needs to recognize those characters to tell whether something is meant to be part of the markup and what kind of component it is. Yeah. So for example, say I wanted to write a web page that said: hey, here's how you create a hyperlink to the URL example.com; use this code, angle bracket, a, href equals, blah, blah, blah, this whole thing. But if I put that in the HTML source, the browser thinks I want a tag called a with an href, not text with an angle bracket. It's just like with URLs, where we had the same problem with ampersands, question marks, and equal signs. So there's a way to include them. In HTML, the angle brackets are special. Single quotes and double quotes can also be special inside an attribute, because they mark the start or end of an attribute value. Ampersands and equal signs? Yeah, equals isn't. All right, ampersands are interesting.
I know they do cause problems; I don't remember the specifics. Yeah, you need to escape them, because they start the escape code. Oh, right, yeah, yeah, that's right. That's the reason, as we'll see. Cool. It's just like any problem of this kind: when we talked about strings in our programming language, we had backslash n to specify a newline. That backslash is an escape character, but then you need a way to write the escape character itself. In HTML, all of these character references start with an ampersand and end with a semicolon, and that's why the ampersand itself needs to be encoded specially. You can reference characters by name, since there's a list of predefined names, so it's basically ampersand, a name, and a semicolon. Or, the normal one you'll see, you can reference a character by its number in decimal or hexadecimal. Let's have an example. Again, as we talked about, this is actually the key to understanding a specific type of web security problem. So &amp; is the named entity for the ampersand character. When the browser gets this and parses it, it knows: aha, they want a literal ampersand to appear here, not an ampersand that starts some character reference. Then this one would be whatever 38 is in decimal, &#38;, and &#x26; would be hex 26. And you can have leading zeros there. Oh, why don't I have these on the slides? Anybody have an ASCII chart handy so I don't have to look it up? Come on, Twitch, what are you good for if you can't look this stuff up for me? Yeah, exactly, you're all on the internet right now. I'll look it up. It is the ampersand; they're all the same thing. That's very smart of me, if I do say so myself. Okay. Yay, all right, perfect, exactly like I thought.
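Python's html.unescape applies the HTML character-reference rules, so it is a quick way to confirm that the named, decimal, hexadecimal, and zero-padded forms all decode to the same ampersand.

```python
import html

# Named, decimal, hex, and zero-padded forms of the same character reference
refs = ["&amp;", "&#38;", "&#x26;", "&#0038;", "&#x0026;"]
decoded = [html.unescape(r) for r in refs]
print(decoded)                   # ['&', '&', '&', '&', '&']
print(ord("&"), hex(ord("&")))   # 38 0x26
```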
Okay, so these are all different ways of encoding the same character. This idea is actually used to get around a lot of web application firewalls. If you're trying to exploit a vulnerability, you can encode the same payload in different ways; a filter might detect &amp; but not these other forms, or it might handle all of them but not with leading zeros. So there's all kinds of cool stuff you can get around there. Cool, this is the é in my last name, if you ever feel like using that for some reason. And as we talked about, the less-than angle bracket has to be encoded whenever you want to use it literally, because otherwise the parser doesn't know: are you writing a math inequality, or are you trying to start a new tag? Think about it from the perspective of the browser's parser. It needs to know what you're actually trying to do. That's why in a proper application this is encoded as &lt;, for less than, which can also be written in all these same ways. Okay, questions on HTML? These are the bare basics of HTML; note we didn't go into style sheets or other crazy stuff about how to actually build a nice-looking website, but this is the basics of how this works. Questions? Cool. Okay, so let's get to an example of how we're going to, apparently, rob a bank. Cool. Imagine this scenario: we have the red computer, the blue computer, and the green computer. We have an HTTP request, and the bank is running some backend web server, just like we talked about: the web server we talk to runs some backend web application, and we're saying, hey, bank, transfer, and we don't know exactly what's going to happen.
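A sketch of the encoding a proper application does, assuming Python's html.escape as the escaping function; render_comment is a hypothetical helper. The second part shows why a filter that looks for only one spelling is fragile: several different strings all decode to the same <.

```python
import html

def render_comment(user_text: str) -> str:
    """Hypothetical helper: put untrusted text inside a <p> safely."""
    # html.escape turns & < > " ' into character references, so the
    # browser displays them as text instead of parsing them as markup
    return "<p>" + html.escape(user_text, quote=True) + "</p>"

attack = '<a href="https://example.com">click</a> & 1 < 2'
safe = render_comment(attack)
print(safe)
# <p>&lt;a href=&quot;https://example.com&quot;&gt;click&lt;/a&gt; &amp; 1 &lt; 2</p>

# Many spellings decode to the same "<"; a naive filter that only searches
# for the literal text "&lt;" catches just one of them
variants = ["&lt;", "&#60;", "&#x3c;", "&#X3C;", "&#0060;"]
print([html.unescape(v) for v in variants])   # ['<', '<', '<', '<', '<']
print([v for v in variants if "&lt;" in v])   # ['&lt;']
```

Note that the ampersand is escaped first internally, so the output never double-encodes the references it just produced.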
Again, this is the interesting thing, and what I love about web security: we can try to infer what's going to happen, but fundamentally the backend web application code is what determines what happens. We can try something like: what if we access a URL, bank transfer, and pass in the query parameters from blue, to red, and an amount that's pretty large, a million. Let's hope those are dollars and not cents or picodollars or something, or some currency with very bad inflation. Maybe Zimbabwe, something. Anyways. Yeah, that's right, Venezuela, but I don't know if you can transfer stuff there. Whatever, doesn't matter. The point is, we can try making this request as red, but the web server may block us. So why would the web server block us? What's a theory? Yeah, you just raised your hand; you're on the hook. You're trying to transfer from blue when you're not blue. Exactly. It's maybe doing some internal checks. We talked about cookies when we did web basics, right? So maybe it's looking at our cookies, validating them, and saying: wait a minute, you're not blue, so you don't have the ability to make this transfer. If it's bad, it may be using our IP address, in which case we could use what we learned about intercepting communication and maybe spoof a request from blue's IP address. Do you have other thoughts? Okay, then don't worry about it. So one of the key aspects of web security that's very different from other areas, one of its principal tenets, comes down to this question: what can red get blue to do here? Let's say I can't directly force blue to make any request I want. I mean, maybe if I knew who blue was, I could break into their account or break into their computer: if I knew their password, I'd log in and make that request, right?
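A hypothetical sketch of the server-side check theorized above. The host green.example, the /bank/transfer path, the cookie name session, and the SESSIONS table are all made up for illustration; a real bank would do far more than this.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical session store: session cookie value -> logged-in user
SESSIONS = {"s3cr3t-blue": "blue", "s3cr3t-red": "red"}

def handle_transfer(url: str, cookies: dict) -> str:
    """Hypothetical bank handler: only the logged-in owner may send money."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    user = SESSIONS.get(cookies.get("session", ""))
    if user is None:
        return "403 not logged in"
    if user != params.get("from"):
        return "403 you are not " + params.get("from", "?")
    return f"200 moved {params.get('amount')} from {params['from']} to {params.get('to')}"

# Red tries the request directly, sending red's own cookie
attempt = "https://green.example/bank/transfer?" + urlencode(
    {"from": "blue", "to": "red", "amount": 1000000})
print(handle_transfer(attempt, {"session": "s3cr3t-red"}))   # 403 you are not blue
```

The check compares the user behind the cookie with the from parameter, which is exactly why red cannot simply send this request itself.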
I could potentially do it that way, but let's say I don't know that. So maybe I can't directly make them, but could I trick them into clicking on a site that I control? Have you ever clicked a link in an email before? Never? You've never clicked on a link in an email, any time? That's a straight-up lie. I don't believe it for a second. I click on links in emails all the time. Your friend sends you something in iMessage or on Discord, you click it, you go visit the thing. You see some post on social media and you click it. Anyway, the point is that it's trivial to get people to click on links. So one of the principles of web security is that I can convince somebody to click on a link of my choosing, and that should be safe. The web should be safe: I should be able to visit any website without it fundamentally altering the security of my system. But as it turns out, and we'll get into this a bit more later, when blue accesses a web page on a server that red controls, red can send a specific HTML payload that causes blue's browser to make a web request. Specifically, we can use the image tag and set its source to the green server's bank transfer URL: bank transfer from blue to red, amount a million. So what's the browser going to do? It parses this just like anything else and makes that request. It also includes the cookie, because blue's browser is the one making the request to green. Will the bank allow this or deny it? I mean, hopefully it has some kind of two-factor check that notices something like this is happening. Yeah, maybe, I don't know. It's not a great bank. We can, right?
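A sketch of the page red might serve, using the same made-up green.example URL. Nothing needs to be clicked: the browser requests the img src as soon as it parses the tag. The & separators are written as &amp; inside the attribute, which the browser decodes back before making the request.

```python
import html
from urllib.parse import urlencode

# Hypothetical bank URL; red's page makes blue's browser request it
target = "https://green.example/bank/transfer?" + urlencode(
    {"from": "blue", "to": "red", "amount": 1000000})

# Red serves this page; an <img> needs no click, the src is fetched on load
attacker_page = (
    "<!DOCTYPE html>\n"
    "<html><body>\n"
    "  <p>Cute cat pictures!</p>\n"
    f'  <img src="{html.escape(target)}" width="1" height="1">\n'
    "</body></html>\n"
)
print(attacker_page)
```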
So, based on what we theorized earlier, one possibility is that this bank web application looks at the cookie and validates it: is the cookie blue's, and does it match the from user? Great, they're logged in, they are the from user, they have access, so let's make this transfer. And then the bank says, great, okay, done. So we were able to trick somebody into making a web request on our behalf just by visiting our server. This is part of what we need to think about when approaching web security: we can fundamentally get somebody to access any web server, including a web server of our choosing that has arbitrary HTML content on the page. That page can make the client's browser do things like render content. A super interesting attack here is called clickjacking. Have you ever seen those pages that say, hey, play this game and win a free iPhone? Yeah, you've seen that? Don't click on those; they're bad, because depending on what browser you're using, they'll load content from another website, like a Facebook follow link or something like that. It used to be Facebook, but it can be anything, although there are defenses now. They make you play a game where you're clicking on things, and right before you click, they swap, so the thing they want you to click on is on top of the ball or whatever. You end up accidentally clicking on something from another website, like adding a Facebook friend or following something. That's called clickjacking. So yeah, crazy stuff. Also, like I mentioned, they can leak information about what websites you visited.
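A simulation of the whole attack in one place, with hypothetical names throughout: bank_transfer implements the cookie-matches-from check described above, and blue_visits stands in for blue's browser, fetching every img src and attaching whatever cookies it holds for that host. Calling the handler directly replaces the real network round trip.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

SESSIONS = {"s3cr3t-blue": "blue"}   # hypothetical bank session store

def bank_transfer(url, cookies):
    # Hypothetical check: the session cookie's user must equal ?from=
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    if SESSIONS.get(cookies.get("session")) != q.get("from"):
        return "403"
    return f"200 moved {q['amount']} from {q['from']} to {q['to']}"

class ImgCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.srcs.extend(v for k, v in attrs if k == "src" and v)

# Blue's browser: cookies are stored per host and sent with requests to that host
blue_cookie_jar = {"green.example": {"session": "s3cr3t-blue"}}

def blue_visits(page_html):
    collector = ImgCollector()
    collector.feed(page_html)
    results = []
    for src in collector.srcs:   # HTMLParser already decoded &amp; in attribute values
        host = urlsplit(src).hostname
        results.append(bank_transfer(src, blue_cookie_jar.get(host, {})))
    return results

evil = ('<img src="https://green.example/bank/transfer?'
        'from=blue&amp;to=red&amp;amount=1000000">')
print(blue_visits(evil))   # ['200 moved 1000000 from blue to red']
```

The check passes because the cookie really is blue's; the bank cannot tell that red's page triggered the request. This sketch ignores browser cookie policies such as SameSite, which in current browsers can keep cookies off cross-site image requests.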
They fixed a lot of that: a website can no longer ask for a link's color to tell whether it's been visited. But attackers used to do very clever things, like, I think, changing the font size of a link depending on whether it was visited. If it was visited, the link would be large, otherwise it would be small, and then using JavaScript they could query the size of things and figure it out that way. Anyway, they can do a whole lot of stuff to try to get information about you. They can also make additional HTTP requests to other servers, as we saw with image tags. And fundamentally, the crazy thing is that they can run arbitrary code. HTML is incredibly static; what makes the web dynamic and interesting is JavaScript, which is literally a programming language in which other people have written code that runs in your browser on your system. It's actually insane when you think about it: every website you visit downloads random, crappy JavaScript code. It's still kind of a problem: some bad websites, or criminals who had hacked into a website, would inject JavaScript so that when you visited, it would start mining cryptocurrency for them. Your CPU usage would go up, and whatever it mined would get sent to them. Cool. And when we're thinking about servers, an attacker can send them arbitrary data. When we think about attack surface, how do we attack things? We can access or modify database data, interact with the server, or influence other web clients, which we'll look at. Okay. Somebody's asking online; they're freaked out because I said... no, we have four modules to get through. So don't worry, plenty of content. Four slides. Now we get to: okay, I want to build a web application. One of the most common ways of storing data in a web application is with SQL.
So we talked about how a web application, how HTTP, keeps state. Recall you did this in one of the talking-web modules: you had to make a request and keep state across requests, like 10 different requests. How did the web server know to link one request to the next? Louder? Cookies. Yes, hopefully some people's favorite thing. But think about how much data there is on something like Facebook, Instagram, or Twitter. A lot of data is stored there, much more than could fit in a cookie. So web applications want to store persistent state, so they can maintain information about you from request to request; otherwise it'd be very, very difficult to build any kind of real application. So the question is where to store that state. We could store it in memory, just in the web application. What would be a problem there? It's definitely not... well, it depends on the system. Not encrypted, but to get access... did I talk about this? There was a paper a while back (the cold boot attack) showing that for a server or desktop system, and you know what a memory module looks like, the DIMM, you could get access to the system, unplug the power, take one of those canned-air dusters, turn it upside down so it sprays cold, freeze the memory, move it from one computer to another, and enough data will have stayed in memory that you can still recover it. It's really crazy. But one of the big things: what happens after you reboot your computer? How come all your stuff isn't there? Yeah, memory is non-persistent. As soon as the power's gone, memory's gone. So it's not a useful place to store data you want to last a long time. We could use the file system for storage. Let me clean this up since I'm here, or I won't do it later. We could use just a flat file system, storing files in directories.
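A minimal Python sketch of the memory-versus-file-system point, assuming JSON as the on-disk format and a temporary directory as the storage location; the session contents are invented. Reassigning the variable stands in for a reboot wiping memory.

```python
import json
import os
import tempfile

# In-memory state: lives only as long as this process
sessions_in_memory = {"cookie-123": {"user": "blue", "visits": 3}}

# File-backed state: written to disk so a restarted process can load it again
path = os.path.join(tempfile.mkdtemp(), "sessions.json")

def save(state, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)

def load(path):
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

save(sessions_in_memory, path)
sessions_in_memory = {}   # simulate a restart: memory contents are gone
restored = load(path)     # ...but the file still has them
print(restored)
```

Plain files work, but they leave concurrency, partial writes, and querying entirely up to the application, which is the gap a database fills.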
We could use the Unix access control permissions we learned about to decide who can access what. We could use XML files to store data; some applications actually do this. Or, the most common thing, we could use a database. We're not going to get into the theory behind databases, how they work internally, all that kind of stuff. Take a database class if you're super interested in that. But we will understand why a database is useful. One of the key things it offers is ACID compliance: atomicity, consistency, isolation, and durability. This ensures that when you've committed something to the database, or made a change, it has actually changed, and that if you're querying and storing things concurrently, everything is done correctly. That's a really good guarantee to get from a database. Someone said the C stands for concurrency. Can you check what the C in ACID stands for? Yeah, what was it? Consistency. Okay, thank you, I thought so. You can also run the database on another server, so you physically separate your machines, which can be very useful if there are performance problems: the database lives on a different machine from the web application. Connor, I'm looking at you. That can be very helpful. And you can easily scale: one database machine and several front-end web application machines that all connect to the same back-end database, all kinds of cool stuff. It does add yet another technology to the web application stack. So in addition to URLs, HTTP, HTML, JavaScript, and whatever language the back-end web app is written in, web applications now have SQL. The classic web application model, and I guess we're technically not using it here, I just like talking about it, was what's called the LAMP stack: Linux, Apache, MySQL, and PHP.
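To make the atomicity part of ACID concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `users` table and the simulated crash are illustrative assumptions, not something from the slides. The `with conn:` block commits if it finishes and rolls back if it raises, so a change that fails halfway leaves nothing behind.

```python
import sqlite3


def atomicity_demo() -> tuple[int, int]:
    """Return (row count after a failed change, row count after a committed change)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")

    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("INSERT INTO users VALUES ('admin', 'admin')")
            raise RuntimeError("simulated crash halfway through the change")
    except RuntimeError:
        pass
    # The half-finished insert was rolled back, so the table is still empty.
    after_failure = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    with conn:  # this time the block completes, so the insert is committed
        conn.execute("INSERT INTO users VALUES ('admin', 'admin')")
    after_commit = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    conn.close()
    return after_failure, after_commit


print(atomicity_demo())  # (0, 1)
```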
PHP is actually one of the most used web application languages, although a lot of that is WordPress installs. But the idea is you can swap out any of these pieces. You can run it on Windows. You don't have to use Apache; you can use Nginx as the web server. You basically need some underlying OS, a front-end web server that parses the HTTP requests, some back-end web application written in something like PHP, and some database like MySQL or Postgres, or you can even get away with SQLite for a lot of things. I like using MySQL as the example; we'll just briefly talk about it here. Oh, this is a good one: it's currently the second most used open source relational database. Anybody know what the first is, of the ones I mentioned? MySQL, Postgres, SQLite. I can guarantee you that you're using it right now. MongoDB? Definitely not. I hope not. MongoDB isn't a relational database, so it's already out. SQLite, why? It ships right on board: macOS, your iPhones, all of your Android phones. Every single one of those devices uses SQLite at the operating system level to store data. Yeah, it's absolutely crazy. I know on the iPhone, all of your iMessage messages are stored in a SQLite database. It's used in tons of places. So think about that: that's literally billions of devices this software is running on. It's absolutely nuts. Anyways, MySQL, because I like history stuff, was released in 1995. Sun Microsystems eventually bought MySQL for about a billion dollars, which is pretty cool. And okay, that's enough history. Where's my mouse? There we go. Let's get into what SQL looks like. It's another special-purpose language for interacting with something, just like we saw with HTML. And again, the exact same concepts will come up: being able to alter and change a SQL query causes massive security problems.
The idea is there are four basic things you can do. You want to get data from the database: you select data. You can update data, changing data that's already stored in the database. You can insert new data into the database. And you can delete things from the database. There are slight differences between SQL implementations, so look up the documentation for whichever one you're targeting when you're doing complicated SQL injection stuff. But let's look at some examples. Everything is table-based. You have a database, and it has several different tables. Each table has pre-specified columns, and then you insert rows into it. You can select the rows that match different conditions. That's how you get data in and out. So to create a table, we can do something like CREATE TABLE users with the columns username and password. We can also give the columns types, like integers or text, and this helps the database store things efficiently. But again, we're not going to get into that. Okay, so now we have this table users. It has no rows, but it has two columns, username and password. Now we can insert things into it. We can INSERT INTO users VALUES ('admin', 'admin'). This inserts a new row where the username is admin and the password is admin. If it was VALUES ('foo', 'bar'), which one's the username and which one's the password? foo is the username, bar is the password. Why? Exactly, because of the order, yep. The values map to the columns in exactly the order we created them: username, then password. Cool. And we can insert more. Ooh, nobody try this on the Dojo. Actually, that'd be great if you had a fake user with this name. So we can keep inserting, and each one becomes a different row in the database. And we can keep going. I'm going to add myself in here.
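Here is a small sketch of the same CREATE TABLE and INSERT statements run through Python's `sqlite3` module. The rows are made-up sample data. Positional `VALUES` land in the columns in the order the table declared them, which you can see from the column names reported for `SELECT *`.

```python
import sqlite3


def build_users_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Two pre-specified TEXT columns; every inserted row supplies both.
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    # VALUES are matched by position: first -> username, second -> password.
    conn.execute("INSERT INTO users VALUES ('admin', 'admin')")
    conn.execute("INSERT INTO users VALUES ('foo', 'bar')")
    conn.execute("INSERT INTO users VALUES ('Connor', 'password123')")
    conn.commit()
    return conn


conn = build_users_table()
cursor = conn.execute("SELECT * FROM users")
print([column[0] for column in cursor.description])  # ['username', 'password']
print(cursor.fetchall())
```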
Cool, oh no, but now we're going to select on them. All right, I'll change it back. So: creating a table gives it a certain structure with columns, and inserting data supplies the same number of columns. Now we can select information from it. We can say, hey, select only these certain columns from this table. Or we can SELECT *, which selects all columns from the table. If we don't add a WHERE clause, it gives us everything back; otherwise we specify conditions in SQL's own syntax. That's the general idea. So SELECT username, password FROM users gives us all the usernames and all the passwords from this table. If we just want the usernames, we say SELECT username FROM users. We can also SELECT *, and that way we don't have to specify in advance which columns; we get every column, in the order the table defines them. We can also say SELECT * FROM users WHERE username = 'admin'. If a user with that name exists, it returns just that row; if there were multiple users named admin, it returns multiple rows; otherwise it returns zero rows. Cool. We can even do things like SELECT * FROM users WHERE username = 'admin' AND password = 'password'. Guess what: this is almost exactly how authentication is done in almost every web application you use. We didn't get into it, but hopefully the web application is not storing your password in the database in plaintext, because some vulnerabilities on the system can let the attacker leak the database. So usually they hash your password, but it's still some kind of check that this username and password exist. If this returns a row, the user gave you a matching username and password, so you know they know the password.
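A minimal sketch of that login check, assuming a `users` table whose password column holds hashes. It uses unsalted SHA-256 purely to keep the example short (a real application should use a salted, deliberately slow hash), and it passes values with `sqlite3`'s `?` placeholders rather than building the query string by hand. One matching row means the credentials are right; zero rows means reject.

```python
import hashlib
import sqlite3


def hash_password(password: str) -> str:
    # Unsalted SHA-256 only to keep the sketch short; not a recommendation.
    return hashlib.sha256(password.encode()).hexdigest()


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("admin", hash_password("password")))
    conn.commit()
    return conn


def login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, hash_password(password)),
    ).fetchone()
    # A row back means the pair exists; None means "invalid username or password".
    return row is not None


conn = make_db()
print(login(conn, "admin", "password"))  # True
print(login(conn, "admin", "wrong"))     # False
```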
If it returns zero rows, then you say "Error: invalid username or password." Questions on SELECT, INSERT, CREATE? We can also delete things. We can DELETE FROM users WHERE username = 'kenic', and that deletes that row from the database; in fact, it deletes every row that matches the condition. This is one of the very important things: if you just did DELETE FROM users, that would delete every single row in the table, and you'd better hope you have backups, because it does not back anything up by default. We may also want to update rows, say when our users want to change their passwords. The form is UPDATE table SET assignments WHERE conditions. So we can do things like UPDATE users SET password = 'password456' WHERE username = 'Connor'. Connor says, oh no, the students found out my password, I need to change it from password123. Let's change it to password456, which is way more secure. That goes through every row matching username = 'Connor' and changes the password to password456. All right, there are two more things to cover that will be useful. As you're writing applications, this is almost all you need for the basics. It gets very, very complicated to keep querying efficient as databases grow to hundreds or thousands of gigabytes. What do you do? How do you make it efficient? We're not going to cover any of that here; you'll have to look into that stuff yourself. But there are important things to cover when it comes to exploiting and extracting information. Another type of query is UNION, for when you want to combine the results of two queries: you make two different queries but get their results back as one. So this is an entire SELECT statement, UNION, another SELECT statement. For instance, we could say SELECT username FROM users UNION SELECT password FROM users.
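Here is a sketch of UPDATE and DELETE against a throwaway `sqlite3` table. The rows are made-up sample data. `rowcount` reports how many rows each statement touched, and the final bare `DELETE FROM users` shows why leaving off the WHERE clause empties the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("admin", "admin"), ("kenic", "hunter2"), ("Connor", "password123")],
)

# UPDATE changes every row that matches the WHERE clause.
updated = conn.execute(
    "UPDATE users SET password = 'password456' WHERE username = 'Connor'"
).rowcount

# DELETE with a WHERE clause removes only the matching rows.
deleted = conn.execute("DELETE FROM users WHERE username = 'kenic'").rowcount

remaining = conn.execute(
    "SELECT username, password FROM users ORDER BY username"
).fetchall()
print(updated, deleted, remaining)

# No WHERE clause: every row in the table is deleted.
conn.execute("DELETE FROM users")
left = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(left)  # 0
```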
This would select all the usernames and also all the passwords from users into one result. So why would a web application write a query like this? It's kind of a trick question. I don't think an application ever would; it'd be silly. Why would you want the usernames and passwords, not linked together, just in one giant list? What are you going to do with that list? Go through it and, I don't know, see if anybody has any bad words in their names or passwords? It fundamentally doesn't make sense. But you might have a page that lists all of the usernames on the site, which runs SELECT username FROM users. Now, we'll get to how later, but let's say you as an attacker could control what comes after that in the query. You can't control the SELECT username FROM users part, but you can control what happens after it. Let's say I give you a magic wand. You might say, oh, instead of SELECT username FROM users, what if I make it SELECT username FROM users UNION SELECT password FROM users? Now, in the list of all the usernames, I also get all the passwords. You can then change that second SELECT statement to select anything you want from the entire database, and just leak out the whole database this way. We'll go into more detail later, but I wanted to show you why UNION is important for you to learn. Cool. And to figure out what to leak: each specific database engine has ways to query things like what tables exist, because from the outside you may not know what the tables are. So this is SELECT tbl_name FROM sqlite_master. This is specific to SQLite. This returns all of the table names.
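A sketch of the "magic wand" scenario with `sqlite3`, assuming a hypothetical page whose query is `SELECT username FROM users` and an attacker who controls whatever gets appended. The `flags` table is a hypothetical extra table. Note that UNION needs the same number of columns on both sides and removes duplicate values, so `admin` shows up only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("CREATE TABLE flags (flag TEXT)")  # hypothetical table to discover
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("admin", "admin"), ("Connor", "password123")],
)

# The query the hypothetical "list all users" page means to run.
intended = "SELECT username FROM users"

# Appending a UNION merges a second one-column SELECT into the same result.
leak_passwords = intended + " UNION SELECT password FROM users"
# SQLite-specific: sqlite_master lists the schema, including table names.
leak_tables = intended + " UNION SELECT tbl_name FROM sqlite_master WHERE type = 'table'"

passwords_result = sorted(row[0] for row in conn.execute(leak_passwords))
tables_result = sorted(row[0] for row in conn.execute(leak_tables))
print(passwords_result)  # ['Connor', 'admin', 'password123']
print(tables_result)     # ['Connor', 'admin', 'flags', 'users']
```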
For instance, if you did SELECT username FROM users UNION SELECT tbl_name FROM sqlite_master, it would return admin and Connor, and then it would return users and any other tables, maybe flags or something like that. So this is a technique that, when you're able to control and alter the SQL queries that get executed, lets you steal all the data from the database. Okay, we can also do more than delete data: we can drop a whole table. So coming full circle, from creating a table to deleting it: we can DROP TABLE users, and after this arrow is nothingness, because the table no longer exists. Boom, the whole table's gone, all of our users are gone. Questions on SQL? It's actually a fairly straightforward-ish language, but it has a lot of nuances. As you get deeper into web security and start exploiting different types of web applications, or run into filters that block certain characters, there's all kinds of crazy stuff you can do. I wanted to go over the basics so you understand SQL. That way, when you're working on the modules and you come across a web challenge that clearly uses SQL, you can refer back to this lecture and say, oh yeah, that's right, this is what this does. Questions? Now we get to the fun part: injections. Okay, all of the classes of vulnerabilities we're going to look at follow a similar pattern: the web application wants to create a SQL query, a bash command, or an HTML page, and does so by concatenating strings together. Everybody know how to concatenate strings? How do you do that in Python? What is it? Yeah, the plus sign. Not a trick, just the plus sign, right?
In other languages, like C, you can do it, but it's more of a pain: you can use strcat, you can do all kinds of stuff. But fundamentally, web applications like to concatenate strings together to build a query, and that string is passed to some other system to parse. Just like when the HTTP response is sent back, the HTML page has to be parsed by the browser, when we make a SQL query, the query we send has to be parsed by the SQL engine. So if we as attackers can control that parsing, we can get it to do something it's not supposed to do. We'll start with the easiest case. Oftentimes web applications want to use the underlying system. Rather than reimplementing something, like figuring out in Python how to get the date, I could just call the date command and send its output back in the web page. We can do this in three different ways here. This is in C, or C-ish, but the concepts apply to any language. Let's look at the system function. Let me start up a server, please. I'll start a web security module, because I'll show that off in a second, but for right now: if we look at the man page, the system() library function uses fork to create a child process that executes the shell command specified in command using execl as follows: execl("/bin/sh", "sh", "-c", command, (char *) NULL). What this means is that anything we pass into system is handled exactly as if we passed it to the shell, as if we ran /bin/sh and typed that command in ourselves. So /bin/sh has to parse whatever string we pass to figure out which command to execute, what parameters to give it, all that fun stuff. Here's the exact sequence of operations: we call system("date"), and system internally calls execve on /bin/sh, passing in sh -c date.
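A small Python sketch of what `system()` does, with one change: it captures stdout instead of printing it, so we can inspect the result. `system_like` and `no_shell` are hypothetical helper names. The contrast shows who splits the string. Through `/bin/sh -c`, the shell breaks one string into separate words; when a program is run directly with an argument list (execve-style), `"date -u +%Y"` stays a single argument.

```python
import subprocess


def system_like(command: str) -> str:
    """Roughly system(): hand the whole string to /bin/sh -c, but return stdout."""
    result = subprocess.run(["/bin/sh", "-c", command], capture_output=True, text=True)
    return result.stdout


def no_shell(argv: list[str]) -> str:
    """Run a program directly with an explicit argument list; no shell parsing."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout


# The shell splits this into the words: printf, %s|, date, -u, +%Y
print(system_like("printf '%s|' date -u +%Y"))      # date|-u|+%Y|
# No shell: the third element stays one argument.
print(no_shell(["printf", "%s|", "date -u +%Y"]))   # date -u +%Y|
```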
So the entire string passed to system becomes a single argument here. Then /bin/sh has to parse it, and by parsing it, it sees, okay, they're talking about date. I need to figure out the path. Let me find where date is. Oh, there's a date at /usr/bin/date. So now it actually executes it by calling execve on /usr/bin/date, passing in date as the argument. And it returns this: Thursday, January 1st. Oh, I guess it's date zero, the epoch. I guess it's consistent, cool. Everyone understand system? Yeah? Right, we only pass the string into system. That's the argument to system, and what system does under the hood is call execve on /bin/sh with the arguments sh -c date. So system("date") is exactly the same as running /bin/sh -c date. And we can actually watch this with strace. Oh, it does a clone instead of an execve; I need strace -f so it follows the child. There we go: execve. We can see it calls execve on /usr/bin/date, but it's /bin/sh doing it. So that's what's happening under the hood. And the reason it works this way is flexibility. Let me look at the parameters of date: you can specify the format and different parameters. So with system you can do system("date whatever foo bar"), just like you would on the command line. From a programming perspective, that can be very nice. We can also use system to define environment variables. This one says: when executing the command date, set the TZ environment variable to UTC. We can see that date uses this environment variable to figure out the time zone, so this changes the output here.
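To see the `NAME=value command` form in action, here is a sketch that runs command strings through `/bin/sh -c`, as `system()` does. The `sh` helper name is a hypothetical convenience. The assignment prefix puts `TZ` into the environment of that one command, and `date +%Z` prints the time-zone abbreviation it picked up. The `+%Z` format is an addition so the output is short; the slide just runs `date`.

```python
import subprocess


def sh(command: str) -> str:
    """Run a command string through /bin/sh -c, like system(), and return stdout."""
    return subprocess.run(["/bin/sh", "-c", command], capture_output=True, text=True).stdout


# The prefix sets TZ only in the environment of the command it precedes.
print(sh("TZ=UTC env | grep '^TZ='"), end="")  # TZ=UTC
# date reads TZ to decide which zone to report (prints the abbreviation, e.g. UTC).
print(sh("TZ=UTC date +%Z"), end="")
```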
And we can see that this entire string gets passed to sh -c as this argument. The shell again has to parse it, and then it executes date with TZ=UTC in its environment. And we can change that UTC to MST. So this gives us a lot of control when executing date. We didn't have to write our own date functionality that handles time zones, blah, blah, blah. I just specify the time zone right here. Since the date program handles all of that, as a programmer I'm like, this is great: I can reuse this program that already exists on Linux, I know it'll work, and it'll tell me the time specifically in MST. Now the question is: what if the string UTC or MST comes from the user? If the attacker is able to control it, what can they do? I know this isn't real code, but say we had some code that looks like this: system("TZ=" + arg0 + " date"), where arg0 comes from the user. So again, just like I said earlier, string concatenation: we have the string "TZ=", we have the string " date", and we have some argument from the user in between. All of it then gets passed to sh -c and executed just as if it was typed on the command line. So what can we do in bash, or in the shell? If you were sitting at my shell right now, what would you do? Yeah, how? "You could send some kind of system command to access certain files in the file system." Yeah, what do you want me to type in? Yeah, I'm definitely not doing that. Maybe cat /flag would be a thing you'd want to do. I don't have permission to do that, but let's say you did, right? So let's just try it here. I have this blah; I can write anything here in place of blah.
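The vulnerable pattern from the slide can be sketched in Python. `build_command` and `show_time` are hypothetical names. The only changes from the slide are that `+%Z` is appended so date prints just the zone, and that output is captured. Whatever the user supplies is spliced, unmodified, into the string the shell will parse.

```python
import subprocess


def build_command(user_tz: str) -> str:
    # Hypothetical vulnerable template: user input concatenated into a shell command.
    return "TZ=" + user_tz + " date +%Z"


def show_time(user_tz: str) -> str:
    # Same as system(build_command(user_tz)), except stdout is captured.
    return subprocess.run(
        ["/bin/sh", "-c", build_command(user_tz)], capture_output=True, text=True
    ).stdout


print(build_command("MST"))        # TZ=MST date +%Z
print(build_command("cat /flag"))  # TZ=cat /flag date +%Z  <- attacker text reaches sh
print(show_time("UTC"), end="")
```

Nothing in `build_command` distinguishes a time zone from shell syntax; that decision is left entirely to `/bin/sh`.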
"Perhaps a command to change the permissions on the flag." So, wait, let's look at this. If I did TZ=cat /flag date like this, what's going to happen when I hit enter? What was it? Some kind of error. Some kind of error; what kind? Let's think about what this is going to do. It sets an environment variable TZ equal to cat, and then it tries to execute /flag, passing it the parameter date. And it gets permission denied. Why? We looked at the permissions: it's not executable, right? That file is readable only by root, so we can't execute it or read it. So that didn't work. But why didn't that work? Because of the space, exactly. The shell's parser uses that space to separate the environment variables we want to set from the command we actually want to execute. So what if we put double quotes around it? Did that work? Well, it didn't cause an error. I should have asked that first: is it going to cause an error? Yeah, but I don't like that. I guess I could do it with sh -c and all that. Oh, that is what I did before, right? Okay, yeah, let's do that. We're nesting things, but that's okay. So, I was hoping it would show us the environment variables; why doesn't it? All right, I think I know how to fix this, but I'm not going to bother. Anyways, let's execute this again. It's super weird that this actually works without an error. Now we're not executing this as a command; we're just setting an environment variable. If we run env here, we can see that TZ is now equal to the string "cat /flag". So we got rid of the space issue, but what we really want is to execute this program. Luckily, in bash there are actually two ways to do this.
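To see why the space matters, here is a rough model of how the shell splits that line. It uses Python's `shlex.split` (POSIX mode) for word splitting, plus a hypothetical `split_line` helper that treats leading `NAME=value` words as assignments and the rest as the command. This is an approximation only: real sh checks for assignments before quote removal and handles far more cases.

```python
import re
import shlex

# A shell variable name followed by "=" at the start of a word.
ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def split_line(line: str) -> tuple[list[str], list[str]]:
    """Approximate sh: leading NAME=value words are assignments, the rest is the command."""
    words = shlex.split(line)
    i = 0
    while i < len(words) and ASSIGNMENT.match(words[i]):
        i += 1
    return words[:i], words[i:]


# Unquoted: the space ends the assignment, so /flag becomes the command to execute.
print(split_line("TZ=cat /flag date"))    # (['TZ=cat'], ['/flag', 'date'])
# Quoted: the whole string is the value of TZ, and date is the command.
print(split_line('TZ="cat /flag" date'))  # (['TZ=cat /flag'], ['date'])
```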
The simple and easy way is actually backticks. When bash parses... let's check, is this in here? Yeah, there we go. Wait, no double quotes. There we go: command substitution. I just ran man sh, and there's a whole section: "Command substitution allows the output of a command to be substituted in place of the command name itself. Command substitution occurs when the command is enclosed as follows." You can use dollar sign, open parenthesis, the command, close parenthesis, $(command), or the backquoted version: backtick, command, backtick. When the shell sees either form, it executes that command and puts the resulting output where the command was. That's not what I wanted. So why did we get "cat: /flag: Permission denied"? "Did it still try to run the command on something we don't have permission for?" Yeah, I actually don't have permission to read this flag, so this whole example is terrible. But if I echo something instead... oh, not into that. I really hope that doesn't exist. Why doesn't it show there? I was hoping it'd show there. The date showed it. Okay, that works. There we go, it's getting some output there. And if you only get output in this one spot, you could extract different parts of the file: you could cat the file and then grep for different things. You could do honestly anything you want. You could use head to get the first N characters and read it byte by byte. You could do all kinds of crazy stuff. But fundamentally, and we'll go back to system, when we call system on something like this, we have to be careful, because now bash is parsing the attacker's string as if we, the programmer, wrote it. So the attacker gets access to everything in here: command substitution, arithmetic expansion, all kinds of crazy stuff. And this slide is an example showing this exact thing, right?
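A sketch of command substitution through `/bin/sh -c`. Both `$(...)` and backticks run the inner command first and paste its output in its place. The second part imitates reading a flag. A temporary file we create ourselves stands in for `/flag`, and the template `"echo TZ=" + payload` is a hypothetical stand-in for the app's command, chosen so the substituted text shows up in stdout.

```python
import os
import subprocess
import tempfile


def sh(command: str) -> str:
    """Run a command string through /bin/sh -c, like system(), and return stdout."""
    return subprocess.run(["/bin/sh", "-c", command], capture_output=True, text=True).stdout


# Both substitution forms run the inner command and splice in its output.
print(sh('echo "before $(echo injected) after"'), end="")  # before injected after
print(sh('echo "before `echo injected` after"'), end="")   # before injected after

# Stand-in for /flag: a temp file we are allowed to read.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.write("not-really-a-flag")
    secret_path = f.name

payload = "`cat " + secret_path + "`"   # attacker-supplied "time zone"
leaked = sh("echo TZ=" + payload)       # hypothetical template echoing TZ back
print(leaked, end="")                   # TZ=not-really-a-flag
os.remove(secret_path)
```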
So we do TZ=`whoami` date. This whole string gets passed to sh -c as one argument. sh parses it and executes whoami first, and we can see whoami outputs root. Then that output gets substituted in, so TZ is set to root here. And we see that in date's output, just like we saw with the hack of reading the contents of that file. The other cool thing: let's go back here and think about this, because it's an incredibly important part of all of these injection concepts. We control the content here after TZ=. Can we change what happens before our injection point? No, that's a hard-coded part of the application, right? If we look at this... where's my... right, I had system("TZ=" + arg0 + " date"). If you control arg0, can you ever control the start, the TZ=? No, the program always takes your input and puts TZ= in front of it. So you can't change what happens before it. Can you change what happens after it? No, similar logic: there's always stuff appended to your input. So the cool thing for you as an attacker is to think: okay, but what if I don't want that stuff after it? If we look at man sh... why does it not say anything about comments in shell scripts? Yeah, I'm totally ruining my flow. Comments... cool, there we go. Anyways, I'd hoped it would be in the manual where I expected; it's not, but you can look it up. A word starting with a hash, and everything after it on that line, becomes a comment. So back to this example: if I commented out this whole thing, it would do nothing, but I can't do that. If I put the hash here, that's weird, because that's in the middle of the string. Anyways, if I put the comment here, date never actually runs.
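A sketch of using `#` to neutralize the suffix the application appends. `build_command` is the same hypothetical `"TZ=" + arg0 + " date"` template. With the payload `UTC #`, the shell sees `TZ=UTC # date`, treats `# date` as a comment, and only performs the assignment, so date never runs and nothing is printed.

```python
import subprocess


def sh(command: str) -> str:
    """Run a command string through /bin/sh -c, like system(), and return stdout."""
    return subprocess.run(["/bin/sh", "-c", command], capture_output=True, text=True).stdout


def build_command(arg0: str) -> str:
    # Hypothetical template: the app always prepends "TZ=" and appends " date".
    return "TZ=" + arg0 + " date"


benign = sh(build_command("UTC"))        # date runs and prints the time
commented = sh(build_command("UTC #"))   # "# date" is a comment; date never runs
print(build_command("UTC #"))            # TZ=UTC # date
print(repr(commented))                   # ''
```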
So we can use this technique to stop date from running. Oh, and here we're being even more clever. Now we're saying, okay, I don't want to set any environment variables. The command starts with TZ=, and then everything in red here is what I'm injecting. Semicolon: what's the semicolon for? What does the semicolon mean in shell scripting? Yeah, "the semicolon terminator causes the preceding..." no, that's not quite it. Yeah, it's used to separate commands. Normally when we type commands, we do ls, then ls -la, then whoami. But we can type whoami; ls to execute two commands on one line. The shell splits the line on those semicolons. Oh, shoot, we're out of time. Okay, all right, we'll pick back up on this on Monday.
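Putting the semicolon and the comment together against the same hypothetical `"TZ=" + arg0 + " date"` template: the payload `; echo injected #` makes the shell see `TZ=; echo injected # date`. That is an empty assignment, then the attacker's own command, then a comment that swallows the appended ` date`. Here `echo injected` is a harmless stand-in for whatever the attacker would actually run, such as `cat /flag`.

```python
import subprocess


def sh(command: str) -> str:
    """Run a command string through /bin/sh -c, like system(), and return stdout."""
    return subprocess.run(["/bin/sh", "-c", command], capture_output=True, text=True).stdout


def build_command(arg0: str) -> str:
    # Hypothetical template: the app always prepends "TZ=" and appends " date".
    return "TZ=" + arg0 + " date"


payload = "; echo injected #"
command = build_command(payload)
print(command)            # TZ=; echo injected # date
output = sh(command)
print(output, end="")     # injected
```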