All right, welcome back to Computer Science E-1. This is lecture six. So today is essentially scary-story day, where we scare you as best we can, technically, with all the bad things that can happen to you online and on your computer. If time permits, we'll actually give you the defense mechanisms at the end of class. But we'll also overflow into next week, since there's a whole lot of technological and policy stuff that can very much affect what happens to you and your data when you're online. First, one announcement. Exams have all been graded. Most of them were handed back last week; if you didn't get yours yet, we'll have them at the end of class. For those of you tuning in online, we will circulate an email tomorrow or Wednesday with a link with which you can check how you did on the quiz and what our records indicate you've done with the wiki and the blog. And speaking of the wiki and the blog, the next assignment will be released tomorrow via email. It will cover a little bit of last week's content on multimedia as well as this week's on security, so you'll see it co-mingles the two. So with that said, suppose you go to Starbucks with a laptop and you want to get on the internet there. How do you typically get on the internet in a place like Starbucks? OK, wirelessly. And they usually have what's called, what is it called, the thing to which you connect? A little review from a couple of weeks ago. OK, so Wi-Fi: they have some kind of wireless router, an access point. There are a number of ways to describe this thing. But on your computer, you go to the menu, you see Starbucks listed as an SSID, the name of the access point, and you try to connect.
So what typically happens when you connect to a hotspot, as they're often called, at Starbucks or any place like it: a hotel, an airport, or whatever? What happens next once you select that hotspot? Just in real terms? What's that? OK, what does that mean? How do I access the internet at that point? I've selected Starbucks or Logan Wi-Fi from the menu. OK, good. So sometimes you'll have to authenticate yourself. What a lot of these providers do commercially is this: even though you're connected to their Wi-Fi and your laptop can now talk wirelessly to the other computers around you, that doesn't necessarily mean you can access the internet itself, or anything beyond, say, Logan Airport's default home page. They assume, and it's a reasonable assumption, that almost anyone getting online at an airport or Starbucks wants to use the web, even though, as we discussed weeks ago, there are innumerable other services you might want to use, whether Skype or instant messaging or other protocols and technologies. They assume most people will at least start with the web. So you pull up a web browser and go to google.com, but you end up getting redirected from Google to starbucks.com, where you then have to pay with a credit card, maybe $9.95 for an hour's worth of internet access, or sometimes you're just whisked through and allowed to use the internet. And that's great. Certainly if you provide your contact information and credit card number, Starbucks at least knows who you are. But suppose you get online, either for free or after providing that information: what then are the threats around you? You're sitting in Starbucks. What are you probably doing on your laptop? Surfing the web. And specifically, what kinds of things do normal people tend to do? Job searching online. OK, job searching online. What else? OK, email. Online newspapers. What's that? Banking, online banking, good one.
I thought you said dating, but that works too: online dating. So people might be doing that as well. Logging into Facebook, MySpace, any number of websites. So here's where the story gets a little interesting and increasingly scary. Almost any website you have an account on does transmit your username and password to that website when you log in. So what should you typically look for in your browser to make sure it's safe to provide your username and password? OK, HTTPS. Most websites that have any kind of login mechanism, including the course's own website when you log into the wiki or blog, use URLs that start with https. And this S literally denotes secure. It refers to a protocol called SSL, Secure Sockets Layer, which is simply a technology that ensures that any data going from point A to point B and back is encrypted. By encrypted, we mean scrambled using fairly sophisticated mathematical formulas that are very unlikely to be cracked by just some random person on the internet. But most of the pages you visit do not begin with https. In fact, after you log into Facebook, you'll notice that the URL becomes devoid of this S. After you log into MySpace, even though it might use HTTPS at first, it then quickly goes back to HTTP. So why might that be? Why might a site send you back to HTTP after you've authenticated, after you've already entered your credentials? Finish the story. What's up? Anything after that isn't their responsibility? OK, so for anything after that, they figure it's not really their responsibility; it's not that interesting anymore. There's really not as much damage you can do by just seeing what newspapers or articles people are reading online. And so they tend not to keep you on HTTPS thereafter. But there's actually a more curious reason.
It's actually expensive, computationally expensive at least, for most websites to keep you on HTTPS, because, long story short, all of this fancy encryption and scrambling simply takes more CPU cycles. If each visitor consumes more of the CPU cycles on your server, what's the implication, bigger picture? Exactly. Fewer people can use your website at the same time, which means, and here's the financial implication for popular websites, that you need more servers and more hardware to support those users. Is that more secure? OK, so that's true. And in fact the pushback is, as you'll find, that most banks use HTTPS throughout your transactions with the site once you've logged in, for reasons we'll explain in just a moment. So banks are among the few entities on the internet that realize, you know what, it's probably more expensive for us to have people's accounts compromised than to buy a few extra servers. And so banks, also because they have to defend their reputations, tend to use HTTPS far more than other sites. And Google, actually, since that incident in China, which you may have read about, where some Gmail accounts were compromised and which led to a whole big uproar between China and Google, shortly thereafter, partly because of it or maybe coincidentally, transitioned all Gmail users to HTTPS by default, even after they've logged in. And there's a bit of an analogy for what David's talking about. Why is this actually expensive? Well, if you just send a regular letter to somebody, with the message simply written on it, that person will be able to receive it, read it, and understand it. But if you decide to use a secret decoder ring or something like that, then that person will have to spend some time decoding the message.
It's not precisely that, of course, because there's a bit more security behind encryption, but it's the same sort of idea: it just takes more time to read the same message. And so, for the same number of servers, or the same expense, they can serve fewer people, like what we were talking about before. So many websites, with the exception of banks, will use HTTPS for logins and HTTP for everything else. But what is the problem with this? It may be less expensive for them, but there's a downside at the same time, and it's maybe kind of obvious. What was that? People would be less apt to use it because it's less secure. That might be true, but I bet most people look for the padlock only when they're logging in and then kind of ignore it, especially since the padlock tends to be relatively small in the browser. So I'm not sure many people would actually notice. But beyond that, imagine you've logged into a website: you entered your username and password, and now all of these details are being sent back to you, but not via a secure connection, not via HTTPS. The implication is that whatever this data is, even if it's relatively innocuous, like your Facebook profile information, is now being sent back to you in the clear. It's not being sent in a secure fashion. And this means that if there's somebody between you and the servers (recall that all of these packets go from router to router, and even right now, for example, my laptop is connected wirelessly, so its signals are being transmitted via Wi-Fi through the air), that somebody might be able to intercept these messages that are in the clear and read and understand them. And this is not a good thing, particularly when you're using a wireless connection.
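To make the decoder-ring idea concrete, here is a toy sketch in Python. It is an illustration only: SSL does not work this way, and real encryption relies on vetted ciphers plus a key-exchange step that this skips entirely. It assumes the sender and recipient already share a random key, and the sample message is made up. What it shows is the difference between what an eavesdropper captures when data travels in the clear and what he captures when it has been scrambled.

```python
import secrets

def scramble(message: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte of the message with the matching key byte."""
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))

# XOR undoes itself, so unscrambling is the very same operation.
unscramble = scramble

plaintext = b"GET /inbox HTTP/1.1"                # made-up request a sniffer might capture
key = secrets.token_bytes(len(plaintext))         # the shared "decoder ring" (assumed pre-shared)

on_the_wire = scramble(plaintext, key)
print(plaintext)                      # what a sniffer reads when nothing is encrypted
print(on_the_wire)                    # what a sniffer reads when it is: gibberish
print(unscramble(on_the_wire, key))   # what the recipient, who holds the key, recovers
```

The eavesdropper still receives every byte; without the key, those bytes just don't mean anything to him.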
You're probably at the most risk in a public area on an unencrypted wireless connection, because all of this data is literally being transmitted in the clear. You can't literally grab it out of the air and read it, but almost literally: using some software on a computer, a person can see all of the traffic being sent back and forth between a computer and the wireless router, intercept all of the packets that make up these connections, and understand what is going on in that particular session. That might be a web page going back and forth, where you're requesting a page with certain sensitive data, or even an IM conversation: if you're having an instant message conversation with somebody online, that data could be intercepted by someone as well. This is a very risky thing, and something we have to combat by making sure we do something to protect our data. So what might we be able to do? I'm online, I go to a website, and I see some guy chuckling in the corner because he can see the websites I'm visiting and all of the information I'm downloading off the internet. How can I protect myself from this? Antivirus software? That fixes a different problem altogether. That's usually for when you've downloaded an application, run it, and it installs a virus onto your computer. Antivirus, anti-malware, and anti-spyware aren't really going to help here, because they deal with the software itself and not with the raw bits flying back and forth between your computer and the router. Other ideas? Yeah. Great, right, exactly. If you have a secure connection, most commonly via HTTPS, then it's a lot more difficult for someone intercepting this data to look inside the packets and understand what's going on.
Sure, he might still be able to intercept the packets, but the data is going to be gibberish, because he doesn't have the secret decoder ring to decode the messages being sent back and forth. And there are a variety of other ways, too. If you use WPA encryption on your router, for example, that encrypts all of the traffic going from your computer to the router, so someone just trying to look at the traffic on your wireless network might not be able to read your data. Another, even more advanced, way would be to use a VPN connection, because VPN connections are typically encrypted, if you recall. It's an encrypted connection from your computer to, say, Harvard, or to whoever is hosting your VPN server. All of your data, including HTTP data, is then encrypted from your computer all the way to Harvard's servers, where it is decrypted and sent on its way unencrypted. So this is a big problem we have to combat. And certainly in a coffee shop or on an open Wi-Fi network, we don't always have the ability to turn on encryption, because that has to be done at the router level: someone has to change the router's settings to say, OK, I want to enable encryption on this particular router. That's not something you can necessarily do from your computer unless you have full access to the router. And so another thing you can do, oh my gosh, I just blanked on the sentence. Do you have that? You just what? I blanked on my sentence. Do you happen to have that software up and running yet? No, it's not working on here. OK, all right. So you can use a VPN, for example, to encrypt all of the data from your computer through a so-called tunnel to, say, Harvard.
But the problem there is that this data isn't encrypted all the way from your computer to its destination. It's only encrypted from your computer to Harvard's servers, and from there it's sent unencrypted to subsequent routers and servers. Facebook time. So let's make this more concrete. I'm not actually going to log into a Facebook profile here, but many of you are probably guilty of this already, or at least of sites like it. When you log into a typical site like Facebook, you absolutely end up at https://something, at least for a moment. So even though at the top of this page I'm currently on HTTP, notice the thing at the top right of facebook.com, which is representative of a lot of popular websites: it's called a form. Forms have fields, and this is something you yourself may experiment with when it comes time to develop websites in this course. A form is simply something that takes user input; you hit submit, and that data then gets submitted, so to speak, to a remote server. And if the site is well designed, that remote server's URL will begin with https. So even though you might not see it on the web page you visit initially, you can keep an eye on the browser for that little padlock, or, if you're particularly sophisticated or paranoid, do much what we're about to do and look underneath the hood at what's going on between points A and B. Now, once you've logged into a website, you'll recall that your browser kind of stops doing anything. There's no more spinning icon, no more spinning globe, because HTTP, the language that browsers and servers speak, is what's called a stateless protocol. By stateless, I mean: yes, the browser talks to the server and opens up a connection, and yes, the server then generally responds with a web page. But after that, the web browser closes the connection, because it has received exactly what it asked for.
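Where a form's data actually goes is decided by its action URL, resolved against the URL of the page the form sits on. The helper below is a hypothetical sketch using Python's standard urllib.parse, with made-up example.com URLs; it checks whether a submission would travel over HTTPS. Note the subtle case: a relative action such as /login on an http:// page would be submitted over plain HTTP.

```python
from urllib.parse import urljoin, urlparse

def form_submits_over_https(page_url: str, form_action: str) -> bool:
    """Return True if the form's data would be sent to an https:// URL.

    A relative action like "/login" inherits the scheme of the page it
    appears on, so it is resolved against page_url first.
    """
    target = urljoin(page_url, form_action)
    return urlparse(target).scheme == "https"

print(form_submits_over_https("http://www.example.com/", "https://www.example.com/login"))  # True
print(form_submits_over_https("http://www.example.com/", "/login"))                         # False
```

This mirrors the point above: the page you're looking at can be plain HTTP while the login itself is still sent securely, as long as the form's target is an https URL.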
In other words, you go to cnn.com and hit Enter. Your request goes to the server. The server replies, but that's it. It doesn't want to keep talking to you. It's not like watching a movie, which would typically involve streaming data to you continuously; you've just visited a home page without streaming video content. The content has been sent to you, and then you're left with it on your own. And so no state is maintained, so to speak. But this seems to imply that once a server like Facebook has sent you back your profile page after you've logged in, it has now forgotten about you. Because if I then click a link in Facebook and I haven't remained connected to the server, that would seem to suggest the server is going to have to ask me for what the next time? My password, right? If logging into a website is like opening a door, walking through it, getting your content, walking out, and closing it, that is, the connection ceases, it feels as though you would have to do those same steps yet again to get the second piece of content. Well, clearly, this is not how the web works. Otherwise, all of us would be infuriated if we had to log in with a username and password on every damn page we visit. So there's a mechanism, one that has kind of gotten a bad rap over the years, called cookies, that ameliorates this potential downside. So what is a cookie, as you understand it now, in the context of web browsers? Perfect. A cookie is a little piece of information, usually stored in a text file, that's planted on your hard drive by some remote server. It just contains some piece of information the server decided it wanted to leave on your computer. It's not a large amount of information; a server can't just secretly store a video file on your computer.
Servers usually just store something like your username or your preferences or, more commonly and more effectively for security, just a really big random number. They plant this really big random number on your hard drive, and they remember on the server what that big random number is. What browsers are designed to do these days is this: after a server has planted a cookie on your hard drive, every time you visit that same website again, facebook.com, your browser is supposed to automatically, and sort of unbeknownst to you, the human, send that same cookie value, that same big number, again and again. It's really like a hand stamp at some club or amusement park, saying: I've already been through; here's my big number; go look me up to check who I am. So in this way, HTTP itself is stateless; it forgets about you once the connection closes. But you have this constant reminder on the top of your hand saying who you are. And this is now a problem, as Dan was saying, in environments like Starbucks when you're not encrypting any of your data. Because even though your username and password might have been sent in encrypted form from browser to server, thereafter your hand stamp is not, in fact, encrypted. It's part of what are called HTTP headers, which are just generally uninteresting pieces of data that the browser sends to the server when requesting a web page. In fact, if you recall from lecture three or four, we revealed that the simple message a browser sends is essentially GET, then, for instance, slash, meaning give me the home page, and then the version of the protocol the browser wants to speak. Remember this? It turns out this was just one of the lines a browser sends to the server. Another one is Cookie, colon, and then that really big number. Every time a browser requests information from a server, it does send that GET line. That was not a lie a few weeks ago.
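The request lines described here are just text, which a short Python sketch can assemble. The helper name, the host, and the cookie name session with its value are all made up; as noted in a moment, we can't know which cookie name a particular site really uses. Each line ends in CRLF, and a blank line marks the end of the headers.

```python
from typing import Optional

def build_request(host: str, path: str = "/", cookie: Optional[str] = None) -> str:
    """Return the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookie is not None:
        # The "hand stamp": the browser repeats the cookie on every request.
        lines.append(f"Cookie: {cookie}")
    # Each line ends in CRLF, and an empty line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

first_visit = build_request("www.example.com")
later_visit = build_request("www.example.com", "/home", cookie="session=5f2b9c0e7a41d8")
print(later_visit)
```

Nothing in that text is scrambled, so on an unencrypted connection anyone who can see the request can read the Cookie line as easily as the GET line.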
But it also sends some other stuff, including this line of text, this reminder of what's on your browser's hand. We can see that here. I can't claim to know which cookie value Facebook actually uses for these purposes; we can only hypothesize. But this program, this window here, is a sniffing utility of sorts. It sniffed the traffic between my laptop and facebook.com. And notice you can see things related to what we've looked at before. Here is my request, at top left: GET / HTTP/1.1, Host: facebook.com. Again, there's some other stuff that's sent. But now notice the server's reply if I scroll down just a little bit. The next chunk of text, which starts with "Moved Permanently" for unrelated reasons, has a highlighted line of text, related to this one: that is the server planting a cookie on my computer. So if you've ever heard about web servers putting cookies on computers, it really reduces to something as trivial as this: a line of text sent over the internet from server to browser. My browser, if cookies are enabled, is supposed to remember this cookie by putting it in a file somewhere on my computer. And in fact, the next time I visit Facebook, as evidenced by the second example, which is a second request from my browser to the server, what do you notice toward the bottom of this request? Just this: Cookie, colon, and that same really long number. So this is great. That's a reminder to the server: hey, I already logged in; I'm the same person as before; you can check your database to see if I'm in fact still logged in. But here is the danger, and this is very real. You go to Starbucks or the like. You log into Facebook using your free Wi-Fi, or even your $9.95 Wi-Fi connection. But you don't actually have a secure connection to the access point.
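On the server side, the hand stamp can be sketched as a table that maps big random numbers to users. This minimal Python sketch assumes a cookie named session and uses an in-memory dictionary in place of a real database; log_in and who_is are hypothetical names. It uses the standard secrets module for the random number and http.cookies to produce the Set-Cookie line and to read the Cookie line back.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

# Server-side memory (hypothetical): big random number -> username.
sessions: Dict[str, str] = {}

def log_in(username: str) -> str:
    """After the password checks out, plant a cookie; return the Set-Cookie line."""
    token = secrets.token_hex(16)  # 128 random bits, written as 32 hex digits
    sessions[token] = username
    cookie = SimpleCookie()
    cookie["session"] = token
    return cookie.output(header="Set-Cookie:")

def who_is(cookie_header: str) -> Optional[str]:
    """Look up the user named by the value of a request's Cookie header, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return sessions.get(morsel.value) if morsel is not None else None

set_cookie_line = log_in("jsmith")          # e.g. "Set-Cookie: session=9c1e..."
token = set_cookie_line.split("=", 1)[1]    # what the browser remembers
print(who_is(f"session={token}"))           # jsmith
print(who_is("session=made-up-number"))     # None
```

The server never has to stay connected: each new request carries the number, and one dictionary lookup turns it back into a person.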
At home, if you took lectures three and four to heart, you probably turned on, or already had turned on, encryption on your wireless access point. WEP is not so good, but it exists, and WPA is the better cryptographic mechanism we recommended back then. But you don't usually get this in public, because how are you going to get on Starbucks's network if you don't know their password, unless you ask the barista, who probably doesn't even know it anyway? So they just don't use encryption. That means your only defense is encryption with the server directly, which you get with these https URLs. But after Facebook decides, oh, it's not worth our money to encrypt every little page you visit, you go back to HTTP, which means all details like these are sent, so to speak, in the clear. Now, if some bad guy is sitting in Starbucks with you, or sitting outside but within range of that access point, and he has way too much free time on his hands, there exist utilities you can download for free (I did download one; I just couldn't get it to work on this laptop on such short notice) with which he can sniff all of this traffic. So when Dan made that very clarifying gesture of grabbing packets out of the air, saying that's not how it works: your computer can do just that, electronically. It can then show you, the bad guy, all of the packets going back and forth, which means the bad guy can see what websites you're visiting if you're within range of him. He can see what emails you've received, what emails you just sent, and what instant messages you've received and just sent, because all of this stuff, if it's not encrypted using protocols like HTTPS, is sent in the clear. But worse yet, if that guy has enough sophistication and knows that things like cookies are actually pretty juicy, he can, so to speak, grab that cookie out of the air and then, using free and not-that-complicated software, forge requests like this one.
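Why a sniffed cookie is enough can be sketched the same way. In this hypothetical Python sketch (the names, host, paths, and user are made up), the server identifies each request only by its Cookie line, so a request forged by someone who copied that line out of unencrypted traffic is indistinguishable from the victim's own.

```python
import secrets
from typing import Dict, Optional

# Server side: the same hypothetical "big random number -> user" table.
sessions: Dict[str, str] = {}

def log_in(username: str) -> str:
    token = secrets.token_hex(16)
    sessions[token] = username
    return token

def handle_request(raw_request: str) -> Optional[str]:
    """Return the user this request is treated as, judged only by its Cookie header."""
    for line in raw_request.split("\r\n"):
        if line.lower().startswith("cookie:"):
            value = line.split(":", 1)[1].strip()
            for pair in value.split(";"):
                name, _, token = pair.strip().partition("=")
                if name == "session":
                    return sessions.get(token)
    return None

# The victim logs in (over HTTPS), then keeps browsing over plain HTTP.
victim_token = log_in("jsmith")
victim_request = (f"GET /home HTTP/1.1\r\nHost: www.example.com\r\n"
                  f"Cookie: session={victim_token}\r\n\r\n")

# The eavesdropper copies the Cookie line out of the sniffed request...
sniffed_cookie = next(line for line in victim_request.split("\r\n")
                      if line.startswith("Cookie:"))
# ...and pastes it into a request of his own.
forged_request = f"GET /messages HTTP/1.1\r\nHost: www.example.com\r\n{sniffed_cookie}\r\n\r\n"

print(handle_request(victim_request))  # jsmith
print(handle_request(forged_request))  # jsmith as well: the server can't tell them apart
```

The eavesdropper never learns the password; copying the hand stamp is enough, which is why encrypting the whole session, not just the login, matters.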
He can configure his browser to send your cookie from his computer, which means that if you've already logged in, he identifies himself with the same hand stamp, kind of like some of you might have done as kids, copying a hand stamp or something silly like that. The server is then going to mistake him for you and show him your Facebook profile and all of your messages. And though I've been using Facebook just for its trendiness here, this is true of any website that uses cookies for logins after authentication, which almost all of them do these days, but doesn't encrypt the entire session, as banks and now Gmail do. But even if you protect yourself by making sure all of your data is encrypted from your computer to the server, that doesn't necessarily mean your information is safe on these machines, because there exists a variety of other attacks. For example, a typical login screen might consist of a username field and a password field, so that the person is supposed to enter both a username and a password. But there are specialized types of attacks that many hackers might employ. Some of them are called SQL injection attacks, and there are others that can be pretty tricky, where a person can just type in a password (it doesn't even have to be your password), and if the website itself is not properly written or configured, that person will be able to gain access to your account. This can be a really tricky thing, and it goes beyond all of the problems we've been talking about, or will mention in the future, like phishing or a bad guy literally guessing your username and password. This person might be able to log in as you just by using a specially crafted username and a specially crafted password.
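What such a special password might look like can be sketched with Python's built-in sqlite3 module. The table, the user jsmith, and the password are made up, and passwords are stored in plain text here only to keep the sketch short. naive_login pastes the input straight into the SQL text; safe_login uses sqlite3's ? placeholders, a standard defense in which the input is passed separately and treated purely as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
# Plain-text password for brevity only; a real site should store hashes.
conn.execute("INSERT INTO users VALUES ('jsmith', 'correct horse')")

def naive_login(username: str, password: str) -> bool:
    # DANGEROUS: the user's input becomes part of the SQL itself.
    query = ("SELECT username FROM users WHERE username = '" + username +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone() is not None

def safe_login(username: str, password: str) -> bool:
    # The ? placeholders keep the input as data, never as SQL.
    query = "SELECT username FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None

special_password = "' OR '1'='1"
print(naive_login("jsmith", special_password))  # True: logged in without the password
print(safe_login("jsmith", special_password))   # False
print(safe_login("jsmith", "correct horse"))    # True
```

With the special password, the naive query ends in `... AND password = '' OR '1'='1'`, and because `'1'='1'` is always true, the WHERE clause matches every row.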
And unfortunately, there's no fix for us as users of a website for this particular problem, because it exists purely on their end, purely on the servers. It's not something we can really combat ourselves. And it's actually a huge problem on the internet. There was a recent report that listed the top 20 security threats relating to computers and electronics, and I think SQL injection was number two or number three, something abnormally high. What SQL injection actually does in detail is beyond our scope here, but just realize that there's this sort of special password a person can type in to gain access to a particular place. And in fact, you don't even have to be doing something malicious to use a SQL injection attack. David and I happened to be at a restaurant in New Jersey maybe a year and a half ago. The restaurant was offering Wi-Fi, and we thought it would be neat to be able to work while we were eating. It was one of those places where you had to pay for access, or get a password, or something like that. I wasn't there; I don't know what story you're telling here. He's denying all of this. So anyway, my friend noticed that there was a username and password to enter, and neither the waitress nor anyone else on staff knew it. So my friend literally just did a SQL injection attack to gain access to the restaurant's Wi-Fi connection. Again, it wasn't malicious. And we told the manager who was there: look, you're using this software, and it has this vulnerability, which means people could literally delete whatever information you have. It's a risk.
And the manager just didn't take us seriously. He said, oh, OK, well, don't delete my data, and he left without really believing this problem existed or could even occur. It really goes to show, I think, that not only do we have to be cognizant of security ourselves, but we also have to realize there are things out of our control that we have to get other people to pay attention to. So we can make this one more concrete. As Dan said, it's beyond the scope of a course like this, since we won't be programming in this language called SQL. But SQL is a language people use when programming projects that involve databases. A database, for today's purposes, is just a really huge spreadsheet in which you can store lots and lots of data, generally in rows and columns. And in a database you can have multiple tables, like multiple worksheets, each containing different types of information. The kinds of words you use to request data from a SQL database, which is an incredibly common type of database, look something like this: select user from users, where users is a table, like a spreadsheet, then where, and then you specify some condition, here username equals something in quotes. That something is generally provided by people like you and me when we type our usernames and passwords into a form on a web page. What a lot of programmers do, incorrectly and very dangerously, is take whatever input they receive from the website and just foolishly, naively plug it right in where the dot, dot, dot is. They effectively do a find-and-replace. But this is dangerous, because if you're some bad person just trying to mess around with the site, or worse yet trying to do some damage, you could type in a username that's really not a username at all, but something like this.
Suppose that for the dot, dot, dot, instead of typing in my name or John Smith or something like that, I just put another quote there. And then I put a semicolon. Semicolons in computer programs are often like periods in English: they say stop here, start something new. So maybe I'll type a semicolon, and then maybe delete from users, which is the syntax that would delete everything from that table. And then another semicolon, and then I can deal with the closing quote at the end. In short, I can plug in any value I want and just assume the programmer was so sloppy that he's not even warding off dangerous words like delete and dangerous characters like semicolons and quote marks. That's what a SQL injection attack takes advantage of. And these are not complex attacks, usually. You literally come up with a crazy-looking line of text like the one I just did and hope, in a bad way, that the server will accept it as input and do very bad things, whether logging you into the system, deleting data from the database, or any number of other things. This is actually quite common; when web-based applications get compromised these days, this is very often one of the reasons. Yeah? OK, yeah, go ahead. So what exactly is that delete from users line? Is it a kind of virus that somebody puts on something? That's a good question, and actually a good segue. It's not a virus. A virus is a piece of software, not just a line of text that a human types. Viruses and worms are a different class of threat altogether, and we'll come back to those in just a moment. This would generally be a manually forged attack by a human trying to take advantage of some form on the web. Yeah? Yes, but humans are stupid. And that is quite a sincere comment.
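The attack just described can be reproduced in a small sqlite3 sketch. Two assumptions are built in: the sloppy program builds its query by string concatenation, and it runs the result with executescript, which executes every semicolon-separated statement (sqlite3's plain execute refuses to run more than one statement at a time, so this bug takes both mistakes). The table and names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (username TEXT);
    INSERT INTO users VALUES ('jsmith'), ('jdoe');
""")

def row_count() -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

def careful_lookup(username: str) -> list:
    # The ? placeholder keeps the input as data, quotes and semicolons included.
    return conn.execute("SELECT username FROM users WHERE username = ?",
                        (username,)).fetchall()

def sloppy_lookup(username: str) -> str:
    # Find-and-replace the input into the query text, then run whatever results.
    query = "SELECT username FROM users WHERE username = '" + username + "'"
    conn.executescript(query)
    return query

evil = "nobody'; DELETE FROM users; --"

print(careful_lookup(evil), row_count())  # [] 2
print(sloppy_lookup(evil))  # SELECT username FROM users WHERE username = 'nobody'; DELETE FROM users; --'
print(row_count())          # 0
```

The trailing `--` turns the programmer's own closing quote into a comment, which is how the attacker deals with the leftover quote.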
As obvious as it is, even in an introductory course, that the defense mechanism should be to search the user's input for dangerous keywords like delete and dangerous characters like semicolons and apostrophes, what you've just proposed is literally the solution. There exist things called functions, and we'll talk about these when we get to our programming lecture. A function is like a miniature program you can use inside of your own program; it can take text from users as input and produce as output either a yes or no response: yes, this is valid, or no, this is not. Or it can even massage the user's input and remove dangerous characters like semicolons and quotes for you. So yes, in reality, in the programming world, avoiding this is actually very easy. But whether through lack of education, lack of care, or lack of knowledge, it is not always done, and not always done properly. Case in point: Dan's friend at the restaurant. So a lot of this boils down to stupidity, or to clinging to technologies that were not designed with these attacks in mind. Yeah? Unless the main people doing the attacks are the ones writing the software. So yes, you could have some perverse incentives, where it's actually the bad guys writing all this buggy software so they can later take advantage of it in their free time. I suspect there are enough good people in the world that that's not the case, but it's a clever logical play there. Other questions? Virus, worm? Good time for that segue. What's a virus? What's a worm? All right, so viruses and worms. This is another security problem, different from what we've been talking about. Viruses and worms are software that somehow gets run on your computer and does bad things. And it can be difficult to draw the line between normal software and a virus.
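The two kinds of input-checking functions mentioned above, a yes/no check and one that massages the input, might look like this in Python. Both names and the username rule (1 to 32 letters, digits, underscores, or periods) are hypothetical. Hand-written filters like strip_dangerous are easy to get subtly wrong, which is why placeholder-based (parameterized) queries are generally the preferred defense, with checks like these as an extra layer.

```python
import re

# Hypothetical rule: 1 to 32 letters, digits, underscores, or periods.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,32}")

def is_valid_username(text: str) -> bool:
    """Yes/no check: is this input shaped like a username at all?"""
    return USERNAME_PATTERN.fullmatch(text) is not None

def strip_dangerous(text: str) -> str:
    """Massage input by deleting semicolons and quote marks."""
    return text.translate(str.maketrans("", "", ";'\""))

evil = "x'; DELETE FROM users; --"
print(is_valid_username("jsmith"))  # True
print(is_valid_username(evil))      # False
print(strip_dangerous(evil))        # x DELETE FROM users --
```

An allowlist like USERNAME_PATTERN tends to be sturdier than a blocklist, since it only has to describe what a legitimate username looks like rather than anticipate every dangerous character.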
So for example, let's say that we have a program that you download, and the purpose of this program is to scan your computer for really large files and tell you what the largest files are. Then it gives you the option to delete these large files, so you can clean up some space on your hard drive. This is a very reasonable thing to be able to do on a computer, particularly if you have an older computer and need to clear up some space: you'd want to find out what is taking up all of the space on the hard drive. But you can imagine that there might be some software that claims to do the same thing, but instead actually goes through, finds the largest files, and just deletes them without you knowing. It's with this sort of problem that malicious software, or malware, comes into play: it can do bad things to your computer without your consent or without your knowledge. And there's a distinction between so-called viruses and so-called worms. Worms are software that typically spreads over the network, and what sets them apart is that they can propagate themselves. They can travel from one computer to the next all by themselves. It's part of their programming to find another computer, infect that computer, and do something. Usually a worm will just keep going, and maybe it will open a back door that lets some group of bad guys take over entire networks of computers and do whatever they want with them. Viruses, on the other hand, require that the user actually do something. So for example, a virus might be saved in, and propagated through, a Microsoft Word document. You might give this Word document to your friend via email, a USB thumb drive, or something similar.
And only when that person opens up this infected document does the virus get installed on that person's computer and then do something malicious from there. Have you ever opened a Microsoft Word or Excel document and seen a warning that this document contains macros? Ever seen that word? A macro is a little computer program. Microsoft, for better or for worse, implemented support for programming inside of Word documents and Excel documents and yet others. And this is advantageous because it means you can actually have programs running inside your document that make forms appear, little menus and such inside of Excel. They can do spell checking automatically inside of a Word document. In short, you can write programs inside of these documents. But unfortunately, as Dan notes, these same documents, when opened, can execute code that is actually malicious in nature. And the fundamental problem with computers today, and it's been this way now for 10, 20 years with almost all of the popular operating systems, is that almost all of us use our computers in what's effectively called administrator mode. Now, thankfully this is becoming less and less the case, but the fixes so far aren't really a solution. By administrator mode, I mean that when you bought your computer or installed Windows or macOS, you were probably given administrator rights. And what that means is that your account, whether it's Malan or Dan, has the right, so to speak, to install software, to delete files, to move things around. Administrator access is good, especially on your own computer, because the alternative would be that you're just a normal user on your own computer: you might download some new program, double-click it to install, and be told, sorry, you don't have permission to install this. That kind of defeats the point of having the computer. But the problem with this fundamental design is that you're either an administrator or you're not.
But once you're an administrator, you can do whatever the heck you want on that computer, or in turn, any program you run by double-clicking can do anything it wants. Now, for the most part, all of us are probably very trusting. Back in the day, when we'd buy software in a shrink-wrapped box, you'd assume that the software came from a reputable company. Look, I just paid for it. Their name's on the box. I'm going to install it. If this is tax-accounting software, that's all it's going to do; I'm going to trust that it's tax-accounting software. But that is purely a leap of faith. There is no technological reason that TurboTax or the like couldn't do more after installing itself. Suppose they had some obnoxious employee who decided, to hell with anyone who ever tries to uninstall this software: thinking off the top of my head here, that software could not only uninstall itself, it could go through your My Documents folder and delete whatever it wants. Because once you've installed something on your computer, you've essentially handed it the keys to your computer. Now, this is in contrast with an alternative model that hopefully we as a society will get to at some point, one that, so to speak, sandboxes individual applications. In other words, a much smarter model would be this: when you double-click a program and run it, it goes into a dedicated part of RAM and cannot access other programs' memory. And when you double-click to install it, all of its files go to one place and not all over your hard drive, which is too often the case on macOS, Windows, and Linux these days. Microsoft has begun to chip away at this problem, as has Apple, by increasingly prompting you. When you download some program from the internet, Windows or macOS will very often say, oh, this came from the internet. Are you sure you want to open it?
When you double-click some program to install it, very often you'll get this cute little sound in Windows these days and then a prompt saying you must click OK to continue. Now, unfortunately, this is really a punt by these companies, putting the burden on us, the users, who generally, myself included, have no clue what we're doing or don't care. Because if I clicked that icon in the first place, odds are I want to install it. So why would I ever exercise the judgment to click Cancel if I don't even understand what it is I'm looking at? So this is fundamentally a cop-out by these companies at this point, because which of us doesn't click OK when we've just done something and want to forge ahead? I would wager most of us just blindly or hurriedly click OK to proceed. Exactly. So that's exactly right. I would agree. We've proactively downloaded or bought something. We double-click it because we want to install it. So obviously we're going to click through, just intuitively, the menus that let us actually install this software. With shrink-wrapped software, though, you at least have a modicum of protection, because the company's probably not going to survive if it messes around with users' computers. There have been a couple of exceptions to that rule. But if any of you have ever googled around for some piece of software, like a solitaire game or anything you hope you might find for free or as shareware for a few dollars, the funny thing is that the worse the website looks, the more overwhelming it is with icons and bells and whistles, the more likely it is to offer software that, yes, you might be able to download for free, but that comes with an implicit price: junk, spyware as it's called, or adware as it's called, that does any number of things to your computer.
And this really began to happen a few years ago, especially when you would download file-sharing programs. Kazaa was a popular one, and with Kazaa you would get all of this junk software along with it, some of which just did minor things like pop up little ads when you visit websites. But in theory, that spyware, that adware, could have been logging every one of your keystrokes at the keyboard, because that's technologically completely possible. And if they're logging your keystrokes and your computer has internet access, then in theory, if you've been tricked into installing something you didn't intend, that same software could also do what with all your keystrokes? Send them off to some other random person on the internet, even in some other country. And here's where things get genuinely scary. You can have all the HTTPS you want for your websites. You can have all the technology protecting you and your online bank account. But if someone has essentially broken into your home, or you've let them into your home digitally, such that they can watch everything you type, it doesn't matter that you have usernames and passwords. It doesn't matter that you have encryption, because they've gotten so close to the original source of the data that you really have been compromised. And so this is, in my personal experience, the reason I never do online banking on any computer other than my own. And even that's still a risk, because all the scary stuff I just said applies even to my own laptop and my own naivete when surfing the internet. But at least it's not some computer that's been sitting in a public lab or an internet cafe where God knows what's on it, because the same scares apply there. Yes, a question. Yeah, that's not... Repeat for the camera.
Oh, so the question was about a person who, in trying to be safe with online banking, is saving, it sounds like, a username and password in a Word document, and then copying and pasting them from that document into the website when logging in. And I would say that this is probably almost a worst-case scenario in terms of security, because a password is supposed to be a secure thing. It's supposed to be something that nobody is going to be able to figure out. And by saving this password in an unencrypted document on your hard drive, anybody who gains access to that file will be able to open it up and see, oh, now I know the username and the password for this particular site. You might say, well, OK, he physically locks the computer in a room and never lets anybody else use it. But the risk could still be there, because what if somehow some spyware or adware or malware gets installed on this machine? All of a sudden, that file could be exposed to the outside world. And so this is actually not a good thing at all. It's hard to say whether it's better or worse than having your passwords on Post-it notes along the side of your computer, and that is very much discouraged as well, because it's the same sort of thing: whoever sits down at your computer will be able to see, oh, OK, the password for Facebook is 1234 and the password for Bank of America is 4321, or what have you. It's not a good thing. There's software that can help you combat this problem. On the Mac, for example, there's software called 1Password that encrypts all of this information for you into a secure file, and it will only decrypt this information and show it to you once you've typed in what should be a very, very secure password.
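To show roughly what encrypting your passwords into a secure file behind one master password involves, here is a teaching sketch that uses only Python's standard library. It is not how 1Password or any real product works: the iteration count, the file layout, and the HMAC-SHA256 keystream (standing in for a vetted cipher such as AES, which the standard library lacks) are all assumptions. The point is that the stored file is useless without the master password, and the master password itself is never stored.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative; makes each master-password guess slow


def derive_keys(master_password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Stretch the master password into a 32-byte encryption key and a 32-byte MAC key."""
    key = hashlib.pbkdf2_hmac("sha256", master_password.encode(), salt, ITERATIONS, dklen=64)
    return key[:32], key[32:]


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudorandom bytes from HMAC-SHA256 in counter mode (a stand-in for a real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))


def lock(master_password: str, secrets_text: str) -> bytes:
    """Encrypt the vault. Layout: salt(16) | nonce(16) | tag(32) | ciphertext."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = derive_keys(master_password, salt)
    plaintext = secrets_text.encode()
    ciphertext = xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    return salt + nonce + tag + ciphertext


def unlock(master_password: str, blob: bytes) -> str:
    """Check the tag first, so a wrong password is rejected instead of yielding garbage."""
    salt, nonce, tag, ciphertext = blob[:16], blob[16:32], blob[32:64], blob[64:]
    enc_key, mac_key = derive_keys(master_password, salt)
    expected = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong master password or corrupted vault")
    return xor(ciphertext, keystream(enc_key, nonce, len(ciphertext))).decode()


vault = lock("correct horse battery staple", "Facebook: 1234\nBank of America: 4321")
print(unlock("correct horse battery staple", vault))
```

The same caveat about trust applies: this only helps if the program doing the encrypting isn't itself sending your secrets somewhere.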
And so that is relatively safe, because in that case your passwords are not going to be accessed unless someone knows that master password and is then able to decrypt the encrypted file, et cetera, et cetera. So that's probably a better way of doing this. There are software solutions that will help you do this, but again, you have to be able to trust the software. You can't just download any old software and trust that these people are not sending the passwords you type in over the internet to save them on their own servers unbeknownst to you. Again, it has to be a game of trust. And in fact, this game of trust is extremely important when we're talking about all of these things: adware, viruses, malware. Even if you trust the person, and you get an email from somebody saying, oh, I'm sending you this really awesome solitaire game, why don't you try it out, you should almost never click on that piece of software, because generally what's happened is what? Right, well, they could have, right. So along those lines, they could have intentionally sent it to you, but what if this virus or this malware or this worm decided to send itself? It took over their computer in some way and was able to send itself to everybody in that person's address book. Now, all of a sudden, it looks like this file came from this person, when in fact it did not, or it may not have. And so whenever you receive an attachment, unless you're absolutely sure that you were expecting it and that you know its contents, even if you know the person it's from, I recommend you don't open it. Don't open it unless you somehow contact that person, via a phone call, say, since that's a little too hard for a bad person to fake, and find out whether or not they actually meant to send you this and whether you should open it.
And no stupid internet forward is so important that you can't just delete it and be just fine in life. I thought we would offer at least one solution to some of these problems, at least with online banking. This is actually very consumer-oriented, but also very neat technologically. There is, in general, this technique in computer security called two-factor authentication, kind of a big phrase for a simple idea. Two-factor authentication means that to authenticate against some resource, say to log into a website, you not only have to provide one secret, you also have to provide a second piece of information. Most of us log into websites daily and provide a username and password, and that password is one protection. But the problem, of course, with a password is that if someone watches over your shoulder, if someone logs your keystrokes, if someone gets the Word document on your hard drive, there are so many ways in which someone else can access that account once your password's compromised. So that's one-factor authentication. Two-factor authentication is a system whereby you don't authenticate yourself just by providing something you know in your mind, your password; you also present, secondarily, something you physically have. The hypothesis here is that, at least for a large range of attacks, it's unlikely that the bad guy will get inside your head, or get that Post-it note, and also get something physical from your person. Banks in particular are increasingly, but slowly, rolling out two-factor authentication. In fact, I have one on my own key chain here. This is a little electronic device. It's a really dumb, cheap device that has one button on it. When I push this button, there's a little LED screen here, like a little digital watch, and there's a six-digit number on the screen. And this number is typically valid for about 60 seconds. Now, that's not my password, because otherwise someone could just physically take this from me and log into my account.
They still need what's in here. But now, when I log into my online bank accounts, I provide my username and password and hit Enter. Then I have to provide, in a third field, the number that's currently on my little key fob device. And that number changes every 60 seconds or so. When they mail it to you, it's synchronized with some server's clock so that the two are always in lockstep, at least with fairly high precision. So now someone, to compromise my account, has to know my password and physically has to have access to this device, or, much less likely, has to compromise the very fancy mathematics that are being used to generate these numbers pseudo-randomly. So this isn't just a number that goes from one to two to three. It's jumping around among these six-digit values. Now, the downside, unfortunately, is that right now the only big bank that I know of locally that offers this is Bank of America. Not a bad thing, since they're so omnipresent, but you don't have a whole lot of options. In terms of online brokerage accounts, E*Trade has offered this for a while, and Charles Schwab online also offers it. Fidelity does not, and Scottrade does not, I don't believe. I'm sure there are others out there, but when I shopped this around just a few months ago, those were the several that I found. And to be honest, this is an important enough issue, I think, technologically and financially, that I actually left my old bank and brokerage accounts simply because I wanted this feature. And it was actually unfortunate. I called Fidelity at the time and asked them, and I used as many thesaurus words as I could to make sure the person knew what I was talking about. I said, do you have the little digital key fobs? Do you have two-factor authentication? RSA key fob is another buzzword, because RSA, which is a company, makes these things. And still, they didn't know what I was talking about.
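The lecture doesn't say which algorithm a given fob uses, and RSA's SecurID tokens use their own proprietary one. As a stand-in, here is a sketch of the open TOTP standard (RFC 6238, built on HOTP from RFC 4226), which many authenticator apps use. The device and the bank's server share a secret; each one feeds the current 60-second window number through HMAC and keeps six digits, so they agree without ever talking to each other. The shared secret below is the RFC's published test key, not a real one, and allowing one window of clock drift on the server is an assumption.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: pick 4 bytes based on the last nibble
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 60, digits: int = 6) -> str:
    """Time-based variant: the counter is the number of `step`-second windows since 1970."""
    return hotp(secret, int(unix_time // step), digits)


def server_accepts(secret: bytes, submitted: str, unix_time: float,
                   step: int = 60, drift_windows: int = 1) -> bool:
    """Accept the current window or a neighboring one, tolerating small clock drift."""
    current = int(unix_time // step)
    for window in range(current - drift_windows, current + drift_windows + 1):
        if window >= 0 and hmac.compare_digest(hotp(secret, window), submitted):
            return True
    return False


shared_secret = b"12345678901234567890"  # RFC test key; a real device's secret is random
print(totp(shared_secret, 59))                      # 755224 (window 0)
print(totp(shared_secret, 59, step=30, digits=8))   # 94287082 (RFC 6238 test vector)
```

Because the code depends only on the secret and the clock, someone who watches you type one code learns nothing useful about the next window's code.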
And this was unfortunate, because all these attacks we've been describing thus far really are not hard for some person on the internet with some Google skills to learn how to do and actually take advantage of. So I think it's important enough to consider switching over to one of these vendors, if you don't mind disrupting your life. Now, the upside, too, is that Bank of America does not mail you a key fob. They actually took a really smart approach, leveraging existing technology. When you log into a Bank of America account, if you've signed up for the service, and this is something that, if you have a Bank of America account, you can do tonight if you want, you have to click some links, but you don't have to pay anything and you don't need anything physical. You tell them your cell phone number and confirm that it's actually your number, because they call you or send you a text message. And now, when you log into bankofamerica.com, you give your username and your password, and then they send you a text message, which usually arrives within five seconds, maybe 10 seconds. And what's in that text message, do you think? The key: the six-digit number, or eight-digit number, whatever it is they happen to be using. And then I have about 60 seconds to read that number off my phone, type it into the website, and hit Enter, and voila, they let me pass. This, frankly, is brilliant, because they don't need an infrastructure of key fobs and the like. They can use existing technology. It's not perfect, but it's so much more secure than using a username and password alone. In fact, it's more secure than what we might frankly call the security theater in a lot of these banking websites. For example, even Bank of America has this option where you type in a username and password, you hit Return, and then you see what?
You see a little picture that you've chosen and a little message that you've supposedly chosen, and they say, OK, now you know that it really is Bank of America that you're connecting to, and that it's secure. And this is solving a different problem, right? What Bank of America is trying to do, by giving you some additional information from their servers after you type in your username and password and hit Enter, is reassure you that you're really connected to bankofamerica.com and not to some other website that's trying to look like bankofamerica.com. But the downside to this is that it's all just on the internet. What is stopping some other website from contacting Bank of America, finding out this same image and this same message, and just re-displaying them for you? So this additional security step really isn't solving much of anything, at least arguably. And along these same lines, there exist URLs that look very much like the URL we actually want. For example, there's a bank called Bank of the West, and you would go to their website by going to bankofthewest.com to log in and do your online banking. But there was a person who decided to be very, very clever and realized that they could get around this in a very simple way. This guy, I think, sent out a mass email to a lot of people, who may or may not have been Bank of the West customers, though it doesn't really matter, showing them this link and asking them to log in and type in some important information because their account had been compromised, or because they needed to update their information, or something like that. But what's the problem with this URL? It's not just my bad handwriting. It's two Vs instead of a W.
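The "vv" trick can also be caught mechanically, for example by a mail filter or browser extension. Here is a rough heuristic sketch: collapse character sequences that commonly imitate others (the confusable list is a small, hypothetical sample), then flag any hostname that isn't one of your known sites but collapses to the same thing as one. `KNOWN_SITES` stands in for a user's bookmarks and is an assumption, as is the exact domain `bankofthevvest.com`.

```python
from urllib.parse import urlparse

# Small, illustrative sample of look-alike sequences; real lists are far longer.
CONFUSABLES = [("vv", "w"), ("rn", "m"), ("0", "o"), ("1", "l")]

# Hypothetical bookmarks: the sites this user really banks with.
KNOWN_SITES = {"bankofthewest.com", "bankofamerica.com", "schwab.com"}


def skeleton(host: str) -> str:
    """Lowercase the host and replace each look-alike sequence with what it imitates."""
    host = host.lower()
    for fake, real in CONFUSABLES:
        host = host.replace(fake, real)
    return host


def check_link(url: str) -> str:
    """Classify a link's hostname as 'known', 'lookalike', or 'unknown'."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    if host in KNOWN_SITES:
        return "known"
    if any(skeleton(host) == skeleton(site) for site in KNOWN_SITES):
        return "lookalike"
    return "unknown"


print(check_link("https://www.bankofthewest.com/login"))  # known
print(check_link("http://www.bankofthevvest.com/login"))  # lookalike
print(check_link("http://example.org/"))                  # unknown
```

Browsers apply a related idea to Unicode look-alike characters, but no list of confusables is complete, so typing the address yourself remains the safer habit.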
So there are actually two letters there, and even though this URL looks very, very similar, you may say, oh, OK, well, when I'm sitting at my computer I would notice two Vs versus a W. But depending on the font, you really may not, especially if the W looks very, very similar to two Vs right next to each other, and this in fact is what happened. So all of these people went to bankofthevvest.com, and it was a web page that looked almost identical, or identical, to Bank of the West's website, and they were convinced to log in and send all of the appropriate data over to this so-called bank. But now what's happened? This bad guy actually has all of the information that these people entered, including usernames, passwords, and whatever other information he requested. So this one's pretty brilliant, and to be honest, I wouldn't be surprised if I had fallen for something like that, because if you look at two Vs next to each other in your browser tonight, unless you're so meticulous as to spell-check every word you look at, and even step through it character by character to see if there's something in the middle there, odds are a typical human's not going to notice an attack like this. But there are in fact some defenses. We said this in our internet lectures: if you ever receive an email that contains hyperlinks, you can usually at least hover over them and see what the URL really is. But even then, best practice: don't even click on them. If it's an email from PayPal or Bank of America or Bank of the West, places where you actually have accounts, don't even follow their links, and frankly, the smartest banks don't include links, because they've been the victim of this attack in the past. Just go yourself to bankofthewest.com or bankofamerica.com. And sometimes, to be honest, even I'm not quite sure, like with Charles Schwab: how do I spell that? What is their domain name?
Sometimes it's not obvious, and unless you've bookmarked it, you might have to guess. Well, guessing is also probably bad practice. So one of the best uses, honestly, of very popular search engines like Google is that you can leverage the collective wisdom of millions of people and thousands of servers, because usually if I type in a popular term like Charles Schwab and hit Enter, and Google's algorithms are working well, odds are the website I'm looking for will appear at or toward the top of the list. And this can actually be a useful heuristic, not a perfect test, but a useful heuristic, as to what the actual address is for a website, lest you yourself get duped. It's probably safer in general to go ahead and follow that link, because it's very, very unlikely that enough adversaries on the internet could collude to trick Google into listing the wrong address for a major site. So this too is a good sanity check. Five-minute break? Yeah, let's take a five-minute break. When we come back, we'll keep talking about security. Hi, everyone, welcome back. So before the, what? You make such a big deal about coming back. Well, it's a big thing. We're back, it's an exciting thing. Who wouldn't be excited that we're back on there? So anyway, before the break, early on in the lecture, we were talking about ways of being able to actually sniff trackets, sniff traffic. That's a weird error I made there, where I combined traffic and packets. Specifically, what we're trying to do is sniff some packets that people are sending via Wi-Fi, over the air, from their computer to the router. And there's a variety of software that you can use to do this, but the software I'm about to show you is not exactly that. What this software actually allows us to do is view a lot of information about specific routers.
And in the short time that we had to prepare this, we didn't actually get any packet-sniffing software, but what this allows us to do is see what routers exist, specifically Wi-Fi routers, and find out quite a bit of information about them. As you can see, there's a variety of routers here, including staff, Harvard University, and so on and so forth, and you can tell via this column over here labeled ENC what sort of encryption a particular router has. So you can tell that staff is encrypted, setup is not encrypted, and Harvard is also not encrypted. This might be at odds, though, with what you've noticed. When you connect to the Harvard SSID, or Harvard Wi-Fi, for the first time, what typically happens when you try to log in, right? There's an agreement, and you have to enter your Harvard username and password, a variety of steps. But this doesn't necessarily mean that the connection is encrypted, and so it's important to realize the difference between having an encrypted connection and just being allowed to access an unencrypted network. Harvard's is unencrypted, and so the data that you are sending back and forth from your laptops and your phones is actually unencrypted, even though your specific laptop is allowed to be on this network. And what's particularly neat is that if I double-click on this, we get even more information about each of these routers, and specifically we can find out who is actually connected and how much data is being sent.
So you can see that there's this one machine here, probably somebody in this room, that's actually sending quite a bit of information even as we speak. So maybe some file is being uploaded or something's going on; maybe they're live-streaming the awesomeness that is this lecture. And although with this particular software we can't see what the packets actually contain, this hints at the possibility that someone could see the data being sent over the network. All of this data is very, very real. We can see that there are Apple computers, though the vendor doesn't necessarily mean the brand of the computer; it just means who built that particular Wi-Fi card. And we have a variety of PCs and other machines that are actually connected, with their specific IP addresses, and a whole bunch of information that is actually pretty interesting and also kind of scary in a way. All of this information, which previously you thought was very private to your computer and your connection with the internet, is in fact public, because it's just being sent over the air in a way that's very easy for us to read using software like this, or some other software where we could actually do some packet sniffing. So, just a quick review: these things on the left-hand side are called what? These numbers on the left-hand side? MAC.
MAC addresses: media access control, not Macintosh. Those are the unique serial numbers, sort of, that identify network cards and computers, so think back a few weeks to our hardware and internet discussions. What's interesting to note is that a lot of home routers, and most people choose not to use this option, but you may see it in your configuration menu if you go there sometime, let you restrict access not just by using encryption, WEP or WPA, but also by MAC address. You actually type in the address of a particular computer and say, only let this guy on the internet. That's good, because it means your random friends can't come over and get on without you proactively registering their computers. But as Dan's demo suggests, all they need to do is pull up a little free piece of software like this, figure out the MAC address of some other computer that's already authorized on your home network, wait for that person to go to sleep or log off, and then just change their own computer's MAC address to be that one. This is not an immutable number: with the right software, you can change your MAC address on a lot of operating systems, and it's being broadcast to the world. Same thing on Harvard's campus: when you first registered your Harvard laptop or desktop computer and went through all those annoying steps, providing your Harvard ID number and PIN, well, you were just registering your MAC address with Harvard's network. If I figure out the MAC address of some Harvard affiliate, then even if I'm just some random person in Harvard Square, I can get on Harvard's network very easily too. These are not hard problems for the determined adversary. And as for that one, I think that's probably "unknown manufacturer." MAC addresses generally have fixed prefixes, patterns of numbers that they begin with, that belong to certain manufacturers, and the software probably does not know what the Ethernet card beginning with 014096 means. So this, for those of you who
might be in a company environment or a university environment: a lot of these entities prohibit you from using your own wireless access points in your office or your dorm room, because they don't want rogue devices on the network. They usually keep you off the network by knowing, oh, you know what, all Linksys routers start with 123456, so let's disallow all MAC addresses starting with that value. But the irony is that on most of these same devices, by Linksys and others, you can click a button, change the MAC address to be anything you want, and pretend to be a completely different technology altogether. So that too is circumventable. And so, per my comments earlier, there are a lot of security mechanisms in place. Many of them raise the bar but certainly don't keep out knowledgeable people, and some of them are just foolishly implemented. So, one last little aside: years ago we had analog cell phones, if some of you had those. By analog I mean that if you had poor reception, odds are you heard static. Very early on in the days of cell phones, most of the hardware in use would send a unique number to the local cell phone tower, saying I am Dan Armendariz, I am Dan Armendariz, just a unique number identifying that phone, so that the carrier knew whom to bill for that call. Well, it didn't take long for some smart bad guys out there to realize, oh, why don't I just take a phone, broadcast someone else's number, and make free calls? That's in fact what happened, in great quantity, early on. This sort of security through obscurity, as the cliche goes, is generally not good practice, because all it takes is for one person to figure it out and post an email on the internet, and now everyone knows how to compromise your system. For more such curious asides, and how flawed the Crimson Cash system has been for years on this campus, talk to me after class. So this same idea of having a MAC address and trying to rely on it to determine which
computer was connected was also relied on by a number of cable companies and ISPs early on. Some of you might recall that a few years ago you could really only have one computer connected to, say, a cable modem or a DSL modem at any one time, and if you tried to change that computer, it would suddenly disconnect you; it would suddenly not work. The workaround was basically, as soon as you got a new router, to give it the same MAC address as your old computer or your old router. Then it would appear as though this new device, which now lets you share the Internet connection among a variety of computers in your household, is still the one allowed to connect to the Internet and access whatever resources are available. So again, it's one of these things that raises the bar, but if you understand what's going on behind the scenes, most of the time it's relatively easy to circumvent and get around. Now, to go in a separate direction: when we talk about accessing websites and the privacy concerns that can arise out of that, what kind of information does a website know about me? For a long time, cookies were sort of a big concern, because there's this piece of information being stored on your computer and sent back to these servers, so those servers might be able to track you in some way. Frankly, cookies got a bad rap, especially at first, and while they could have malicious purposes, that's not always the case. What we might be more scared of is what can happen even without our knowledge. You might recall that when we connect to a website and try to get a web page from it, we have to make a connection to it and send some information about our computer. Let's see, using Live HTTP Headers, let's say I just refresh this page. All right, I'll just go to Google. Oh,
I have to connect to a network first; otherwise this is not going to work out. Okay. So you might recall from Live HTTP Headers that our computer actually sends some information about us. Scrolling back up here, you can see I'm trying to contact google.com, so my computer sends their servers a variety of information, which you might recall from our Internet lectures of a couple of weeks ago. For example, it tells the server what browser I'm using, what kind of computer I'm using (whether it's a Windows machine or a Macintosh), what version of Mac OS I'm using, what language I'm using, and so on. But this is really only a small percentage of the data that your computer could actually send. This part is always sent to servers no matter what, and it's really up to the servers, or more specifically the people administering those servers, to decide whether to save this information. Maybe they want to track what types of users are visiting their website; maybe they want to see the breakdown of Mac users versus PC users, or how many people are using the latest and greatest version of whatever operating system. That's relatively innocuous. It's generally okay for a website to want to know that, so that it can target its web pages to its user base. But behind the scenes, additional information can be sent too. When we look at the source of a web page, you might recall that the language is HTML, and there's some additional stuff going on here as well: another language. HTML isn't a programming language per se, but there is a programming language that can be sent along with an HTML page, called JavaScript. JavaScript is code that's run by your web browser to
accomplish some task, and usually it does some relatively neat things. Google, for example, uses JavaScript to do this autocomplete: when I start typing "computer science," it shows me a list of possible options, some of the most popular searches associated with computer science, and that's done not through HTML but through JavaScript. But JavaScript, being a programming language, can also do a lot of extra things. It can find out additional information about your computer and send it behind the scenes to the server. It might be able to find out what size screen you have and specifically what resolution you're running. It might be able to determine what plugins you have installed in the browser: maybe you have Flash installed, maybe RealPlayer because you're running an ancient computer, or a variety of other things. This is where a lot of data that we may not want somebody to see can easily be sent over the Internet using just some relatively simple JavaScript. Questions about these Internet-based attacks? No? Good, because we're about to steer in the direction of our hard drives. All right. There's no good segue here, but it's all related to security. One of the neatest jobs I ever had, as a grad student, was interning for the Middlesex County District Attorney's Office. I was working for their forensics team and their special investigations unit, and what this meant was that every week the Massachusetts State Police would bring by computer hardware (hard drives, tower computers, laptops, CDs, floppy disks) that had been seized during search warrants, and we then needed to provide a summary of what evidence was or was not on each computer for the district attorneys.
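Going back to MAC address filtering for a moment, here is a minimal Python sketch of the kind of check a home router's MAC filter, or a campus registration system, performs. Everything in it is hypothetical: the `MacFilter` class, the addresses, and the blocked prefix `12:34:56` (standing in for the made-up vendor prefix "123456" from the lecture) are illustrations, not any real router's firmware. What it shows is that the filter can only compare the address a device claims to have, so a spoofed copy of a registered address passes exactly like the original.

```python
# Hypothetical model of a router's MAC filter. The router can only compare
# the address a device *claims* to have; it has no way to verify it.
HEX_DIGITS = set("0123456789abcdef")


def hex_digits(text: str, expected_len: int) -> str:
    """Strip ':' and '-' separators, lowercase, and validate the length."""
    digits = text.replace(":", "").replace("-", "").lower()
    if len(digits) != expected_len or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a valid address or prefix: {text!r}")
    return digits


class MacFilter:
    def __init__(self, allowed, blocked_prefixes=()):
        # Full 48-bit addresses (12 hex digits) that the owner registered.
        self.allowed = {hex_digits(mac, 12) for mac in allowed}
        # Manufacturer prefixes (first 24 bits, 6 hex digits) rejected outright.
        self.blocked_prefixes = {hex_digits(p, 6) for p in blocked_prefixes}

    def admits(self, claimed_mac: str) -> bool:
        mac = hex_digits(claimed_mac, 12)
        if mac[:6] in self.blocked_prefixes:
            return False
        return mac in self.allowed


# Made-up addresses, for illustration only.
router = MacFilter(allowed=["02:00:00:aa:bb:cc"], blocked_prefixes=["12:34:56"])

print(router.admits("02-00-00-AA-BB-CC"))  # True: the registered laptop
print(router.admits("12:34:56:00:00:01"))  # False: blocked vendor prefix
print(router.admits("02:00:00:de:ad:01"))  # False: never registered
# An attacker who sniffed the registered address and configured it on their
# own network card presents the very same string, so the filter can't tell:
print(router.admits("02:00:00:aa:bb:cc"))  # True
```

The same weakness applies to the vendor-prefix block. A rogue router whose owner has changed its MAC address no longer starts with the blocked prefix.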
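Along the same lines, here is a small sketch of the server's side of the Live HTTP Headers demo. Given the raw text of a request, it pulls out the `User-Agent` and `Accept-Language` headers and guesses the visitor's operating system, browser, and language. The header names are standard HTTP. The request text, the User-Agent string, and the helper names are illustrative assumptions. The substring matching is deliberately crude, since real User-Agent parsing is messier.

```python
# What a web server can read from an ordinary request. The raw request below
# is an illustrative example, not captured traffic.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.google.com\r\n"
    "User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; "
    "rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n"
    "Accept-Language: en-us,en;q=0.5\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)


def parse_request_headers(raw: str):
    """Split a raw HTTP request into its request line and a header dict."""
    head = raw.split("\r\n\r\n", 1)[0]          # headers end at a blank line
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")    # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def describe_visitor(headers):
    """Crude guesses of the kind a site might log for its statistics."""
    ua = headers.get("user-agent", "")
    if "Windows" in ua:
        os_name = "Windows"
    elif "Mac OS X" in ua:
        os_name = "Mac OS X"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "unknown"
    if "Firefox/" in ua:
        browser = "Firefox"
    elif "Chrome/" in ua:            # checked before Safari: Chrome's UA mentions Safari
        browser = "Chrome"
    elif "Safari/" in ua:
        browser = "Safari"
    elif "MSIE" in ua:
        browser = "Internet Explorer"
    else:
        browser = "unknown"
    language = headers.get("accept-language", "").split(",")[0] or "unknown"
    return {"os": os_name, "browser": browser, "language": language}


request_line, headers = parse_request_headers(RAW_REQUEST)
print(describe_visitor(headers))
# {'os': 'Mac OS X', 'browser': 'Firefox', 'language': 'en-us'}
```

Details like screen resolution or installed plugins don't appear in these headers. A page's JavaScript would have to collect them and send them back separately.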
So it was one of the neatest things I've ever done. It's definitely boring sometimes, and it's not always like what you see on Law & Order and CSI on TV, but frankly a lot of the time it was, which was really quite cool. So what's the relevance to us normal people who just have and use computers? What data can people like me, in that other group, actually find out about you? And what can you do when you have sensitive information, whether personal or financial, that you actually want to eradicate, so that when you sell or scrap your computer it's not lingering out there? In fact, in a week or so we'll link to an article by a colleague of mine at MIT, based on work he did years ago as part of his doctoral research. Long story short, with some research funds he went out on eBay and the like and bought hundreds of hard drives that people were selling: a 10-gigabyte drive here, 50 gigabytes, 80, fairly large hard drives for the time. Very often, people claimed that these had been formatted, so to speak. To format a hard drive means, in layman's terms, to erase it, but in technological, Computer Science E-1 terms tonight, it actually doesn't mean much at all. You're typically not actually erasing the data. He found troves of information: credit card information, health records, pornography, Social Security numbers, all sorts of stuff. Maybe it was left there because people didn't really care, or didn't think someone like him was going to buy the drive for $5 off eBay, or because they didn't know what they were doing when they tried to erase this data before selling or donating the hardware. So what actually happens when you store a file on your computer? Well, you may remember this discussion from a few weeks ago.
If this is that thing called a platter inside your hard drive, you have little bits all around here representing zeros and ones, and bunches of those bits, say the ones up here, happen to represent a file. Now, the problem, you may recall, is that when you delete a file it's generally at least a two-step process. On Windows or Mac OS, the first thing you do is probably drag it to the Recycle Bin or the Trash. Quick sanity check: have you erased the file at that point? Most of us know these days that just dragging something to the Trash or Recycle Bin really doesn't erase anything. Eventually it's deleted, when you start to get low on space or, obviously, if you choose Empty Trash or Empty Recycle Bin. But, and this is where the story starts to get juicy and worrisome, does it really get erased then? Not really. Recall that most operating systems remember where your files are by essentially keeping a spreadsheet. The first column here is, let's call it, the name of the file, and the second column is its location on disk. So if the file is called "doc," just to keep it short and fit in my little table here, it might be at location 345, and that just happens to refer to this spot here. What do I mean by 345? Well, if we think of bunches of bits on the disk as bytes, chunks of 8 bits, that's byte number 345 on the hard drive. Wherever that is, that's where the file happens to live. So for those who may not recall, what happens when you erase a file, typically? Yeah? Exactly. The computer typically forgets about the file by erasing its record in this little table. Going back and erasing those bits is really too much effort, and you don't really erase bits anyway.
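The "spreadsheet" of file names and locations can be modeled in a few lines of Python. This is a toy. It assumes a disk is just a flat array of bytes and that the table maps each name to an (offset, length) pair. Real file systems are far more elaborate, and the `ToyDisk` class is hypothetical. It shows the key point: a typical delete removes the table entry and leaves the bytes alone.

```python
class ToyDisk:
    """A toy 'disk': raw bytes plus a table of name -> (offset, length)."""

    def __init__(self, size: int):
        self.data = bytearray(size)   # the "platter": starts as all zeros
        self.table = {}               # the OS's bookkeeping "spreadsheet"
        self.next_free = 0            # naive allocator: always append

    def save(self, name: str, content: bytes) -> None:
        offset = self.next_free
        if offset + len(content) > len(self.data):
            raise OSError("toy disk is full")
        self.data[offset:offset + len(content)] = content
        self.table[name] = (offset, len(content))
        self.next_free = offset + len(content)

    def read(self, name: str) -> bytes:
        offset, length = self.table[name]     # look up the location first
        return bytes(self.data[offset:offset + length])

    def delete(self, name: str) -> None:
        # What a typical delete does: forget the table entry only.
        # The bytes at data[offset:offset + length] are left untouched.
        del self.table[name]


disk = ToyDisk(size=1024)
disk.save("notes", b"grocery list")                    # occupies bytes 0-11
disk.save("doc", b"Dear diary: the real invoice totals")
disk.delete("doc")
print("doc" in disk.table)            # False: the OS has forgotten the file
print(disk.data.find(b"Dear diary"))  # 12: but its bytes are still on the "disk"
```

Until some later `save` happens to reuse those bytes, anyone who reads `disk.data` directly, ignoring the table, can still find the "deleted" file.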
If you started removing bits, you'd be losing hard disk space in perpetuity. But you can imagine turning them all to zeros or all to ones, or just scrambling them, making them random. Most operating systems don't do that by default, because, well, who cares, or it's a waste of time. And it's really the waste of time that motivated that design decision years ago, when disks were even slower than they are now. Having to wait for the computer to erase a file was probably annoying enough that it was a reasonable tradeoff to just do this trick here. But now, thankfully, some operating systems, Mac OS in particular, make it much easier to really erase files. Those of you with Macs, if you go to the Finder or your desktop, you may notice that under the Finder menu there's both Empty Trash, equivalent to Empty Recycle Bin on Windows, and Secure Empty Trash. Secure Empty Trash does what I hinted at earlier: it changes all these bits to zeros or all ones or scrambles them up, depending on your settings. It's secure in the sense that it erases not only the entry in this table but also the bits themselves. As for the question about Windows, that's a good question. I'd have to Google to confirm, to be honest, but I think that just does steps one and two together, removing the file from this table. I don't believe Vista has a secure erase, so to speak, built in natively. I could be wrong, but Microsoft has been guilty for years, maybe still in Windows 7, of not having native, easy-to-use support for this. Don't quote me on that just yet, though; I'll check and remind you by next week. Okay, so suppose your hard drive is among those seized by the Massachusetts State Police. What are the implications?
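For a single file, what Secure Empty Trash does can be approximated by overwriting the file's bytes in place before deleting it. Below is a minimal sketch with a hypothetical `overwrite_file` helper; its pattern names are assumptions. Treat it as an illustration, not a guarantee. Journaling or copy-on-write file systems, SSD wear leveling, and older copies of the file left behind when it grew or was resaved can all keep old bytes in places this code never touches.

```python
import os
import secrets

CHUNK = 64 * 1024  # write in 64 KiB pieces so large files don't need much memory


def overwrite_file(path, passes=1, pattern="zeros", delete_after=True):
    """Overwrite a file's bytes in place, then (optionally) delete it.

    pattern: "zeros" (all 0 bits), "ones" (0xFF bytes), or "random".
    """
    if pattern not in ("zeros", "ones", "random"):
        raise ValueError(f"unknown pattern: {pattern!r}")
    if passes < 1:
        raise ValueError("passes must be at least 1")
    size = os.path.getsize(path)
    with open(path, "r+b") as f:                  # read/write, no truncation
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(remaining, CHUNK)
                if pattern == "zeros":
                    buf = bytes(n)
                elif pattern == "ones":
                    buf = b"\xff" * n
                else:
                    buf = secrets.token_bytes(n)
                f.write(buf)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass out to the device
    if delete_after:
        os.remove(path)           # only now forget the table entry
```

One pass of zeros matches the lecture's point that extra passes haven't been shown to help on modern drives. The `passes` parameter is there only to mirror the options wiping tools offer.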
Well, say you have some financial documents that you've been forging, you've been doing a little white-collar crime, and that's why your computer got hauled in. Even if you've been erasing those documents in this typical manner, by emptying the Recycle Bin or Trash, people like us with the right software or know-how can scour the hard drive, ignore this table, and look for all Excel files, for instance, or all Word documents. There are a number of ways of doing this. Sometimes computers happen to maintain backups of this table, and products like Norton Utilities and the like advertise that they can undelete files. How do they do that? They generally keep backups of this table and then just hope that the actual bits don't get reused for other files. Or, it turns out that a lot of file formats out there, such as Word documents, Excel files, and JPEGs, have what are called signatures. Even though we keep drawing files as random patterns of bits, in a lot of file formats the first few zeros and ones are actually a fixed, known, public pattern. The last few bits, the footer, so to speak, are also a fixed, known pattern. That helps a program like Microsoft Word or Excel confirm or deny that, you know what, this is in fact a legit Word document, because look, it starts and ends with this known pattern of bits. Forensic investigators can use those same tricks to look on the hard drive for data that with high probability was in fact part of an Excel file. You can certainly do even lower-level things, like just searching the hard drive for all numbers that look like four groups of four digits separated by hyphens, in other words known credit card patterns or the like. Same deal with Social Security numbers, which are three digits, two digits, and four digits. This is exactly the kind of stuff that my friend worked on, and we'll post a link, probably in an upcoming assignment. It's really a very accessible, fun article, but it's also quite scary, because of what
this fellow found. And this is another one of these caveat emptor situations. There's a lot of commercial software out there; Window Washer is one product name that comes to mind (don't write it down, it's not very good), and that was part of the gist of this article. It was amazing: of all these commercial products, costing $10, $50, $100, my buddy vetted half a dozen or more, and all of them were flawed in some way and did not in fact work as advertised. They weren't malicious; they didn't do things they weren't advertised to do. The problem was that they didn't do what they were advertised to do. In a nutshell, they might have claimed, we will scrub all of these bits, but in every one of my buddy's tests, not all of those bits were in fact erased properly. So what, then, is your best defense? Maybe you're just a normal person, you're not doing anything bad with your computer, but you don't like the idea that when you throw out or donate your computer, someone with too much free time, like my buddy, is going to go scouring around your hard drive. There's just no reason to disclose this data. Well, you can certainly buy software that erases your documents securely, and you can use Mac OS's built-in feature. But the reality is that when you get rid of a computer, especially a desktop that you can open up and remove parts from pretty easily, the most surefire mechanism is to destroy the device somehow. There exist companies that will destroy your hard drive, usually by drilling a big drill bit through it, which renders it way too damaged for anyone but the NSA, and probably not even them, to care about. I've seen pictures of people melting down their hard drives; that's probably a little too crazy, so I wouldn't necessarily go that route. The most normal-person route is probably to download software, and it can be free, that doesn't try to selectively delete files and bits, because that's where things get potentially buggy. It
does the entire hard drive. It might take a long time, and it does require some comfort or savvy, but I'll at least bring up a popular program for this: Darik's Boot and Nuke. Don't be misled by the cheesy title. This is a website you can go to, DBAN for short, where you can download what's called an ISO image, an image that you can burn to a CD. You can then boot your computer from that CD and, if it understands your computer, tell it to erase this hard drive entirely. You can say make everything zeros, which might take a few hours. You can say do it seven times, pseudorandomly, using a Department of Defense standard, which will take even longer. And you can do even crazier things, like 35 passes. But to my knowledge, the literature, at least in computer science, has never once demonstrated that seven passes, or God forbid 35 passes, is any better than one pass over modern hard drives. The idea, just for the technically curious, is this. Some people have long hypothesized that overwriting bits with all zeros still leaves those bits, which we talked about as having a north-south orientation, not quite upright. For instance, if a bit was a one and you make it a zero, you'd like to think it goes like this, but you know, it kind of goes like this; the magnetic particles don't quite realign perfectly. So the hypothesis has long been, academically, that crazy people like the NSA can detect this bit versus this bit, realize that it used to be a one, and un-erase your erased data. That has publicly never been shown to be the case, but if you've got a few hours to kill, do at least seven passes. Yeah? So that's a really good question, and the short answer is it probably won't work. The proposal was: say you've got a Word document, a love letter you wrote (this was not your question), a love letter you really regret having saved. You want to erase all traces, so you select all, hit Delete, and resave. The problem is that a lot of programs and operating systems, they may,
well, they will delete those contents for you, but the file itself might be stored in a slightly different place the next time around, because files tend to grow, and so the OS might actually move the file around on the hard drive. So you might be leaving the equivalent of a paper trail of bits on the hard drive that the OS doesn't remember are there. Those bits will get overwritten eventually, but that would not be a surefire approach. The only way to do it would really be to use special software, usually commercial software, but even then it's flawed. So if you're really worried about your data, you should ultimately either wipe the whole hard drive or, better yet, use features that a lot of newer computers, Macs especially, have built in. Macs have something called FileVault, which you can turn on from System Preferences. It costs you a few milliseconds of computation time, so your computer slows down marginally, but it encrypts your entire hard drive. When you log in with your username and password, you're not only authenticating; you're also telling the computer what secret key, your password, to use to decrypt the hard drive. Frankly, the upside holds even if you don't have sensitive information: you just don't want it to be easy for someone at the airport or a hotel to steal your computer and have your data. This way, they can steal your hard drive and your physical computer, but they also need to know your password if they want to get at your data. Unfortunately, and this is not meant to suggest bias, this is a lot easier on Macs, because it just comes built in. I'm sure options exist for PCs, but native support is good for these things. Other questions? So the question is: couldn't you save time, half as much time, by just overwriting every other bit with zeros? I would push back and say someone like me, an adversary, could then statistically flip half of those overwritten bits, and we would then have 75% of your data
back. That would probably be sufficient, because just probabilistically we would guess half of those overwritten bits correctly, so we would then have 75% of your data. And if all we need is some forged Excel files, or a paper trail where you confess to something in an email, odds are we have enough data at that point. So the files might not... Ah, that's true: you mean randomly flip half, not just every other bit. Random is different, but I would argue that if you're going to these lengths to scramble your hard drive's contents and wipe it, then yes, it takes half as long, but fine, spend twice as long and be 100% sure. It's not a corner I would suggest cutting if the goal is to wipe the hard drive and not just make it difficult for someone to get at the data. So yes, it's reasonable, but wiping is usually equated with 100% erasure. What you're proposing is: we're going to save a few hours, but there's a chance some of my data might leak. If that's acceptable, then what you said is accurate; it's a policy decision, not a technology one. Yeah? So, as you said, in principle, if Apple is to be trusted to have implemented that feature correctly, then yes, Secure Empty Trash should completely erase the bits that once composed that file. The problem with something like a proprietary operating system, though, is that we humans can't easily audit it by looking at its code, so we can only run tests. Before putting a stamp of approval on that, I would have to check the literature or run tests myself to see if that is in fact the case. I'd write files, rename them, do what you suggested and delete part of a file, and then check the hard drive after a secure erase. It probably works. And frankly, there are enough crazy, paranoid people out there on the Internet that there probably are people who do these tests, so there are probably some reports out there you could at least put some faith in. So it's probably fine, but it depends; again, this is a policy decision for your own personal life. If you are
comfortable with the low probability that someone like my MIT buddy will at some point buy your computer for $50 and look at your data, then that's acceptable, but it really depends on your comfort. Here's a particularly fun way of destroying the data on your computer if you don't want to download software. Let's say you have a really old computer with an ancient hard drive, and you might want to sell the computer. But you're now cognizant of all of these issues that can arise with the data just sitting on your hard drive, so you want to get rid of it. You could take the software route, but I would argue that a much more satisfying, much more fun way of doing this is to pull the hard drive out of your computer and open it up (you might need some special tools, but you can usually get in) and take a hammer to the disks. You can literally smash them to bits. I did that once and it's wonderful. It's one of the best things you can ever do, because you get to take out all the frustrations you ever had with that computer and destroy it into satisfying little pieces. I did feel kind of bad, because, if you recall, the platters are almost mirror-like, so if you are at all superstitious you may not want to do that. You might say, oh God, it had three platters, so now I have 21 years of bad luck. But I think I'm doing okay. So it is a fun way of destroying a hard drive, akin to throwing it in a fire and melting it, even though that may not be the best way of doing it. Is there a question?
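Going back to the forensic techniques described earlier, here is a minimal sketch of two of them, run over a raw disk image held in memory. The first is carving out data that starts and ends with a known file signature. The second is searching for text shaped like Social Security or credit card numbers. Some assumptions: JPEG is the example format because its start (`FF D8 FF`) and end (`FF D9`) markers are simple. The regular expressions are rough. The Luhn checksum, which valid card numbers satisfy, is added to weed out random runs of digits. The function names are hypothetical. Real forensic tools handle fragmented files, many more formats, and many more number layouts.

```python
import re

# File signatures: JPEG data begins with FF D8 FF and ends with the FF D9 marker.
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"

# Rough patterns (assumptions): SSN as ddd-dd-dddd, cards as four groups of
# four digits separated by a hyphen or a space.
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(rb"\b(?:\d{4}[- ]){3}\d{4}\b")


def carve_jpegs(image: bytes) -> list:
    """Return every byte run that starts with a JPEG header and ends at the next footer."""
    found = []
    start = image.find(JPEG_HEADER)
    while start != -1:
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break                                   # header with no footer: give up
        found.append(image[start:end + len(JPEG_FOOTER)])
        start = image.find(JPEG_HEADER, end + len(JPEG_FOOTER))
    return found


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(image: bytes) -> dict:
    """Scan raw bytes, ignoring any file table, for SSN- and card-shaped text."""
    ssns = [m.decode("ascii") for m in SSN_RE.findall(image)]
    cards = [m.decode("ascii") for m in CARD_RE.findall(image)
             if luhn_ok(m.decode("ascii"))]
    return {"ssns": ssns, "cards": cards}


# A fake "disk image": zeros, a tiny fake JPEG, and some leftover text.
jpeg = b"\xff\xd8\xff\xe0" + b"fake pixels" + b"\xff\xd9"
image = (b"\x00" * 16 + jpeg + b"\x00" * 8
         + b"SSN 123-45-6789 card 4111-1111-1111-1111 typo 4111-1111-1111-1112"
         + b"\x00" * 16)
print(len(carve_jpegs(image)))   # 1
print(find_pii(image))
```

The second card-shaped number fails the Luhn check, so only the first is reported. 4111-1111-1111-1111 is a widely used test card number, not a real account.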
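The "75%" answer to the question about overwriting only every other bit can be checked with a quick simulation. This sketch assumes the original bits are uniformly random and that the adversary knows which positions were wiped (every even position). Reading the wiped bits as-is is right about 75% of the time: half the bits are untouched, and half of the zeroed bits were zeros anyway. Guessing at random in the wiped positions gives the same expected 50% + 25% = 75%.

```python
import random


def recovery_rates(n_bits=100_000, seed=0):
    """Fraction of original bits an adversary gets right after a half-wipe."""
    rng = random.Random(seed)                      # fixed seed: repeatable run
    original = [rng.getrandbits(1) for _ in range(n_bits)]
    # The proposed shortcut: zero out every other bit (the even positions).
    wiped = [0 if i % 2 == 0 else bit for i, bit in enumerate(original)]

    # Strategy 1: just read the wiped disk as-is.
    as_is = sum(w == o for w, o in zip(wiped, original)) / n_bits

    # Strategy 2: keep untouched bits, guess randomly where bits were wiped.
    guessed = [rng.getrandbits(1) if i % 2 == 0 else bit
               for i, bit in enumerate(wiped)]
    guess = sum(g == o for g, o in zip(guessed, original)) / n_bits
    return as_is, guess


print(recovery_rates())  # both values come out close to 0.75
```

Real files aren't random bits. Text and many file formats have long runs of predictable structure, so in practice an adversary would likely do better than 75%.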
Yes. I don't think it's hazardous waste. There are always better ways to dispose of electronics, because of the chemicals in them, and usually there are recycling programs for electronics. But if you want to make sure this isn't a problem, you can take the little broken pieces, or at least some of them, and throw those away, and then recycle the remaining shell of the hard drive. The little pieces are really just bits of metal with some magnetic particles on them. In terms of electronics, I think it's really the boards, the green and blue logic boards, the PCBs, that contain most of the bad chemicals in a computer. I used the buzzword "format" earlier, this idea of erasing a hard drive. Microsoft was somewhat guilty of this years ago, back when most of us probably wouldn't have done it ourselves. You could run a command to erase, or format, a floppy disk or hard drive, and unfortunately the prompt users would see for years said something like: warning, you are about to erase the contents of this disk; all contents will be erased. They literally said part of this in all capital letters. That's a pretty clear message from Microsoft Corporation that this is going to erase the files on your disk. It's not true. Usually, when you formatted a hard drive or floppy disk back in the day, you weren't actually overwriting the whole thing with zeros or ones or any pattern thereof; frankly, that would just take ridiculously long. Rather, you were doing one or two things. One, you were writing a few zeros and ones, generally near the very start of the disk, that essentially instruct the computer how to access the contents elsewhere on that disk, basically a fresh, empty version of that file table we drew earlier. Two, these programs would often not write data to the disk at all but instead check the integrity of the disk, often by reading from every location on it, or sometimes writing to a location and reading it back, but not with the
intent of erasing data. Even that step wouldn't necessarily erase your data, but it's why formatting could take a long time. And there too, very reasonably, humans would infer, oh, this is taking forever, it must really be erasing my data. That really was not the case; it was checking the integrity of the disk and looking for, quote unquote, bad blocks. So in general, when you erase a disk and it happens particularly fast, odds are it's not actually doing anything all that secure, though that's not always the case. And this is where I think you should take even vendor claims with a grain of salt. Apple has never been very good at security, particularly when it comes to cell phones. When the iPhone first came out, it did have various cryptographic (that is, high-tech encryption) mechanisms built in, and those were cracked relatively quickly by the community. By cracked I mean compromised: people figured out how to circumvent them or the like. There's a neat feature, though, that's been built into the iPhone for a while now and has gotten better over time, whereby you can securely erase it. This was particularly important to me, because I have my email on here and all my friends' contacts. There's nothing sketchy, but there's just personal stuff that I don't want to hand to some Apple employee, saying, here are my 500 contacts, I don't mind if you read through all of my mail. There's no technical reason for them to have that access; I'm just not comfortable with it. At one point recently I unintentionally smashed my iPhone by dropping it on the ground. Because of the warranty terms, I could exchange it: they would give me a new phone for a fee, though not the original price of the phone. But I wasn't really comfortable with this transaction. Thankfully, the screen was completely smashed, and they couldn't just replace the screen because the button was also broken, but I was nonetheless able to get into the menus and do a secure erase. So
there's a feature built into this phone (the BlackBerry has had this for even longer) whereby you can securely erase the phone. Much like this idea here, even though this is flash memory, it will write all zeros or all ones or some random pattern to the entirety of the iPhone. A couple of years ago this would take a while, and frankly that was a good thing: there's some comfort to be derived from knowing this phone is really working hard to erase all your data. But it's also a little annoying. Now what Apple does, much like FileVault on Macs, is encrypt the entire contents of your flash memory. This does mean you pay a slight performance penalty, but they do it for all of us, so none of us can really compare apples to apples anymore. So when you do a secure erase of the iPhone now, it's almost instantaneous. Can you hypothesize how they can securely erase your solid-state drive nearly instantly if the drive is encrypted? Yeah, exactly: they just delete the key that allows you to decrypt it in the first place, a key you didn't even know you had. Now, what does that mean? One of the properties of cryptography generally, and of a good encryption scheme, is that if you take some text you care about and encrypt it, the output should look random, or nearly random, to the human eye and even to statistical tests. So when your phone is powered off, the data there is encrypted, and it also looks random. And because the key is stored separately, and it's the key that's used to decrypt the phone, once that key is gone your data is in perpetuity not only encrypted but also just scrambled and random-looking. So that too is a neat trick. There is a catch, though. Encryption doesn't help if you haven't set a passcode of your own. If someone just gets hold of your phone and there's no passcode on it, they can still unlock it and look at
all of your data there. But that's a different attack than smashing my phone, turning it over to Apple, and just not wanting them to have access to all my personal information. Is there a question over there? I was just going to... oh, using a magnet to destroy the hard drive? Maybe, but only if the magnet is very, very powerful. You're not going to be able to erase the contents of your disk with a kitchen or refrigerator magnet, though it's also not something where you'd want to grab your kitchen magnets and apply them to the side of your computer. Either way, it's not a reliable way of deleting the data; you might get one or two bits, but most likely not. Really, the only way to make sure you're getting rid of the data in totality is to destroy the disk in its entirety. Or, maybe one notch down in terms of security, you could use one of the pieces of software David was mentioning before, which actually overwrites everything with a bunch of zeros. But there is one more consumer-oriented implication of this. Those of you who buy desktop PCs especially, or even laptops, often get a one-year warranty, or two or three years. If something goes wrong with your hard drive or SSD, with some probability the manufacturer will replace it for you. Now, you may lose your data, but that might be acceptable to you if you didn't have much of importance on there; you just care about getting the machine back up and running. But the catch is, what do most of those vendors expect from you to do a warranty replacement? Your original drive. And here's where it gets uncomfortable enough that if you're like me, you might rather eat the cost of the hard drive, keep it, and destroy it, just because there's no reason for anyone else to have this data. Even though the drive is damaged, there exist data recovery places. In fact, I used to do this sort of thing on a consulting basis, and we certainly did this at
the DA's office so there exists people and software and tools physically with which you can repair often hard drives and this is precisely how the DA's office or one of the biggest companies and most popular out there is called drivesavers.com can actually recover your data so even though your hard drive might be broken all of the data might actually be intact so taking more proactive measures like using file vault or the equivalent to perpetually keep your data encrypted can also hedge against hardware failures where it's gonna be too late to go back and erase data or encrypt it because you've already lost control of the physical hardware and as an aside the gotcha with these data recovery places lest you think oh if something goes wrong I'll just have someone recover it especially the big reputable fish like drivesavers I mean they will charge you $1,000, $1,500 sometimes for the most challenging of cases this is not a cost effective way you go this route when you've much like what was that movie with the Harvard movie with Brendan Frazier he loses his thesis granted it was paper in this particular movie whose name I forget but when you lose your thesis it might be worth $1,000 odds are most of us do not have a word document or spreadsheet that's as important as $1,000 so keep that in mind too. 
So all of this information we've been telling you about applies not only to hard drives but also to SSDs. Even though they don't have a physical spinning platter inside, they still operate with the same table idea when it comes to deleting files. If you do enough reading on SSDs, you might hear some contradictory information, whereby sometimes when a file is deleted it's actually erased off of the SSD, but I wouldn't take that as a guarantee that the data is now actually secure. You should assume that what we've told you is the case: if you care about your data and its security, you should be sure to actually do a secure erase, so that the data is overwritten with all zeros. In fact, all of these concerns are mirrored whenever anybody has physical access to your computer. Even if you have a really secure password for your account, for example, anyone who can get at your computer might be able to take it apart, pull out the hard drive, and connect that hard drive to another computer. Without even needing to type in your account's password, would they be able to see all of the files in your My Documents folder, all of the files in whatever folder you don't want them to see? Yes; they will be able to access them without ever turning on your computer. The same idea applies here: especially a few years ago, it was very popular for people to put something called a BIOS password on their computer. It was not related to the operating system at all, but when you first turned on your computer, the very first thing it did after the power-on self-test, the POST, was give you a little password prompt, and you had to type in a password. And people said, "OK, because the operating system doesn't even boot up, this is a more secure way of doing it." Well, it still is not all that secure if someone
has physical access to the machine and can actually take it apart. You should assume that the data can be compromised, that it's not actually secured by any number of passwords you apply to your machine. Now, there is an exception to this, and it's the technology David mentioned before, like FileVault on the Mac. For both Windows and Mac there's also software called TrueCrypt, which, as far as I know, is the big-name standard for this idea. What it allows you to do is basically cordon off a section of your hard drive and encrypt all of the files within it. More specifically, it looks like a new hard drive in your computer, but what it really is is a virtual hard drive: you're writing files to this section of your hard drive, and they're being encrypted and saved on your actual physical drive. This is really the only way to ensure that a machine is still protected from prying eyes when someone has physical access to it. Even if they take out the hard drive, if its contents are encrypted, then, unless you have a stupidly silly password like 12345, you can be reasonably sure that the data will be relatively safe. And like David mentioned before, good encryption appears to be random, so if someone scans the entire hard drive, they may not be able to tell, "Oh, there's an encrypted section from this sector to this sector," or from this block to this block, or what have you; it will just look like random data. So this is an additional layer of protection if you have really sensitive data, and I'm talking really sensitive, data that you need to make sure nobody ever sees: you can be reasonably sure it will not be visible, or will be very difficult for someone to actually crack. So in short, there's a whole lot of bad stuff that can happen, and some of the concerns you can't really mitigate other than by
not using the technology. Case in point: Starbucks, hotels, and airports. If your connection to the local access point isn't encrypted, in theory almost anyone around you can see what you're doing. The upside, especially if you're Harvard staff, and certainly for as long as you're a student, is that you can download VPN software, as Dan's discussed in lectures past, so you can actually establish a secure connection across an insecure medium, thanks to cryptography. And that's actually one of the prevailing themes, especially with tonight's topics. We're kind of in a bad place right now. It's 2010, and the world really hasn't caught up to all the various threats, but cryptography, encryption, will really be the solution to a lot of these problems once it becomes natively embedded in a lot of hardware, once our CPU speeds are fast enough that it's absolutely a no-brainer to encrypt everything, even at a cost of performance. But to be honest, one of the goals of tonight's lecture, and also next week's, is really just to empower everyone here with some savvy and know-how, so that even though it's going to be a scary place when we walk out this door tonight, at least you hopefully understand more of the threats and how you can mitigate them, so you can make your own intelligent decision as to whether this computer is safe for online banking or this one is not. So hopefully we've scared you enough tonight, and there'll be more of this next week. Thank you all for coming; we'll see you next week.
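The encrypted-volume idea from this lecture (FileVault, TrueCrypt), the observation that good encryption looks random, and the warning about passwords like 12345 can all be sketched in a few lines of Python. This is a toy, assuming only the standard library: `hashlib.pbkdf2_hmac` turns a password into a key, and a SHA-256 counter-mode keystream stands in for the vetted ciphers, such as AES, that real volume-encryption tools use, so do not use it to protect real data. `bits_per_byte` estimates Shannon entropy to show that ciphertext looks like random bytes. `guess_password` models an attacker who has the salt (salts are generally not secret) and recognizes a correct guess by a known plaintext prefix. All function names, the prefix check, and the 100,000-iteration count are assumptions for illustration.

```python
import hashlib
import math
import os
from collections import Counter
from typing import Iterable, Optional


def derive_key(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 stretches a password into a 32-byte key.
    # The iteration count is an assumption chosen for a quick demo.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """TOY stream cipher: XOR data with SHA-256(key || counter) blocks.
    XOR is its own inverse, so calling this again with the same key decrypts.
    Illustration only; real tools use vetted ciphers such as AES."""
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def bits_per_byte(data: bytes) -> float:
    """Shannon entropy: near 8.0 for random-looking bytes, far lower for text."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def guess_password(salt: bytes, ciphertext: bytes, known_prefix: bytes,
                   wordlist: Iterable[str]) -> Optional[str]:
    """Dictionary attack: try each common password and check whether the
    decryption starts with plaintext the attacker expects to see."""
    for guess in wordlist:
        if toy_encrypt(derive_key(guess, salt), ciphertext).startswith(known_prefix):
            return guess
    return None


salt = os.urandom(16)
plaintext = b"Dear diary, my bank PIN is 4321. " * 200
ciphertext = toy_encrypt(derive_key("12345", salt), plaintext)
print(bits_per_byte(plaintext), bits_per_byte(ciphertext))  # low for text, close to 8 for ciphertext
print(guess_password(salt, ciphertext, b"Dear diary", ["password", "qwerty", "12345"]))  # 12345
```

The cipher is not what fails in the last line; the attacker simply tries 12345 early because it appears on every list of common passwords. A long, unusual password keeps the right guess out of any realistic wordlist, and the deliberately slow key derivation makes each wrong guess cost more.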