Good afternoon, everyone. My name is Ming Chow, and I'm a lecturer at Tufts University in Medford, Massachusetts, and I am here to talk about HTML5: breaking, abusing, and hacking HTML5. Just for my curiosity, raise your hand if you have worked with HTML5 in some capacity already. Okay, so it's a good number of hands, and it shouldn't be a surprise why. Right now there's such a big push for websites and web applications to go down the road of HTML5, and that's because a lot of those new devices, tablets, iPhones, iPads, don't have Flash. And the other thing is, of course, the browser has gotten really, really powerful. The other thing is, how many people here know JavaScript? Because I'm going to do a lot of demonstrations, a lot of code examples. Great, excellent.
Okay, so just a little intro background material: what is HTML5? Well, it's very much like HTML4. It's going to be the next standard, the next revision of HTML, the markup language. But one of the things that you've got to understand about HTML5 is that it's not just the markup language. It's really a complete stack of technology. HTML5 depends not only on content, the HTML markup, but also on presentation by way of CSS, and also on a lot of JavaScript for interaction with content. I'm not going to talk particularly much about server-side stuff; it makes no sense here anyway. But you're close to using HTML5 as a full-fledged development environment.
There is one caveat that you've got to know about HTML5. It's still a moving target. It's still a work in progress. Some of the stuff that I'm going to talk about today may change tomorrow, so just watch out. Who knows when the standard will be finalized.
And of course, there's backing from Google and Microsoft; it's getting pushed very heavily with the new Office and with the next version of IE. Now, not only is HTML5 a work in progress, different browsers have different levels of support, and right now the companies, like Google and Microsoft, are all doing their own thing. So one of the drawbacks of using HTML5 right now is that there are so many incompatibilities. Case in point is video, which I'll show you in a few moments. To check what your browser can support, go to html5test.com and it will render a score for your browser and show you which HTML5 features your browser can use. The other thing to keep in mind is that there are still going to be people using IE6, IE7, older versions of Firefox. That's not going to change. A lot of the HTML5 syntax is still going to render on those browsers, but all the really cool stuff is just not going to be rendered.
So here's just a quick summary of what's in and what's out in HTML5. What's in: a bunch of new tags, video, audio. There are also semantic tags such as article, footer, and nav. Back in the day, and we still do this a lot, people used div id equals nav, or div id equals footer, or div id equals header. So the nice thing in the standard now is, instead of a div tag, we just have an article tag. I know that in Chrome I can't get too much working with the article and the footer tags. There are a lot of new attributes, for example, for pattern matching in form fields. There are also new media events, for example, timeupdate, for when you want to play a video from one second to two seconds.
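As a rough illustration of the timeupdate event just mentioned, here is a minimal sketch that plays a video from one second to two seconds. The helper name playSegment is made up for this example and is not from the talk; currentTime, play(), pause(), and the timeupdate event are the standard media element interface. The video argument is assumed to be a video element, or anything that behaves like one.

```javascript
// Hypothetical helper (not from the talk): play `video` from `start` to `end`
// seconds, then pause. `video` is assumed to behave like an HTMLMediaElement.
function playSegment(video, start, end) {
  function onTimeUpdate() {
    // timeupdate fires repeatedly while the playback position changes.
    if (video.currentTime >= end) {
      video.pause();
      // Stop listening so later manual playback is not interrupted.
      video.removeEventListener('timeupdate', onTimeUpdate);
    }
  }
  video.addEventListener('timeupdate', onTimeUpdate);
  video.currentTime = start; // seek to the start of the segment
  video.play();              // browsers may block this without a user gesture
}
```

In a page this could be called as playSegment(document.querySelector('video'), 1, 2). The event does not fire on every frame, so playback may overshoot the end time slightly.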
Geolocation: that feature where your browser prompts you, does this page want to use your location, do you want to give permission? I'm sure a lot of people have seen that. That's part of HTML5. And of course, client-side storage, which is becoming the next vector for abuse.
What's out? The font and center tags. Anything that's presentation based: font, center, align, border. Those are, quote unquote, no longer supported in the HTML5 standard. Why? HTML is for content; it was never, ever meant for styling purposes. So they're forcing you now to put all that stuff in CSS. Even applet is, quote unquote, not supported. By not supported I mean that if you try to validate your page in the official W3C validator, you will get not only warnings but error messages, and it will yell at you. And old special effects such as, remember the marquee? Good times. Background sound. Those are no longer supported. And this one last guy right here, the noscript tag. I have a Nerf ball here. Can someone tell me what the purpose of the noscript tag is? And why do you think it's no longer supported in the HTML5 standard? Okay. I'm going to throw it over there. Someone catch it.
Now I want to do a few quick demonstrations that illustrate the power of HTML5. First of all, a couple of video captioning examples. And if I have time, I'll also do geolocation as well. Now, this was a video that was built by a high school student, Rachel Solmonte. She's taking my summer class right now at Tufts. And she did a video called Double Rainbow. And unfortunately, I think the screen resolution is a little bit off, and there's no sound in this.
I don't know what happened to the sound, but there's a song that plays in the background. If you can just imagine there's sound playing, it's very much like karaoke. No plug-in necessary to generate this stuff. And you got to find Bob. Let's go back. And this is actually one of my favorites. In my web programming class, I had the students build a video caption, and a very good student of mine built a video that took advantage of FAIL Blog. Again, no plug-in necessary. So you can do some really, really cool things. For example, just picture what you can do with videos and interactive media for accessibility purposes.
Now here's the catch, just to illustrate the problem with HTML5. This demonstration will only work in Chrome and Firefox. It will not work in Safari. We're using the Ogg format for the video. So that's video.
And I want to see if I can go to geolocation.wtf.html. This demonstration is an exercise I made my students do: find where in the world is our good friend Carmen Sandiego, based on where we are. So I built a JSON feed which renders 10 random locations around the world. I am actually going to allow. It's nice that they actually prompt you for permission. So here we are. We are here. And Carmen, of course, she is in the Sistine Chapel. If I hit the refresh button again, she is now at Big Ben. So this is taking advantage of geolocation. This was a lab that I had the students do. Just a quick demo.
So what does an HTML5 document look like? There's not much to it. If you know a little bit about HTML, then transitioning over to HTML5 is not difficult at all.
The only thing that you really need is the doctype. And look at this: less than, exclamation point, DOCTYPE, html. That's all you need. A lot better than the doctype for HTML4, isn't it? Pretty nice. That's it. And everything else really looks the same.
So where are we with areas of concern? Where's the danger? The danger, of course, is with the web browser, the client side. Because HTML5 can do so much, you can now see the complexity of a web browser. You can do so much stuff, including client-side, offline storage. Not only can you store a lot more information and content, but just imagine having almost a full-fledged SQL database on the client side. How about Ajax and the XMLHttpRequest object? Remember, in JavaScript, there was the same-domain policy. But now you can also do cross-origin requests and even cross-document messaging as well. And I'm going to show you a little demo: yes, you can actually do, for example, calculating prime numbers, or whatever comes to your mind, by way of web workers. Think of background threads. And again, HTML5 is certainly going to make things worse for the browser. Last time I checked, in the Chrome bounty that just recently happened, I think two of the bugs were specific to HTML5 features in Chrome.
And the first thing I want to talk about is local storage and session storage. We all know the good thing known as cookies, right? Now, here's the deal about cookies, just as a reminder. A cookie can only store four kilobytes of information. Local storage really depends on the browser. The last time I checked, you can store around five megabytes of data in local storage. If you understand the whole concept of cookies and sessions, local storage and session storage are the same idea. Key-value storage.
Very, very simple to use. It's ridiculously simple to use. The only difference between local storage and session storage is persistence. Session storage, if you can infer from the name, is going to last until the browser is closed. Local storage is something else: it stays until it is cleared. How do you use local storage or session storage by way of JavaScript? Really simple. It's either localStorage or sessionStorage, then dot getItem, dot setItem, or dot removeItem to delete an item. Or, if you prefer, you can use the associative array, hash syntax, using square brackets.
Here is just an example. Let's go back. So, here we are: local storage example dot HTML. The phrase that pays is bonkers. Now, if I type in "do", and "do do". Okay. So, I'm in Chrome. Using Chrome developer tools: go to View, go to Developer Tools, and then Resources. You see databases, local storage, session storage, cookies, application cache. Let's go to local storage and see what happens. Here's my domain, 192.168.156.130. And, of course, key-value pairs. Take a look. I'm just generating timestamps as my key. And the value, of course, is everything that I have entered. Not too hard.
Let's go in and just view the source. View, view source. So, my body is really, really simple. Not much to it. Just one div, results, and one form tag. Look at my init function. The first time the page loads, it does localStorage dot setItem. The key is phrase and the value is bonkers. When I type something in and hit return, store first gets the thing that was in the text box, and then localStorage, where the key is going to be a timestamp, equals the text.
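A minimal sketch of what the init and store steps described above might look like. This is not the source shown on screen; the function names initPhrase and storeEntry are made up here, and the storage argument is assumed to be window.localStorage or window.sessionStorage (passing it in keeps the sketch usable with either one).

```javascript
// Hypothetical reconstruction, not the talk's actual source.
// `storage` is assumed to be window.localStorage or window.sessionStorage.

// Runs once when the page loads: seed one fixed key/value pair.
function initPhrase(storage) {
  storage.setItem('phrase', 'bonkers');
}

// Save whatever was typed, using a timestamp as the key.
// On a real Storage object, storage[key] = text does the same thing.
function storeEntry(storage, text, now = Date.now()) {
  const key = String(now);
  storage.setItem(key, text);
  return key;
}
```

In a page this might be wired up as initPhrase(localStorage) on load and storeEntry(localStorage, textBox.value) on submit, where textBox stands for whatever input element the form uses.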
And how I render each and every item onto the page is that I just do a very simple for loop, go through my local storage, and render everything out to the page. Very, very similar to the whole concept of cookies. You've got the same-domain policy.
But of course, well, let's go back. I want to change something. What's one of the things that you can do with local storage? Go to View, go to Developer Tools again. Now I can go to local storage. And instead of "do do", I can type an h1 tag and hit enter. And instead of "do", I can have image src equals some image. I don't care what it is. Get out of here. I'm going to type Batman. Watch what happens. Of course, this is broken; there's no image there. But you do see that the h1 now takes effect. So now you're starting to see some of the old hacks, some really age-old security principles.
So think about this for a second. Here's the analogy. Is there a way that you can grab all the cookies on a person's machine, other than beating the person up and taking their machine? Which, of course, is another thing that you can do with local storage: you steal someone's machine, open the browser, and just do what I did. You can see all the values. But what's the other way that you can steal everyone's cookies? What attack do you use? Exactly. Look at this. If you have a cross-site scripting vulnerability in your application, everything in local storage is going to be susceptible. That's a simple example. And it just befuddles me. As dumb as it sounds, you're always going to have people that decide to store really sensitive information in local storage. Why?
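To make the point concrete, here is a small sketch of the render loop with one common hedge applied: HTML-encoding each key and value before it goes onto the page, so a stored h1 or img tag shows up as text instead of taking effect. The names htmlEncode and renderItems are hypothetical and this is not the demo's code; length, key(), and getItem() are the standard Storage interface. The same few lines of looping are all an injected script would need to read everything stored for that origin.

```javascript
// Hypothetical helper: encode the characters that matter in HTML text and
// attribute contexts. The ampersand goes first to avoid double-encoding.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Walk a Storage-like object and build markup with every key and value
// encoded. `storage` is assumed to be window.localStorage or similar.
function renderItems(storage) {
  let html = '';
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    html += '<p>' + htmlEncode(key) + ': ' +
            htmlEncode(storage.getItem(key)) + '</p>';
  }
  return html;
}
```

Encoding on output only protects the page that does it; it does nothing about a script that is already running on the origin and reading storage directly, which is why keeping sensitive data out of local storage is the stronger defense.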
Okay. Next up, Web SQL. This is not new. How many people remember the good old days of Google Gears? Unfortunately, you had to download a plugin for that thing. Now this will bring SQLite, a lot of good stuff, to the client side. Here are some of the core methods: you have openDatabase, you have transaction, and you have executeSql. Even prepared statements are supported. But you do also have the usual gang of attacks: cross-site scripting, SQL injection.
Let me show you an example. Let's go back. I've got three simple examples. I am going to close out of this, go to actual size, and let's go to Web SQL example number one. I'm going to say something really rudimentary. Go get me a beer. Homer. Okay: Homer says, go get me a beer. Hit enter. So where is the data stored? Well, again, in Chrome, go to Developer Tools, go to Resources, go to Databases. I have one database called demo-mchow. I can expand it, and I have one table called thoughts. There are three fields: date key, phrase, and handle. Now if I go to... I don't know why that's not showing up for some reason.
Anyway, here's how this works. View source. This is what happens: db equals openDatabase. You give it a database name, you give it a version number, some description, and of course, the last field, the 5000 in openDatabase, is some size. Here it's roughly five megabytes. So I'm only going to create a new database if it doesn't exist already.
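A rough sketch of how the three core methods fit together, using question-mark placeholders rather than building the SQL string from user input. This is not the demo's source: the function names, the column name datekey, and the version and description strings in the comment are assumptions made for illustration. Only openDatabase, transaction, and executeSql come from the Web SQL interface itself, and since Web SQL was later deprecated, treat this as a period illustration.

```javascript
// `db` is assumed to be the object returned by openDatabase(...).

// Create the table only if it is not there yet. Note the untyped columns.
function createThoughtsTable(db) {
  db.transaction(function (tx) {
    tx.executeSql('CREATE TABLE IF NOT EXISTS thoughts (datekey, phrase, handle)');
  });
}

// Insert one row. The values travel separately from the SQL text, so a
// quote character in `phrase` cannot change the statement.
function insertThought(db, phrase, handle, now = Date.now()) {
  db.transaction(function (tx) {
    tx.executeSql(
      'INSERT INTO thoughts (datekey, phrase, handle) VALUES (?, ?, ?)',
      [now, phrase, handle]
    );
  });
}

// In a browser that ships Web SQL, the database would be opened like this
// (version and description are placeholders):
// const db = openDatabase('demo-mchow', '1.0', 'demo database', 5 * 1024 * 1024);
```

Placeholders deal with SQL injection only. Whatever comes back out of the table still needs encoding before it is written into the page.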
If not exists, create table. You notice one thing when I'm creating the table called thoughts: the three fields don't take any data types; no database data types are needed. So it's pretty loose. Now how do I insert data? Well, I do a transaction, db dot transaction, and inside is a function that calls executeSql. And there is my SQL statement right here. So of course, another oldie but goodie. This is bad. This is bad principle. The improvement to make, of course, is to use prepared statements.
Now let's go to example number two, not number one. Okay, that works. Of course, we've got a broken image. But let's take a look at the difference. Go to view source. Everything is the same, except for one thing: this one uses prepared statements. executeSql, insert into thoughts, and of course, bind the variables with question marks. This is nice, but we can do a little bit better. How about escaping all the special characters? It doesn't hurt to remind everyone. Image source. How I actually did this is very simple. I just wrote my own JavaScript function, HTML encode. I just encoded all the HTML. That's about it.
So now we've seen local and session storage. We've seen Web SQL. Again, the defenses: well, don't store sensitive data in the client-side database. Very much like local storage and session storage, in Chrome you can see all the data in the browser. The other thing is, what's the real power behind local storage and Web SQL? Why is it so useful?
Yeah, because it's not only for cookie and tracking purposes (everyone here should read the paper on evercookie), but also for offline work. So if you're going to work offline and then sync when you go online, can you really trust what is stored on the client side? One other suggestion, of course, is to create your database and store data over SSL. Oh, and one last thing: as I mentioned a few times, especially during geolocation, it's nice if you prompt the user, good feedback to just say, hey, we would like to use your database, can we do so?
Okay, application cache. Application cache is really cool for offline use and for a lot of games. For example, on a mobile device, you can't depend on networking all the time. Let's say you have an HTML5 game. It also reduces a lot of server load. Right now, if I'm not mistaken, the size limit for cached data is around five megabytes. There are two things that you need to do in order to enable application cache. First, in the html tag, specify the manifest file by way of the manifest attribute; let's say manifest equals example.manifest. That's the file that's going to be stored at the root level of the server, the www folder. The manifest file, example.manifest, is going to look like the following. You state the files that need to be explicitly cached, HTML, CSS, all the static content, and you can also specify what content requires networking. How do you update the application cache? You can add an event listener, and the event is going to be checking.
And the function: you write a function called updateCacheStatus to just do something. Well, one of the danger points about the application cache is that it can be poisoned. Not only is there no permission necessary for any site to cache data in the browser, but look, what are you really caching? And there's a real big catch. What happens is that when the root resource is cached, the normal cache is updated, but not the application cache.
Okay, cross-origin JavaScript requests. This is not directly part of HTML5, but it is introduced alongside it. If you remember, XHR, back in the good old days, largely follows the same-domain principle. In IE8, there is a new JavaScript object called XDomainRequest. But now, in a lot of cases, like in Firefox or in Chrome, XHR can actually allow cross-domain requests. How? Remember that example with Carmen Sandiego? Well, let's take a look. I'm going to go into www and give you a little snapshot with more. The script that is used in request.open in the XHR is ACME. Oh, actually, I don't have it here. Oh, xdr.php. And there you go. Let me zoom this in. How I get the JSON for the Carmen Sandiego demo is just a few lines of PHP. What I do is specify a header. There's a header called Access-Control-Allow-Origin, followed by a colon, and then it should be followed by a whitelist of allowed sites. I'm using an asterisk here. Now this is not good. This is a wildcard. I'm allowing everyone in the world to have access to this data. This is bad, if you're going to use the wildcard. What's even worse is the following.
What if I do Access-Control-Allow-Origin, colon, http colon, some origin, and then I have a percent 20? What is wrong with that second example? How you separate a whitelist of allowed websites is with a space. You put the percent 20 in, and it's going to be treated as a space. And so you have not one but two domains that are allowed. So what's the defense? Two things. Number one, you want to add either some sort of authentication or some key, like Google Maps did back in the day. Or validate the response.
Okay, cross-document messaging. It is possible to build, for example, a chat application using straight HTML5 now, using the whole idea of postMessage. You can communicate between two different domains, two different origins, two different frames. There are two parts: a sender and a receiver. The sender will post some message, while the receiver is going to listen for messages being sent.
Now I'm just going to give you a demo. Actually, this time I will need... okay. So I have CDM receiver on the server. I'm going to view source. The domain of the receiver is 192.168.156.130. And look, the receiver code is going to have one event listener that's going to listen for a message. And then it's going to check for the origin. So what about the sender? Well, I'm going to open a file on my desktop: my desktop, dc19, examples. Okay, CDM sender. Here it is. This is the sender. It has an iframe that contains the receiver. View source. So what I do is have one text field.
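The receiver side described above, one message listener that checks the origin before doing anything, might look roughly like this. It is a sketch under assumptions, not the CDM receiver source: makeReceiver, showMessage, and the example origin are hypothetical names, while the message event with its origin and data properties is the standard postMessage interface.

```javascript
// Build a message listener that only accepts messages from one origin.
// `allowedOrigin` is compared exactly (scheme, host, and port together),
// and `handle` is whatever the page wants to do with accepted data.
function makeReceiver(allowedOrigin, handle) {
  return function onMessage(event) {
    if (event.origin !== allowedOrigin) {
      return; // drop messages from any other origin
    }
    handle(event.data);
  };
}

// Browser wiring, with a placeholder origin and handler:
// window.addEventListener(
//   'message',
//   makeReceiver('http://sender.example', showMessage),
//   false
// );
```

An exact string comparison is used deliberately here; substring or prefix checks on the origin are easy to get around.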
Okay, one form field. And then all it's going to do... where is the postMessage? Let me just scroll this. Here it is. One iframe. And the postMessage: postMessage, the message, and of course, the domain. So here's how this works. Hello. Goodbye. And remember, everything is listened for by the receiver on the 192 address, on my virtual machine. So what's the problem here? Well, here's the catch. You really want to verify where the message is coming from. And you can do that. When you are the receiver, listening for a message, the callback to receive a message takes in one argument, the event. You can check where the origin is: event dot origin equals equals, or does not equal. You can check the origin that way. It's also been a subject of controversy. For more information, of course, check out the Mozilla documentation. Some really powerful stuff.
Web workers. You can now add really good computational power to the background of whatever page you're working on. Think of threads. And it is really, really easy to use. How do you spawn a worker? Well, you set a variable: w equals new Worker, and the parameter is going to be some JavaScript file. onmessage: of course, you can also have that JS file send messages back to you. And to terminate the thread, it's w dot terminate. There are a few caveats. For example, a web worker has to be on a server. It cannot run locally on your machine. The same-origin principle applies. But here's the real fun stuff: what does a worker have access to? It has access to the navigator object, XHR, even the application cache, and the ability to spawn other workers.
Really, really cool. What the worker, on the flip side, does not have access to is, of course, the document object and the window. What can you do with this? Use your wildest imagination. Someone here tried to build a password cracker? You can now do this.
Now, how does this work? I'll give you an example. Get out of here. Go back. What's the example? Which one? Ah, here it is: worker test. And yeah, there's no way to kill this thing either. There's no X button to stop this. How does this work? I have the HTML. It's still going to run, but I don't want to crash this machine. There it is. It's even more. So there's no way to stop this other than for me to close the browser.
So how does this really work? Let me show you what really happens. Clear. Okay, more worker.js. That's it. That's the script. It's a very simple script to just find the prime numbers. How the page looks: more worker test two HTML. Here it is. Very simple markup. One div for the computation. I have one script tag. worker equals new Worker: a variable pointing to that worker JavaScript file that I just showed you. And of course, you can check if there are events sent back to you from the JS file by way of worker dot onmessage. That's where you're going to be receiving messages from the JS file. Let me go back. How does this JS file send messages back to your page? Look at that last line. It's a postMessage. It will send data back to the HTML page. And that's web workers. Really, really powerful stuff.
But of course, you saw the demo on video, you saw geolocation. One other thing I haven't even talked about is the canvas. What about browser rendering of 2D and 3D graphics? What about on-the-fly SVG rendering? Well, it depends on the browser.
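Going back to the web worker demo, here is a minimal sketch of both halves: a prime finder that posts each prime back, and a page-side helper that listens and can stop the worker. It is an illustration, not the worker.js from the talk. The names isPrime and startPrimeWorker are hypothetical, and in practice the two halves would live in separate files; they are shown together only to keep the example short. The WorkerImpl parameter exists so the helper can be exercised outside a browser.

```javascript
// Trial division: fine for a demo, slow on purpose for large numbers.
function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

// Worker side: only runs when this file is loaded via new Worker(...).
// It never finishes on its own, like the demo.
if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  for (let n = 2; ; n++) {
    if (isPrime(n)) postMessage(n); // send each prime back to the page
  }
}

// Page side: start the worker, forward each result, return a stop function.
function startPrimeWorker(scriptUrl, onPrime, WorkerImpl = globalThis.Worker) {
  const worker = new WorkerImpl(scriptUrl);
  worker.onmessage = function (event) {
    onPrime(event.data);
  };
  return function stop() {
    worker.terminate();
  };
}
```

Keeping the returned stop function around gives the page the kill switch the demo lacked, for example by calling it from a button's click handler.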
Okay, here I'm going to pose a scenario. A lot of people are now using HTML5 video, like YouTube and Brightcove. What happens if there are flaws in the codec? Think about it. Also, this actually works: you can embed a script inside the onerror attribute in the audio tag. As I already pointed out, in the latest bounty hunt by Google Chrome, two of the bugs found were specific to HTML5 features in Chrome. There was also a heap buffer overflow that had everything to do with the HTML5 canvas in Opera.
Also, we all love regular expressions, right? Regular expressions for pattern matching. Well, there's one new attribute for form fields in HTML5, which is called pattern. You can do client-side validation, let's say client-side validation of an email address. An example will be an input where you feed pattern equals, well, some regex. What happens if, for example, your pattern is like a star, and your value is just a long-running stream of A's? Well, there is a potential for regular expression denial of service. And of course, this was actually found in one of the versions of Opera.
So what's the whole story here? In summary, look, HTML5 can do a lot of really, really cool stuff. You can build full-fledged web applications. And a lot of really groundbreaking, seminal stuff. Right now, the one thing that I know of is that Mozilla is working on a plugin for accessing the camera and the microphone on your device by way of JavaScript. So there is a potential that you may be able to build Chatroulette with HTML5 one day. But here's the bottom line. Look, you can do all these cool things. But there are two things that really stand out.
Number one, security seems like a complete afterthought most of the time when they're putting together the HTML5 standard. And the second thing is that a lot of the stuff you need to really defend yourself is not new. It's not news. For example, the defenses for local storage and Web SQL are all stuff that you've heard in the past. Sanitize your data, don't store sensitive information. A lot of the same old problems, same old resolutions, just in a brand new environment. Of course, another caveat to keep in mind is that the standard is evolving; it's changing like every day. So it's even hard for me to keep up.
But why am I giving this talk? Look, this whole topic of HTML5 is not going away anytime soon. Because of the iPhone and the iPod, all these brand new devices. Here's the problem: if you build mobile applications, you're going to use the Android SDK or the iPhone SDK. There are a lot of fragmentation issues, because you have to build something directly using the Android SDK and directly using the iOS SDK. HTML5 is one of the most viable options if you want to have one environment, just one app that can run on everything. And a lot of people now know the problem of segmentation in the mobile app world. So HTML5 just cannot be ignored. And it seems like, well, I've got some good news. I've got 10 minutes left. And I am done.