The next step in coding and data science when you're working with web data is to understand a little bit about XML. I like to think of this as the part of web data that follows the imperative "Data, define thyself." XML stands for Extensible Markup Language, and XML is semi-structured data. What that means is that tags define the data, so the computer knows what a particular piece of information is. But unlike HTML, the tags are free to be defined any way you want. So you have this enormous flexibility, but you're still able to specify things so the computer can read them.
Now, there are a few places where you're going to see XML files. Number one is in web data. HTML defines the structure of a web page, but if data is being fed into that page, it will often come in the form of an XML file. Interestingly, with Microsoft Office files, if you have .docx or .xlsx, the "x" at the end stands for a version of XML that's used to create these documents. If you use iTunes, the library information that has all of your artists and your genres and your ratings and so on is all stored in an XML file. And finally, data files that go with particular programs can often be saved as XML as a way of representing the structure of the data to the program.
XML tags use opening and closing angle brackets, just like HTML did. Again, the major difference is that you're free to define the tags however you want. So for instance, thinking about iTunes, you can define a tag that's genre. You have the angle brackets around genre, <genre>, to begin that information, and then you have the angle brackets with a forward slash, </genre>, to let it know you're done with that piece of information. Or you can do it for composer, or for rating, or for comments. You can create any tags you want, and you put the information in between the opening tag and the closing tag.
Now, let's take an example of how this works. I'm going to show you a quick data set that comes from the web.
It's at ergast.com, and it's an API, a website that stores information about Formula One automobile racing. Let's go to this web page and take a quick look at what it's like. So here we are at ergast.com, and it's the API for Formula One. What I'm bringing up is the results of the 1957 season in Formula One racing. Here you can see who the competitors were in each race, how they finished, and so on. So this is a data set that's being displayed in a web page.
If you want to see what it looks like in XML, all you have to do is type .xml onto the end of the address. I've done that already, so I'm just going to go to that one. You can see the only difference is this little bit that I've added, .xml. Now, it looks exactly the same, because the web page is structuring the XML data by default. But if you want to see what it looks like in its raw format, just right-click on the web page and go to View Page Source. At least that's how it works in Chrome.
And this is the structured XML page. You can see we've got tags here: it says race name, circuit name, location. Obviously, these are not standard HTML tags; they're defined for the purposes of this particular data set. But we begin with one, we have circuit name right there, and then we close it using the forward slash right there. So this is structured data, and the computer knows how to read it, which is exactly how it displays it by default. It's a really good way of displaying data, and it's a good way to know how to pull data from the web. You can actually use what's called an API, an application programming interface, to access this XML data, and it pulls the data in along with its structure, which makes working with it really easy.
What's even more interesting is how easy it is to take XML data and convert it between different formats, because it's structured and the computer knows what you're dealing with.
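To make the idea of pulling data in along with its structure a little more concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The XML below is a made-up sample, not the actual Formula One feed: the tag names (Races, Race, RaceName, CircuitName, Location), the round attribute, and all the values are assumptions chosen only to echo the tags mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. Tag names and values are placeholders,
# not the real contents of the Formula One data set.
SAMPLE_XML = """
<Races>
  <Race round="1">
    <RaceName>Example Grand Prix</RaceName>
    <CircuitName>Example Circuit</CircuitName>
    <Location>Example City</Location>
  </Race>
  <Race round="2">
    <RaceName>Second Example Grand Prix</RaceName>
    <CircuitName>Second Example Circuit</CircuitName>
    <Location>Second Example City</Location>
  </Race>
</Races>
"""


def read_races(xml_text):
    """Parse the XML text and return one dictionary per <Race> element."""
    root = ET.fromstring(xml_text.strip())
    races = []
    for race in root.findall("Race"):
        races.append({
            "round": race.get("round"),                    # attribute on the tag
            "race_name": race.findtext("RaceName"),        # text between the tags
            "circuit_name": race.findtext("CircuitName"),
            "location": race.findtext("Location"),
        })
    return races


races = read_races(SAMPLE_XML)
for race in races:
    print(race)
```

Because every value sits between a named opening tag and closing tag, the program can ask for each piece of information by name instead of guessing at its position.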
So for example, one, it's really easy to convert XML to CSV, or comma-separated values files, which is the spreadsheet format, because the program knows exactly what the headings are and what piece of information goes in each column. Example two, it's really easy to convert HTML documents to XML, because you can think of HTML, with its restricted set of tags, as sort of a subset of the much freer XML. And three, you can convert CSV, your spreadsheet comma-separated values, to XML and vice versa. You can bounce them all back and forth, because the structure is made clear to the programs that you're working with.
So in sum, here's what we can say. Number one, XML is semi-structured data. What that means is it has tags to tell the computer what each piece of information is, but you can make the tags whatever you want them to be. Number two, XML is very common for web data. And number three, it's really easy to translate between the formats: XML, HTML, CSV, and so on. That gives you a lot of flexibility in manipulating data, so you can get it into the format you need for your own analysis.
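As a rough illustration of bouncing data back and forth between XML and CSV, here is a small Python sketch using only the standard library. Everything in it is an assumption made for the example: the Library, Track, Name, Genre, and Rating tags, the placeholder values, and the helper names xml_to_csv and csv_to_xml. It also assumes every record contains the same child tags and that each CSV heading is a valid XML tag name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical iTunes-style sample. Tags and values are placeholders.
SAMPLE_XML = """<Library>
  <Track>
    <Name>Example Song</Name>
    <Genre>Example Genre</Genre>
    <Rating>5</Rating>
  </Track>
  <Track>
    <Name>Another Song</Name>
    <Genre>Another Genre</Genre>
    <Rating>3</Rating>
  </Track>
</Library>"""


def xml_to_csv(xml_text, record_tag):
    """Turn each <record_tag> element into one CSV row.

    The child tag names of the first record become the column headings.
    """
    root = ET.fromstring(xml_text)
    records = root.findall(record_tag)
    if not records:
        return ""
    fieldnames = [child.tag for child in records[0]]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({child.tag: (child.text or "") for child in record})
    return buffer.getvalue()


def csv_to_xml(csv_text, root_tag, record_tag):
    """Turn each CSV row back into a <record_tag> element under <root_tag>."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, record_tag)
        for heading, value in row.items():
            # Each column heading becomes a tag; the cell becomes its text.
            ET.SubElement(record, heading).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = xml_to_csv(SAMPLE_XML, "Track")
print(csv_text)

xml_again = csv_to_xml(csv_text, "Library", "Track")
print(xml_again)
```

The conversion works in both directions for the same reason given above: the tags tell the program what the headings are, and the headings tell it what the tags should be.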