I'm going to talk a lot about data, not so much about protocols. And given that most people call me uncle by now, I'll talk about things which came before the shiny new world of everything REST and JSON. You can find me on Twitter at notesensei. I'm currently working as a software engineer for IBM. I committed a few crimes in software which our customers have to suffer from, but that's a different story. The public ones are on GitHub.

Okay, so before JSON came in... I will probably talk a lot about XML, but to really, really go back, let's have a look at the very first one, and that was EDIFACT. There are way more of them, but EDIFACT is a very, very curious one. It's the first large-scale public API, released in 1988. Who was born after 1988? Okay, there you go. In 1988, I think I had bankrupted my first company already.

What's quite interesting about EDIFACT is that the standard is maintained by the United Nations. Usually you have all sorts of standards organizations, but this one belongs to the United Nations, which makes it quite a particular one. It's officially called UN/EDIFACT: Electronic Data Interchange for Administration, Commerce and Transport. That's the link behind it.

The interesting thing is, if you're superhuman, you can actually read it. But unlike all the other formats we know, it doesn't carry any description in its payload. So that's a payload. If you're superhuman, you look at this and say, okay, that's very clear: somebody flies from Frankfurt to New York and then on to Dallas. No problem. It starts with UNA, and UNZ is the end of the story. And you see these little single apostrophes: each one ends a segment. The line breaks on the slide are only there for readability. When it gets transmitted, there are no new lines in there, so it's one big fat text stream that you have to process.
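
To make the segment idea concrete, here is a minimal Java sketch of cutting such a stream into segments. It assumes the usual UNA service string layout (release character in position 7, segment terminator in position 9) and falls back to `?` and `'` when there is no UNA header. The class name `EdifactSegments` and the sample interchange are made up for illustration; they are not taken from the slide.

```java
import java.util.ArrayList;
import java.util.List;

class EdifactSegments {

    // Made-up interchange for illustration only. "?'" is an escaped apostrophe inside free text.
    static final String SAMPLE =
        "UNA:+.? 'UNB+UNOA:1+SENDER+RECEIVER+940101:0950+1'UNH+1+PAORES:93:1:IA'"
        + "TVL+240493:1000::1220+FRA+JFK+DL+400+C'TVL+240493:1740::2030+JFK+DFW+DL+081+C'"
        + "IFT+3+IT?'S A TEST'UNT+5+1'UNZ+1+1'";

    /** Splits a raw interchange into segments, honouring the release (escape) character. */
    static List<String> split(String raw) {
        char release = '?';      // default release character
        char terminator = '\'';  // default segment terminator
        int start = 0;
        if (raw.startsWith("UNA") && raw.length() >= 9) {
            release = raw.charAt(6);     // 4th service character after "UNA"
            terminator = raw.charAt(8);  // 6th service character after "UNA"
            start = 9;                   // skip the UNA header itself
        }
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = start; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == release && i + 1 < raw.length()) {
                // Keep the escaped pair together so an escaped terminator does not split.
                current.append(c).append(raw.charAt(++i));
            } else if (c == terminator) {
                segments.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            segments.add(current.toString()); // unterminated trailing data, kept as-is
        }
        return segments;
    }

    public static void main(String[] args) {
        for (String segment : split(SAMPLE)) {
            System.out.println(segment);
        }
    }
}
```

The point of the sketch is only that the payload carries no field names: after splitting you still need the message definition to know what `TVL` or its fourth element means.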
That also makes it ideal for transmission: you could actually use Morse code to transmit an EDIFACT message, it's that compact. It was widely used in very large organizations but never took off in the mid-market, because the companies who offered EDIFACT libraries said: sure, you can have our libraries, we start discussing at $50,000 for a library. Add the consulting fees and the implementation, and it was always a multimillion-dollar exercise. So it's still around. There is an XML version of it, which is even more cruel than the pure format.

But what made it very, very special is that it had tons of document types, and that laid the groundwork for what came later. You see, the list goes on and on and on. They didn't pay too much attention to making it elegant in IT terms, but they paid a lot of attention to making sure that any type of business message could already be expressed in EDIFACT. Just to give you an idea. I can read this one without glasses, but not this one.

Okay, let's move on to XML. XML is a descendant of SGML, the Standard Generalized Markup Language (there's even a typo in there), and it is actually younger than the first HTML standard by one or five years, depending on how you count HTML. They had a little love child, XHTML, or XHTML5. It's one of the little headaches when you use XML tools to process HTML: the two are kind of like Malaysia and Singapore. It doesn't quite work together, but it's kind of the same. Do I get kicked out of the country now?

Okay, quickly on to the structure. XML is like Highlander, there can be only one: one opening tag and one closing tag around the whole document. You cannot have one top-level tag and another one below it; there always needs to be a single element bracketing everything. Then the beauty of it: it has elements, that's the thing in the angle brackets, and an element can have attributes. There are a few rules, let's say: an element name cannot start with "xml".
You cannot have spaces in a name, and most special characters are out. When you put attributes in there, your software must not depend on the sequence of the attributes. So it's a little bit more restrictive than JSON: there, whatever you want to use as a key, you just put it in quotes and it's good. In XML, that's not the case. Elements always need to be closed, and if you have an element that has nothing inside, you just close it off with a slash. Inside an element there can be text, another element, or nothing. Those are simply the ground rules, and if you don't mind angle brackets, it's actually quite easy to read.

But the real deal with XML is that it comes with a bunch of standards around it. First there is XSLT, the transformation language. I can take XML and transform it into other XML, I can take XML and transform it into HTML, or take XML and transform it into PDF. If you want to know how to do that, there's a six-article series on my blog, step by step, with the what and why, the tooling and all that. Then there is XPath, and there are the XML schemas.

XPath is one of the query languages that clearly separates the boys from the men. We have SQL, we have regex, and we have XPath, these are the three. If you master them, you can say, okay, I know about IT. And if you want to be a wizard, you understand Perl, but Perl is the write-once, never-read language.

The special thing about XPath is that it can traverse the tree structure of an XML document. So I can say: give me all elements where, two elements before, there was a red dot in the attribute. I'm not aware of any other query language that is capable of relating up and down the stream like that; maybe GraphQL, but I don't have much experience there. In SQL you can do outer joins and inner joins and all this stuff, but you can't traverse back up and down in your own table.
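
The "two elements before" query can be sketched with the XPath engine that ships with the JDK (`javax.xml.xpath`). The sample document, the `dot` attribute and the class name `XPathAxesDemo` are invented for illustration. The first expression looks sideways with the `preceding-sibling` axis, the second one walks up to the parent with `..`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAxesDemo {

    // Hypothetical sample: items a and d carry the "red dot".
    static final String SAMPLE =
        "<items name='demo'>"
        + "<item id='a' dot='red'/><item id='b'/><item id='c'/>"
        + "<item id='d' dot='red'/><item id='e'/><item id='f'/>"
        + "</items>";

    // Ids of all items whose second-closest preceding sibling has dot='red'.
    static final String TWO_AFTER_RED = "//item[preceding-sibling::item[2]/@dot='red']/@id";

    // From any red item, step up to the parent and read its name attribute.
    static final String PARENT_OF_RED = "//item[@dot='red']/../@name";

    /** Evaluates an expression that selects attribute nodes and returns their values. */
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                values.add(hits.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, TWO_AFTER_RED));
        System.out.println(select(SAMPLE, PARENT_OF_RED));
    }
}
```

On a reverse axis such as `preceding-sibling`, position `[2]` counts backwards from the current node, which is what makes "two elements before" a one-liner.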
So that's the pretty cool thing about it. But the really interesting thing about XML is that it has namespaces. What does "bank" mean? In the financial industry, that's where they steal your money. I know, that was the government. Sorry. In geography, it's the edge between the water and the land. And if you happen to be in the Air Force, which I was at one point in time, it's how fast I make a turn: slow bank, fast bank. Since language is so ambiguous, XML introduced namespaces, where I can say financial:bank, geography:bank, aeronautics:bank. And that allows me to merge things together. For instance, when you look at the ODF specification, the Open Document Format, they already had MathML, the markup for math, and just merged it in. They have similar tag names, but the namespaces help them get around that.

This is a bit of a tree of what I have found on XML schemas. There's SOAP, UDDI and WSDL, which is the plumbing I use on the wire. For document schemas there's obviously Office: there's ODF and the so-called Open XML. And one of the formats you should know is DITA. A lot of the world's engineering documentation, say the manual for the 747 engineers, is written in DITA, which is an XML format for documentation. It took me quite a while to wrap my head around it.

And then another similar link. Here on Wikipedia, when you start scrolling, these are all different schemas where people have thought about standards, and the list is probably incomplete. The next one, you look here, is ebXML. That's kind of a successor to EDIFACT, spiritually, not technically, which has all the schemas around, and you see all the different versions. Again: what business objects are there?

So the big thing about XML is not so much that you have nice angle brackets. I do like the differentiation between elements and attributes.
But an incredible amount of work went into making clear, machine-readable languages for all the different business domains. That might give you the idea: oh, if I control my browser and my back end, I don't need all this. That's correct. But when you talk to other systems outside, you need a very clear understanding and specification of what's going on there.

Okay. There are a few links; the presentation is online, so you can have a look. And since you are all engineers, I put the DITA link here specifically. And my graphic broke. Okay, since my graphic broke, I'll just show it to you live.

Personally, I use oXygen, a graphical editor for XML. What you see here is a schema. When you look at it in the textual version, this is the XML definition of what an XML file should look like. No, it's not Inception. And the tooling is quite sophisticated. It says: okay, there is an agreement update request. You send it to your contractual partner: hey, we actually need to change something. And it shows very clearly all the things in there. It has an ID, it has a creation date. When do I expect the response? When is it active? Then I can go and drill into what the expiry is, or what the current agreement modifier is. And it says: okay, it's a non-empty string, and it has some attributes. So you can drill down nicely and graphically through all of this.

So the tooling is quite mature. There is oXygen XML, XMLSpy, and a few other tools. I think Visual Studio has a pretty good graphical schema editor as well. I don't use it because I'm mostly on Mac and Linux. So you see, it's quite easy to work with this stuff.

Okay, processing XML. A few things: when should I use SOAP? When should I use REST? What does an XSLT transformation look like? How do XPath queries work? And how does it all work in a programming language?
What I found is that it's incredibly painful to do that in JavaScript. There's a library in Node.js that parses XML into JSON. What comes out is pretty crappy, because attributes and elements don't map onto JSON that well. Where you actually find a natural environment is Java, and I presume .NET as well: the tools there are pretty mature for dealing with XML when you have your well-defined objects. I'll show you a little bit of code later.

So, REST. Everybody does REST today, this is classic. We do that over HTTPS, this is great. But when the server is down, you can't process. That's where the advantage of SOAP comes in, and I usually explain it like this. REST is like making a phone call: if the other guy doesn't pick up, you're stuck. With SOAP, you basically take your payload, wrap it in an envelope, and submit it. All the examples you find on Stack Overflow post a SOAP message to an HTTPS endpoint. But if I look at enterprise implementations, they submit it to an MQ queue, they send it over SMTP, or they have file-system-based store and forward: I FTP a SOAP file somewhere, and then some other process on the machine picks it up and processes it. So if you have an environment where you're not quite sure whether there's connectivity, SOAP might actually be your better bet, because it is inherently capable of store and forward and REST is not. That's one of these little things people like to overlook. SOAP would even work over RFC 1149, IP over Avian Carriers. You're familiar with that one, no? No? You should really try it. It was published on the first of April.

Okay, quick code. I'll show you a live example in a second. It's just about four lines to get XML properly pulled into Java, so it's very, very straightforward. Where's the fine little instance? Yup, yup, yup. Okay, that's easy, that's an XML document.
And you can translate very easily from your XML into the properties of your Java class. What we typically use there is JAXB annotations. So I put XmlRootElement and XmlElement into my code, and then I need just four lines to read the XML into the Java object or write the object back out. Go away.

I've been participating in a project for XML processing in Java, and I was firmly on the non-Java part of the project. I don't agree with it. Actually, I dealt with XML before. We used to process 2 GB XML files, and MSXML is ridiculous for that. The code of MSXML is not very good.

Yeah, everybody uses parsers that are way faster.

I wrote my own parser in C for that. We changed parsers like four times until it was somehow workable with just one megabyte of it. Because the problem is that...

Okay, cut it, cut it, cut it. It's very simple. We start with XML, and what do we do? We take the XML that comes in and put it in a nice memory model. Let's say you have a one gigabyte XML file: you probably need seven or eight gigabytes of memory. Bad idea. And it takes a very long time. No, you don't do this. What you do, if you have chunkable XML or chunkable JSON, that's the same thing, is use a SAX parser. You use an event parser and process the stuff as it comes in. The guy who wrote the specification for XSLT, Michael Kay, wrote Saxon, which is available in C and in Java, and he even has a JavaScript version of it. Once your XML is above, let's say, the five megabyte limit, you don't do DOM anymore. This is roughly the same thing.

That's what I did also, yeah, I used stream processing.

Yeah, of course.

I brought it down from eight hours to two or three seconds, actually.

Exactly. Loading it into memory and then doing the XML processing? No, never.

I don't know Java, I use C.
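
For the Java side, the event-parser idea can be sketched with the SAX parser that ships with the JDK. This is a minimal illustration, not the code from the demo: the class name `SaxCountDemo` and the sample document are made up. The handler sees one start-element event at a time, so memory use does not grow with the size of the input the way a DOM tree does.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxCountDemo {

    // Hypothetical sample; a real input would be a large file or network stream.
    static final String SAMPLE =
        "<orders><order id='1'><line/></order><order id='2'/><order id='3'><line/><line/></order></orders>";

    /** Counts elements by name while streaming; nothing but the counters is kept in memory. */
    static Map<String, Integer> countElements(InputStream in) {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Called once per opening tag, in document order.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML stream", e);
        }
        return counts;
    }

    /** Convenience wrapper for small in-memory strings, used by the demo below. */
    static Map<String, Integer> countElements(String xml) {
        return countElements(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(countElements(SAMPLE));
    }
}
```

In a real job the body of `startElement` would hand each chunk (say, one order) to the business logic and then forget it.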
And the funny thing, which most people actually don't know, is that you can use the SAX parser to write out XML as well. You can stream it out quite nicely. There's a post I wrote on the Oracle forum ten years ago.

Okay, quick one. Does that work? Oh, it works, yeah. So that's an example of the simple one. I have a fruit basket object, and I put apples, durians and grapes in there. I don't know why I put durians in there, because I don't like them. When I want to write it out, I simply say: get me a new instance of JAXBContext, create a marshaller, and then write it out. So how does it know what to write out? Oh, how do I get back? Yeah, I look at my fruit class, and in the fruit class I simply have things like, you see, XmlRootElement, name is fruit, and XmlAccessorType is FIELD, because I only want to have fields. If I specify this, it takes only the fields I have in there to write out. And then when I run the example, fruit basket... where's my run... Run As, Java Application, it goes and writes it out, and I have the fruit basket XML. Here I've got the XML, and you see, I decided that my name should be an attribute, not an element. I also wrote it out as JSON, and you have it in there. And Obst is German for fruit.

By the way, JSON also has the same fault, that you need to read the whole document.

Yeah, yeah.

Unless you have an event parser for JSON.

That's beyond my knowledge. My JSON is small. If I go big, I use XML, okay? So you see, this example is quite interesting, because I have save XML, where I use a JAXBContext, and I have the same here, save JSON. My favorite is the JSON library from Google. Because I think, if somebody writes a lot of JSON, it's those guys, and they make sure no special character floats in there which shouldn't be in there, and all that.
And this mechanism takes an in-memory object, but they also have a JSON writer where I can say, add a new element, and it just pumps it out straight away into a stream. So I don't need to build the whole thing in memory first. Especially in Java 8 with the stream API, it gets rather snappy. And for all the people ditching Java: have a look at Vert.x, which gives you a programming style very close to Node.js. Okay, just to give you a little idea.

Another little example. RDF is the Resource Description Framework, and there's an XML format for it. I downloaded a file from Gutenberg.org. This is where you get all the books that are out of copyright. Then there is a little XSLT which says: okay, please translate that into HTML for me. And bang, straight away it creates a book list. Or I can go and say... oops, where's my other one? I have another one, so then I say: okay, translate that for me, and it goes and creates a different XML list. So I can use the same data source, and with just a tiny little change I can create XML or HTML or even PDF if I want to.

PDF is a two-step process. Step one: I take the XML and a style sheet and turn it into XSL-FO. Then I take the Apache FOP processor, that's the only one that is freely available, and turn it into PDF. Quite a bit of headache. On my blog you'll find a tool which helps you design the FO stuff, so you don't need to write it from memory, because there's quite a bit of a learning curve involved.

So, the key takeaway: between browser and back end, don't bother, use something else. Protocol buffers, JSON, whatever. Anyway, if you have to transmit four megabytes of data there, you're either uploading PDFs, uploading PowerPoint files, or your application is doing something wrong.
XML rules on the back end. You will have to search for a very, very long time to find that richness of schemas in any other environment, short of maybe ebX... sorry, EDIFACT, with its specific data types. So most of the time when you need to talk to your back end, it will be XML. If you want to learn XPath, there's a full one-hour talk on how to do XPath queries with all these things, so I won't torture you with that. It's painful in JavaScript.

This is where the presentation is, actually. You go to my GitHub IO page, presentations. All of them are there, and API Craft is the one for today. I also have a 60-slide presentation at this link that takes you down the rabbit hole of producing and consuming XML, with all the nasty little lessons I learned along the way. Questions?

Do you use RELAX NG still, or just XML Schema?

I didn't see that much advantage in RELAX NG. Mainly because I'm lazy and I use other people's schemas, so I don't write my own. And in the war, XML Schema won.

For many more years. RELAX NG was very, very fast, if I remember correctly.

Yeah, but Betamax was also better. Probably it's the same. No, I'm not telling the story of why Betamax lost.

Yeah, but it's actually almost equivalent, and much more readable. I actually made big schemas, and I used to write them, but XML Schema was still winning at that time, I remember from when I was working. Anyway, if you want to write a big schema, which editor do you recommend? For big schemas, I don't think you can just use the source code, unless you remember it by heart.

We just had a look at that. I use oXygen XML, because it's cross-platform. Altova XMLSpy is pretty cool as well.

XMLSpy is nice, I used that one also, for writing large schemas basically.

Well, all of these editors have this mechanism.
You say: I have an XML Schema, please output RELAX NG. Or take RELAX NG and turn it into an XML Schema. Okay, any more questions?

XML is still very popular, actually.

Of course it is.

Because at least in the US, you can build a service like Reuters just because of XML. Every company now needs to submit its data in XML, so it is parsable data. So I can build a comprehensive service like that in the US. I think Hong Kong has just done it. Singapore still accepts PDF reporting, so I don't know when they might. I think it's hard to build on documents like that. But XML is still the only one that can do that. It will be here in 50 years. I mean, XML stays parsable for 30 to 40 years.

That's one of the interesting questions when you look at all the APIs: what's the ultimate storage format? Any binary format is not future-proof. At the end of the day, the only binary format that is halfway future-proof is zip. So you take your text format and zip it together so it's small, because that's pretty well understood. But the rest, no.

That's the fight going on right now between protocol buffers and JSON. Because if you gzip the JSON and send it out, the payload is not really bigger than the protocol buffers. The difference is very little. But protocol buffers are not readable.

Yeah, depends on your debugging tool.

Yeah, it is. Okay?

You mean that you need a schema for protocol buffers to make sense of them? XML is self-describing.

So there is one other thing. If you want to avoid big documents, you can do it similarly to Open Document. Basically, you pack a lot of XML files that cross-reference each other into one big zip file, and then it's pretty robust. We also did that in a few CIM applications. They send out PDF documents, which eventually might get faxed or emailed, and we use the XML header to store the metadata.
So there was a big fat database which kept all the relations and all that stuff, but all the document-related data was also stuffed into the header as XML. So if the database got blown out of the water, you could scan all the individual PDF documents and recreate the database. That was one of the precautionary measures, because I knew the IT manager back then was a little bit stingy with his backup strategy. But all the PDFs were stored on optical storage. And that saved the company's neck once: being able to take the individual files and, based on the metadata, recreate the database behind them. Okay, thanks a lot.