I am Siddharth, and in this session we are going to talk about XML technologies from an implementation perspective. Specifically for Java, the platform provides two sets of APIs, JAXP and JAXB, as Professor said. JAXP is the Java API for XML Processing, where you handle XML documents in native XML form. JAXB, on the other hand, lets you create an application model and represent it in the form of XML, so the two are meant for different kinds of users. JAXP is meant for XML programmers, whereas JAXB is meant for application programmers, who create an application model and then access it as an object model.
So JAXP is a set of APIs that provides two functionalities. First, it gives you parsing APIs, which again come in two flavors, DOM and SAX; we will look at them shortly. The second is transformation, which we will cover in later slides. Transformation means you have XML in one format and you want to convert it into another format, either XML or something else like HTML or PDF. JAXB, the Java API for XML Binding, is a set of APIs that lets you create classes from a schema, that is, your application classes from a given XML schema, and it will create objects for you automatically from an XML document that complies with the schema for which you generated the classes.
Before diving into the details of JAXP, we should take a look at the kinds of parsers that are available. The first one is the SAX parser, SAX being the Simple API for XML. This is a stateless, event-driven parser. This means the parser goes through your XML file sequentially, and when it encounters, say, an element or an attribute, the parser does not know by itself what to do next. It just generates an event; the programmer, on the other hand, has to write handlers for these events.
So programmers register handlers for all these kinds of events: for example, what happens if the parser encounters a start tag, or what happens if it encounters an attribute. The handlers have to be written explicitly by the programmer. This is a read-only kind of parser: it scans through the document once and you can only read. It is stateless, which means it does not remember what it read a millisecond before. For the same reason it has a relatively low memory footprint.
There is a completely different flavor of XML parser, the Document Object Model or DOM parser, which creates an in-memory tree representation of the whole XML document. This has a higher memory footprint because it has to build a tree from the XML document and keep it in memory, so if your XML is, say, 4 MB or 5 MB in size, it is going to cause performance degradation.
Now, what is JAXP? The DOM API is a W3C standard and SAX is a de facto standard, and any vendor can provide an implementation of them. JAXP is a pluggable architecture for XML parsing: I can use any DOM implementation or any SAX implementation and just plug it in without having to recompile my code. It also wraps the transformation APIs, so you can use any XSLT transformer, like the one provided by Apache or those from other vendors, and simply plug it in without recompiling. The JAXP package has been included in the standard JDK from version 1.4 onwards.
If you take a look at this code snippet, JAXP uses the factory pattern. If I set the document builder factory to, say, com.icl.saxon.om.DocumentBuilderFactoryImpl, that means I am going to use the Saxon parser as my DOM parser.
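The slide's snippet is not part of the transcript, so here is a minimal sketch of the factory lookup, assuming only the JDK's built-in JAXP implementation is on the classpath. The class name FactoryLookupDemo is made up for illustration. The property name javax.xml.parsers.DocumentBuilderFactory is the standard JAXP one.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class; the name is not from the lecture.
final class FactoryLookupDemo {

    // Standard JAXP system property consulted by DocumentBuilderFactory.newInstance().
    static final String DOM_FACTORY_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    // Class name of whichever DOM factory implementation JAXP selected.
    static String domFactoryImplementation() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of whichever SAX factory implementation JAXP selected.
    static String saxFactoryImplementation() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // To plug in another parser, the property would be set before newInstance(), e.g.
        //   java -Djavax.xml.parsers.DocumentBuilderFactory=com.icl.saxon.om.DocumentBuilderFactoryImpl ...
        // This only works if that implementation's jar is on the classpath,
        // which this sketch does not assume, so the property is left unset here.
        System.out.println(DOM_FACTORY_PROPERTY + " = " + System.getProperty(DOM_FACTORY_PROPERTY));
        System.out.println("DOM factory in use: " + domFactoryImplementation());
        System.out.println("SAX factory in use: " + saxFactoryImplementation());
    }
}
```

The application code only ever names the abstract factory classes, which is why swapping the implementation needs no recompilation.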
Now this is a system property. Whenever I create an instance of a parser, the first thing I have to decide is which type of parser I am going to use. If I am going to use a SAX parser, I use SAXParserFactory. If I am going to use a DOM parser, I use DocumentBuilderFactory. Once I have fixed the kind of parser, I use the factory method to create an instance of it. By default, JAXP uses Apache Xerces as the XML parser and Apache Xalan as the XSLT processor. There are other parsers available as well, like Ælfred and Saxon, and if you want you can use those through JAXP too. Right now it complies with SAX 2.0 and DOM Level 2; these are the specification versions.
The major functionalities are that you can parse and you can transform, and then there are certain flexibilities provided by JAXP. For example, you can do validation, you can do richer error handling, and you can create and save DOM documents.
This code snippet shows how you create a factory instance: SAXParserFactory spf = SAXParserFactory.newInstance(). If you are aware of the factory pattern, calling newInstance on any factory gives you a factory instance. Similarly, if you want one for DOM, you use DocumentBuilderFactory. These are standard classes provided by JAXP.
This is how the SAX parser looks in JAXP. You have a SAXParserFactory from which you get a SAX parser instance. This parser instance has a default implementation that reads your XML file sequentially, element by element. This side of the diagram has to be implemented by the programmer: content handler, error handler, DTD handler, entity resolver.
These are the different types of handlers that need to be registered with your SAX parser. Whenever it encounters any XML content, it calls back your content handler. Whenever it encounters an error while parsing, for example if you give it an inconsistent XML document, it calls your error handler. In this way it leaves it up to the programmer how to handle these situations. So the content handler decides what happens when the parser gets a start element tag or an end element tag. If you take a look at this simple XML file, the parser starts parsing from CEP lab, so the first thing it sees is the start tag of the element CEP lab. When it encounters this, the parser itself is not aware of what to do. In the content handler you specify what should be done whenever a start element is encountered. Then the DTD handler applies whenever there is some DTD tag in your XML document, that is, you have a reference to a DTD.
The event is generated automatically, that is, by the default implementation. Exactly. Whatever parser we are seeing here in JAXP is an example of what we call push parsing. In push parsing you register your handler with the parser, and the parser pushes content onto you. You have no control over whether you accept it or not. You can junk it if you want, but you have to accept everything. That is push parsing. Now there is a new set of APIs that has been released, called pull parsing APIs, described in a handout I have given on StAX. We are not going to cover that, because we will not have time to do all types of parsing here today. But those are called pull parsing APIs.
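For contrast with push parsing, here is a minimal pull-parsing sketch using the StAX cursor API (javax.xml.stream, part of the JDK since Java 6). It is not taken from the handout. The class name and the element names ceplab, group and groupid are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the name is not from the lecture.
final class PullParsingSketch {

    // Collects element names in document order by pulling one event at a time.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The caller drives the loop: the parser does nothing until next() is called.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Element names are assumed; the lecture's sample file is not in the transcript.
        String xml = "<ceplab><group><groupid>G1</groupid></group></ceplab>";
        System.out.println(elementNames(xml));
    }
}
```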
There, instead of registering a handler and allowing the parser to control the pace at which you handle things, you are given an iterator or a cursor API, just like a database cursor, with which you can pick elements one at a time at your own pace. So you pull content rather than having it pushed onto you. We are not going to cover those APIs, but be aware that pull parsing APIs are becoming very popular today, simply because the client has much more control over the parsing pace. Whenever I am ready for a new element, I can pull something, process it, and then get the next element at will. It also turns out that the footprint of these pull parsers is much smaller than the footprint of the parsers we have just seen. So for parsing on mobile devices, for example, pull parsing is actually becoming the preferred way of doing things. Be aware that there is a handout that details the StAX API as well, both the iterator and the cursor API. Yes, StAX, S-T-A-X: the Streaming API for XML.
So I was talking about what happens when you encounter a DTD. You may want to fetch that DTD from some remote location and do some validation on your XML document; that kind of processing should be written in the DTD handler interface. Then you have the entity resolver. What happens if you have referenced an external XML document fragment from your document? At that point the external fragment has to be fetched, or maybe some processing has to be done, and that kind of thing is addressed in the entity resolver.
This code snippet shows that you get a factory instance, then you get an XML reader out of that factory, and then, when you give it a file or a stream, it does the parsing.
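The slide itself is not in the transcript; the steps just described (factory, XML reader, handler registration, parse) can be sketched roughly as follows. The class name SaxPushSketch, the GroupCounter handler and the element name group are assumptions for illustration. DefaultHandler is the standard SAX convenience class that supplies no-op versions of the content handler, error handler, DTD handler and entity resolver methods.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name is not from the lecture.
final class SaxPushSketch {

    // Counts <group> start tags; all other events fall through to DefaultHandler's no-ops.
    static final class GroupCounter extends DefaultHandler {
        int groups = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The factory is not namespace-aware by default, so qName carries the tag name.
            if ("group".equals(qName)) {
                groups++;
            }
        }

        @Override
        public void error(SAXParseException e) throws SAXParseException {
            // Treat recoverable errors as fatal; a real handler could log or try to recover.
            throw e;
        }
    }

    static int countGroups(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            XMLReader reader = spf.newSAXParser().getXMLReader();
            GroupCounter handler = new GroupCounter();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            // parse() also accepts a system identifier (file path or URL) as a String.
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.groups;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countGroups("<ceplab><group/><group/></ceplab>"));
    }
}
```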
So you call XMLReader.parse and give it a file name, or some stream, or some URL. It picks up that file and does the parsing for you. For more details you can refer to the Javadocs of JAXP. There you will find all the details of how to parse and how to register event handlers.
DOM, on the other hand, takes a completely different perspective. Here is your XML file, f.xml. DocumentBuilderFactory is again a factory, this time for creating a DOM parser, and DocumentBuilder is the actual instance of your DOM parser. It takes XML data and creates a tree representation. This tree representation is your content representation of the XML. At this point you have a root element, and all of these things are nodes. This node actually has the name CEP lab, but when I am writing code and have to do something with it, I have to say: get this node, get node by name, or get child nodes. I am always talking in terms of XML nodes rather than in terms of actual objects. When we take a look at JAXB, we will see how this comes into play.
This is similar code showing how you use the DOM parser. You create a factory, you get a DocumentBuilder instance, and you call builder.parse. Give it a file, a stream, or a URL, and it automatically fetches it, parses it, and creates a DOM tree for you. The advantage of a DOM tree is that you can traverse it back and forth, and you can edit it. It gives you CRUD operations, that is, create, read, update and delete, whereas the SAX parser is read-only.
Now there is one more interesting thing you can do here. This is an example of how you can use a SAX parser to create a DOM model. You just have to write your own content handler.
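As a rough sketch of both routes, the class below first parses with DocumentBuilder directly and then builds an equivalent tree from SAX events with a hand-written content handler. The names DomSketch and DomBuildingHandler and the sample element names are assumptions, not the lecture's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name is not from the lecture.
final class DomSketch {

    // Route 1: let the DOM parser build the tree.
    static Document parseWithDom(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Route 2: a content handler that turns SAX events into DOM nodes.
    static final class DomBuildingHandler extends DefaultHandler {
        private final Document doc;
        private Node current; // node that newly created children are attached to

        DomBuildingHandler(Document doc) {
            this.doc = doc;
            this.current = doc;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Element element = doc.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                element.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            current.appendChild(element);
            current = element;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            current = current.getParentNode();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != doc) { // a Document node cannot hold text directly
                current.appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        }
    }

    static Document buildWithSax(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DomBuildingHandler(doc));
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ceplab><group name=\"a\"><groupid>G1</groupid></group></ceplab>";
        System.out.println(parseWithDom(xml).getDocumentElement().getNodeName());
        System.out.println(buildWithSax(xml).getElementsByTagName("groupid").item(0).getTextContent());
    }
}
```

Overriding error and fatalError in the same handler is where custom recovery or a friendlier message would go.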
So whenever there is an element or an attribute, the handler creates the corresponding node and sticks it into your tree data structure. That is how you use a SAX parser to create a DOM. This has advantages. If you follow this method to create DOM models, the biggest advantage is error handling during creation. If you use some standard DOM implementations, they will simply crash on an error. But if you do it this way, writing content handlers and using a SAX parser, you can define your own error handlers, which can try to correct the problem, or progressively create some intermediate tree, or show a user-friendly error message.
You can also validate using both parsers: setValidating is an API you can use. You can make them namespace aware by using setNamespaceAware. If a parser is namespace aware, it does strict validation and checks for namespace consistency as well.
That was parsing in JAXP. There is another set of functionalities it provides, called transformation. Basically, XML can be used in two ways. The first is for human interaction. I am getting an XML document and I want to present it in a more user-friendly way, because not all humans are aware of XML syntax. For example, if I am going to send this CEP lab description to some professor, he would like to have it in a nice HTML or PDF format, in a tabular layout, rather than as XML. So I would like to transform this XML into HTML or PDF or some other human-readable format, and the transformation can happen automatically. If I define the rules properly, the transformer takes these rules and the source XML file and generates an output file as directed by the transformation rules I have specified. The other way is to use XML for machine-to-machine interaction.
I mean component-to-component interaction, as in message-oriented middleware, where one component sends an XML document that conforms to some schema and the other component expects an XML document that conforms to an equivalent schema. The schemas are not exactly the same, but they are equivalent. So you may want to translate the source XML into a target XML file, and XSLT can be used for that purpose as well. This API is also known as TrAX, the Transformation API for XML.
This is how it looks. You have a source, typically an XML file or a DOM tree. You have transformation instructions. They are simple instructions in which you identify nodes using XPath and say what you want them translated to. The transformer factory is again a factory implementation, so you can plug in any vendor-specific transformer. And this is the result you get after applying the transformation. So the transformer is the piece of code that does the transformation for you. The factory method we have already covered. The source object can be a DOM tree, a SAX XMLReader, or an I/O stream; that is, you can create an I/O stream from a file or a URL and give it to the transformer. The different types of sources are implemented in different packages; for example, javax.xml.transform.dom handles the case where your source is a DOM tree, and so on.
If you have a DOM tree and you want to write an XML file out of it, the easiest way is to use an empty transformation sheet and run a transformer over it. It takes the DOM tree and spits out an XML file for you. That is one of the use cases.
Now, JDOM is an alternative, Java-specific document model rather than an implementation of the W3C DOM, and dom4j is another such library. JAXB is a completely different thing. It is not similar to DOM. It is used to create application models, as we have discussed.
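The two transformation use cases described above, writing a DOM tree out with an empty (identity) transformation and applying XSLT rules to produce HTML, can be sketched as follows. The class name TransformSketch, the stylesheet and the element names are all assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is not from the lecture.
final class TransformSketch {

    // Illustrative rules: nodes are selected with XPath and mapped to HTML list items.
    static final String TO_HTML =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html' indent='no'/>"
          + "<xsl:template match='/ceplab'><ul>"
          + "<xsl:for-each select='group'><li><xsl:value-of select='groupid'/></li></xsl:for-each>"
          + "</ul></xsl:template>"
          + "</xsl:stylesheet>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // XSLT processors expect a namespace-aware tree
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Identity transformation: no stylesheet, so the DOM tree is simply serialized.
    static String serialize(Document doc) {
        return run(doc, TransformerFactory.newInstance(), null);
    }

    // Rule-driven transformation from XML to HTML.
    static String toHtml(Document doc) {
        return run(doc, TransformerFactory.newInstance(), TO_HTML);
    }

    private static String run(Document doc, TransformerFactory factory, String stylesheet) {
        try {
            Transformer transformer = (stylesheet == null)
                    ? factory.newTransformer()
                    : factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<ceplab><group><groupid>G1</groupid></group></ceplab>");
        System.out.println(serialize(doc));
        System.out.println(toHtml(doc));
    }
}
```

Passing a StreamResult wrapping a file instead of a StringWriter would write the output to disk.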
Now we will take a look at JAXB. JAXB provides a set of APIs, tools and a framework to automate the mapping between XML documents and Java objects. The first thing you have is an XML schema. You use the JAXB APIs and tools to create equivalent classes. Let me clarify with an example. In this XML file you have a root element called CEP lab. Then you have certain other elements, like group, and each group has a group ID. You would obviously have a schema associated with it. I have not shown the schema here, but you have a schema that describes the structure. You give that schema to JAXB and it creates classes for you, and that model is an object model. So CEP lab will have a collection of groups, just like a normal Java data model. Each group will have an element called group ID, which is maybe of String type. And you will have getter and setter methods to access this model. So the way you access the first group ID is cepLab.getGroups().get(0).getGroupId(), index zero because this is the first one. So you use getter and setter methods instead of handling XML. If I do not know XML, I can still work with the object model, and JAXB does the translation for me.
So int, String and Date are normal properties, and they are translated directly into fields of a class. Whenever you have a complex data type, say a complex type in your XSDs, it translates directly to a class. This is the way JAXB works. This slide illustrates that fact. If I can call a getPerson method, why should I call getAttributes? getPerson is more relevant to my application model than saying, go and fetch an attribute whose name is person. So this illustrates the issue with the DOM and SAX models. You have to traverse a tree or write event handlers, which, if I am not an XML programmer, I might not want to do. So JAXB automates this XML-to-Java binding.
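To make the object-model idea concrete, here is a hand-written sketch of what classes for the CEP lab example might look like. These are plain Java stand-ins, not actual binding-compiler output. They carry no JAXB annotations, and the names CepLab, Group, getGroups and getGroupId are assumed from the spoken example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the name is not from the lecture.
final class BindingSketch {

    // Stand-in for a class generated from a complex type "group".
    static final class Group {
        private String groupId;

        String getGroupId() {
            return groupId;
        }

        void setGroupId(String groupId) {
            this.groupId = groupId;
        }
    }

    // Stand-in for a class generated from the root element; it holds a collection of groups.
    static final class CepLab {
        private final List<Group> groups = new ArrayList<>();

        List<Group> getGroups() {
            return groups;
        }
    }

    // Builds by hand the object tree that unmarshalling would otherwise produce from XML.
    static CepLab sampleLab() {
        CepLab lab = new CepLab();
        Group first = new Group();
        first.setGroupId("G1");
        lab.getGroups().add(first);
        return lab;
    }

    public static void main(String[] args) {
        CepLab lab = sampleLab();
        // Application code talks to objects, not nodes. The DOM equivalent would be
        // something like doc.getElementsByTagName("groupid").item(0).getTextContent().
        System.out.println(lab.getGroups().get(0).getGroupId());
        lab.getGroups().get(0).setGroupId("5");
        System.out.println(lab.getGroups().get(0).getGroupId());
    }
}
```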
It makes your life simple. This is a sample architecture. You give it an XML schema and it creates a set of interfaces and classes. The classes are default implementations of the interfaces, and if you want, you can change the implementation yourself. There is also a facility to customize the binding. You can direct your XML schema to some existing classes: if you have already created a data model and you want to bind your XML schema to it rather than to generated classes, you can map the two by using binding declarations.
Once you have done that and you have the JAXB implementation APIs here, your application code sits on top of these four things. You give a source XML document as input and it automatically creates objects for you. These objects can be used directly in your application code. The compiler generates Java source files for you, which you can compile and use. Then, if you have loaded some CEP lab object in your application and you want to change some group ID to, say, 5, you can use the marshalling APIs to write that back to your XML document. That way you can modify your XML document using the JAXB APIs.
So unmarshalling is the process where you take an XML document and create an object model from it. Then there is validation: once you have created your classes, you can validate your XML documents against the schema, and those APIs are also provided with JAXB. And when you have a content tree in your application and you modify it, you can write it back to an XML document. This process is called marshalling. This is the complete life cycle: the schema is compiled into derived classes, and an XML document is unmarshalled into objects and then marshalled back into an XML document.
The schema can be used to validate the objects, and these objects are instances of the derived classes. That is the simple binding life cycle of JAXB.
These are the simple rules that the binding compiler follows by default. Target namespaces are mapped to packages; they are not exactly converted into packages, but each is mapped to some package. Elements and complex types are turned into classes, whereas simple types and attributes become fields or properties in those classes.
The Unmarshaller interface governs the process of deserializing XML data into a Java content tree. This simple code snippet shows how to use the Unmarshaller. You have to create a JAXBContext object, and it takes a package name as its parameter. So com.acme.foo is the package where I have all the generated source files. Once I give a schema, the compiler generates classes into some package, and that package name is passed as the parameter to the JAXBContext. When I then create an Unmarshaller and call it on an XML document, the objects are created from the classes present in that package. So first I create a JAXBContext. From that context I create an Unmarshaller. And to this Unmarshaller I give my XML file as input. Again, you can give it a URL, an I/O stream, or a file.
The Marshaller is a similar thing. You can spit out XML from the application content tree in your model. The process is again the same: you have to get a JAXBContext. In this example, it first unmarshals an XML document and then writes it back to a different XML file. So I am taking the object from foo.xml and marshalling it into nosferatu.xml.
So this was all about JAXP and JAXB in this session. These are some references you can go to for a more detailed look into these APIs. Thank you.