In this session we are going to continue our discussion on HTML and discuss the extensible markup language, or XML. We shall see the main differences between HTML and XML and identify certain basic features of XML. We will not be using XML explicitly in any of our assignments, but it is important for you to understand what XML is and what it is used for. After SQL, XML is emerging as the most important tool for exchanging information between disparate systems, so XML is extremely important in building information systems going forward. So we will very briefly review HTML, understand the difference between rendering and meaning, and then look at some XML basics. You will recall our discussion on hypertext markup language. We said that markup is essentially what editors use when preparing a document for print: it describes fonts, margins, bold, italics, underline and so on. In other words, it describes how a document looks, so it is about looks, or rendering. HTML has predefined tags which describe the appearance, and appearance basically means rendering. We also remember that an HTML document can embed programs, which could in principle be written in any programming language, but we looked specifically at JavaScript: we could embed JavaScript as part of the HTML document itself, and the browser would act as an interpreter not just of the HTML for rendering but would also execute the JavaScript. If we embed a Java program, then we require a separate Java interpreter to run it. Either way, a document is no longer a plain container of information: it can also contain code which is executed while the document is being rendered. That is the advantage of HTML. XML is the extensible markup language. The only similarity between HTML and XML is that both are called markup languages.
HTML is the hypertext markup language and XML is the extensible markup language, but what matters in XML is the extensibility, which serves a very different purpose from what HTML is for, namely rendering. XML is actually a metalanguage used to represent data, or a piece of information. This is different from rendering, and it is in fact the most fundamental difference between HTML and XML. The purpose of HTML is to indicate how a document will look when you render it on screen. The purpose of XML is to define what the document contains. XML elements are again marked up with tags: you have an opening tag, followed by the data, followed by a closing tag, which is the same tag name preceded by a slash. This is the standard structure of any markup language. The difference is that here the tags are user defined, whereas in HTML they are predefined; in HTML, <b> always means bold, and so on. In XML the tags are user defined, they are case sensitive, and they are used to denote metadata. So XML represents data about data, along with the data itself. Here is a website which you might want to note down separately so that you don't have to refer to the slides: www.w3schools.com. I mentioned W3C earlier; that is the World Wide Web Consortium, the body which defines the standards for the World Wide Web. W3Schools, on the other hand, gives you a whole lot of tutorials and other information on related topics, and most of this material is picked up from there, so you have an easily accessible resource on the web. Just as we speak of an HTML document, we speak of an XML document. Each XML document has a root element and several other elements, some of which may be nested. These other elements can be characterized as either simple elements or mixed elements. A mixed element can in turn contain other elements, and this containment can be sequential or hierarchical. This could all be confusing because it is all terminology.
So we will go through an example to understand exactly what we mean: what the root element is, what the other elements are, how other elements can be simple or mixed, and how a mixed element can contain other elements which are defined sequentially or hierarchically. Please note one thing once again: an XML document is not about how the document will look when you render it on the screen. XML is about what the document contains, so it is about the semantics of data. Here is an example. Consider that I am describing a note document. You know a note: you write a small note to someone, such as "Will you meet me tomorrow?" So how would you typically write a note? This would be a typical note that you and I could write. As a teacher I would write this note and circulate it to my TAs. Now you and I can all make sense of this note. It says it is a reminder note; "note" is a generic word, but the type of the note is defined. Then it says to whom it is addressed and from whom, that is, who is writing it, and then the matter. As human beings we can easily identify the components of this note as a structure: this is a note whose type is defined here, the note is meant for someone, the note is written by someone, and there is the matter of the note. Does it matter how I lay out this note, as long as the same metadata describes each component? For example, suppose I wrote the same note differently. You would all agree that in terms of content both versions represent the same information. You and I understand that; the computer doesn't. For the computer these are just strings, and this appearance is completely different from that appearance. So if I were to write one HTML document for the first layout and another for the second, these two HTML documents would be different and would be treated differently. The computer would simply display one layout or the other on the screen.
But neither of these two HTML documents would make sense to another automatic computer program. Nothing in them tells that program "I am a note; I contain a 'to' field and a 'from' field; I contain some information." That is simply not there, and it will not be understood. XML, on the other hand, permits you to describe the metadata about this note as a structure, and a program can then interpret that metadata and the associated data. That is what we shall see in a moment. Let's get back to our slide. To describe this note as a document, we first define something called a root element. Remember what we said about XML: it has a root element and it has other elements. Here the root element is what we call note. It contains four child elements: to, from, heading and body. So I want to tell a machine that what I am going to send is the metadata about a thing called a note, that this note has four child elements, namely a 'to' field, a 'from' field, a heading field and a body field, and that each element has a simple data type. And what is that data? 'To' could be the TAs of the course; 'from' could be Dr. D. B. Phatak. Please note that 'from' is the metadata, and the actual value, Dr. D. B. Phatak, is the data. So data and metadata are both put together in an XML document. Here is an example. Notice that this is the XML document itself: <note> begins the document. Then there is <heading>Reminder</heading>, so the note has a heading which is "Reminder"; it is a reminder note. Then <to>Pratibha</to>. Pratibha is my wife's name, so I am writing this note to my wife, Pratibha. Then <from>Deepak</from>, that is me. Then <body>Pune visit this weekend</body>, and finally </note>. Can you make sense of this semantically? It's a reminder note, written by Deepak to Pratibha, telling her that this weekend we are visiting Pune. All of us can make sense of this.
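A minimal sketch of how a program can read this note and see each piece of metadata paired with its data. It assumes Python and its standard xml.etree.ElementTree module; the lecture itself does not name any language or library.

```python
import xml.etree.ElementTree as ET

# The reminder note from the lecture, written out as XML text.
NOTE_XML = """<note>
  <heading>Reminder</heading>
  <to>Pratibha</to>
  <from>Deepak</from>
  <body>Pune visit this weekend</body>
</note>"""

root = ET.fromstring(NOTE_XML)  # parse the text into an element tree
print(root.tag)                 # name of the root element: note

# Each child pairs metadata (the tag name) with data (the text between the tags).
fields = {child.tag: child.text for child in root}
for name, value in fields.items():
    print(f"{name}: {value}")
```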
But the beauty is that now a machine can also make sense of this, because we are not giving the machine some simple text, arbitrarily colored, bold, italic and so on, in which the computer can never tell what content is where. Here we are telling it that the document I am giving it has a structure, with some metadata associated with it. We are saying that this document is called a note document; it starts here and it ends here; it has four simple components: a heading component whose value is "Reminder", a 'to' component whose value is "Pratibha", a 'from' component whose value is "Deepak", and a body component whose value is this string. Suppose an application reads this XML document. This is where an XML interpreter differs from a pure HTML interpreter. A browser interprets HTML and renders it as bold, italics, superscript, subscript, in whatever color is set. An XML interpreter actually extracts the elements. It understands that this document has four elements, so it extracts an element called to, an element called from, an element called heading and an element called body. And let's say the application then produces the following output: "Reminder note to Pratibha from Deepak: Pune visit this weekend." Please note that we are now saying there is a program which interprets the contents in terms of metadata and data, and then renders the contents as well. The rendering part is similar to HTML interpretation. This is not necessarily the default behavior of any XML viewer; I have just shown it in red and blue. You can check what the default rendering behavior is. You can also put in HTML tags if you want additional rendering to be done. But the important thing here is that I can give metadata about the elements, and each element can be extracted separately. Now imagine that I modify the document. What does this document say?
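The two-part application described here (one part extracts elements, the other renders them) can be sketched as follows. The function names extract and render are hypothetical, Python's standard ElementTree is an assumed choice of parser, and the output format only approximates the slide (plain text instead of red and blue).

```python
import xml.etree.ElementTree as ET

def extract(xml_text):
    """Part 1 (hypothetical): collect each child of the root as tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def render(fields):
    """Part 2 (hypothetical): a fixed layout that uses the four fields it knows."""
    return (f"{fields['heading']} note to {fields['to']} "
            f"from {fields['from']}: {fields['body']}")

NOTE_XML = """<note>
  <heading>Reminder</heading>
  <to>Pratibha</to>
  <from>Deepak</from>
  <body>Pune visit this weekend</body>
</note>"""

output = render(extract(NOTE_XML))
print(output)  # Reminder note to Pratibha from Deepak: Pune visit this weekend
```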
<note> <date>31-3-2005</date> <from>Deepak</from> <to>Pratibha</to> <heading>Reminder</heading> <body>Pune visit this weekend</body> </note>. Notice that this document is different from the old document: it has an extra element, and the sequence is completely different. If these were HTML documents, the lines of each would appear in the order in which they are written, so the two would look different. That would matter if XML were about rendering. But XML is not about rendering. Because this is an XML document, consider the hypothetical application we talked about. Remember I said the application has two parts: one part which extracts the elements and another part which renders the elements. Now imagine I give this new document to the same application. You can guess what will happen. The first part extracts the elements and now sees five elements instead of four: date, from, to, heading and body. It gives these five element values to the rendering part. The rendering part understands only four of them; it does not understand date. The beauty is that the rendering part will simply ignore the date and still print exactly the same output using the four elements it recognizes: the to element, the from element, the heading element and the body element. So if the same application is used again, will it crash? No. It will still correctly extract the required fields and produce exactly the same output, ignoring the extra information. That is the whole point: I am now able to convey metadata along with data to an application program in a recognizable form. The order in which the elements are given is not important; what matters is that each piece of metadata stays associated with its data.
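A self-contained sketch, assuming Python's standard xml.etree.ElementTree and the same hypothetical extract and render functions, of feeding both the original note and the reordered note with the extra date element to one application:

```python
import xml.etree.ElementTree as ET

def extract(xml_text):
    """Hypothetical part 1: every child element of the root, as tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def render(fields):
    """Hypothetical part 2: looks fields up by name, so order and extras don't matter."""
    return (f"{fields['heading']} note to {fields['to']} "
            f"from {fields['from']}: {fields['body']}")

ORIGINAL = ("<note><heading>Reminder</heading><to>Pratibha</to>"
            "<from>Deepak</from><body>Pune visit this weekend</body></note>")
MODIFIED = ("<note><date>31-3-2005</date><from>Deepak</from><to>Pratibha</to>"
            "<heading>Reminder</heading><body>Pune visit this weekend</body></note>")

old_fields = extract(ORIGINAL)
new_fields = extract(MODIFIED)
print(len(old_fields), len(new_fields))          # 4 5
print(render(old_fields) == render(new_fields))  # True: date is simply not used
```

The renderer ignores date only because it asks for fields by name. A renderer that simply walked the elements in document order would behave like HTML and change its output whenever the document changed.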
For example, if you go back to this document, <from>Deepak</from> is a single unit: <from> and </from> mark the element, and what is contained in between is the data. Similarly, imagine a roll number element containing 98200175. What are you doing? You are actually defining a field and its value. Then name, CPI and hostel elements in the same way. You get the point. It is now possible to describe a student's record in terms of the metadata and the actual data for that student. If somebody's name is missing, it doesn't matter; the remaining fields can still be understood by the interpreter. If I add something more, say hobbies, the application may say it doesn't understand what hobbies is; that depends on how I have written the application and what it understands for rendering or for filling in data. But the fact is that I now have the ability to supply metadata and data in a manner the program can recognize. This is fundamentally different from HTML, or from any text which is merely rendered. Notice that if this were HTML and I just changed the order, the order in which things appear would be completely different, because HTML is only about rendering, whereas XML is about the semantics of data. That is the point I wanted to make. As a result, even if I modify the document by adding, say, a date element, exchanging from and to, and putting the heading further down, it still does not matter, because of how the application works. Remember, all we have assumed about that application is that it has two components: one component which understands the XML document and extracts elements from it, and another component which renders the extracted elements. The rendering is fixed as shown here: it will always render the type of note (Reminder) at the top, then to, then from, and then the body. That is the way in which it renders. The first part will extract these four elements.
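A sketch of a reader for the student record, assuming Python's standard ElementTree. The element names (rollnumber, name, cpi, hostel, hobbies) and the values other than the roll number are hypothetical. The record below has no name element and an extra hobbies element.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for a student record.
KNOWN_FIELDS = ["rollnumber", "name", "cpi", "hostel"]

def read_student(xml_text):
    """Return (known fields, unknown tags); a missing field comes back as None."""
    root = ET.fromstring(xml_text)
    record = {field: root.findtext(field) for field in KNOWN_FIELDS}
    unknown = [child.tag for child in root if child.tag not in KNOWN_FIELDS]
    return record, unknown

# Illustrative record: the name element is missing, and hobbies is extra.
STUDENT_XML = """<student>
  <rollnumber>98200175</rollnumber>
  <cpi>8.2</cpi>
  <hostel>5</hostel>
  <hobbies>chess</hobbies>
</student>"""

record, unknown = read_student(STUDENT_XML)
print(record)   # {'rollnumber': '98200175', 'name': None, 'cpi': '8.2', 'hostel': '5'}
print(unknown)  # ['hobbies']
```

Whether an unknown element such as hobbies is silently dropped, logged or rejected is a design decision of the receiving application; this sketch merely reports it.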
If I give it the second document, it will extract five elements. The rendering part understands only four, and it knows the order in which they have to be rendered. That is why we say it will still correctly extract the required fields and produce exactly the same output, ignoring the extra information. The word extensibility comes from this: if I use the same application on an extended document with more elements, the application still works without any problem. For example, suppose I later add three more fields to my COBOL application. The XML generator, whenever it translates a particular row, will add those three fields and their values as well. But even if I use an old application, it is still capable of interpreting the elements it understood earlier and doing whatever it did with them: it can render them, insert them into a database, or fill them into something else and pass that information on to someone else. XML has therefore become the standard for exchanging information across disparate applications. In the real world, new applications may be developed with Oracle or MySQL as the back end, but there could still be COBOL applications, C applications, or two database applications whose schemas are different. Different in what way? Let's say roll number and name are stored in one table but CPI is stored in some other table. That's possible, right? After all, the schemas of two applications could be different. But if every application puts its data into XML form before sending it, and interprets every incoming piece of data from XML, it is still possible to build an interface between them without changing the original applications. That makes XML extremely interesting and useful. Today, in fact, hardly any new mechanism for exchanging data between systems is designed without XML.
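A sketch of the extensibility argument, assuming Python's standard ElementTree: a hypothetical row_to_xml writes a table row as an XML record, and a hypothetical old_reader that knows only roll number and name keeps working after three columns are added. The column names and values are illustrative.

```python
import xml.etree.ElementTree as ET

def row_to_xml(row):
    """Write one database row as an XML record, one element per column."""
    student = ET.Element("student")
    for column, value in row.items():
        ET.SubElement(student, column).text = str(value)
    return ET.tostring(student, encoding="unicode")

def old_reader(xml_text):
    """An 'old' application that only knows about roll number and name."""
    root = ET.fromstring(xml_text)
    return root.findtext("rollnumber"), root.findtext("name")

old_row = {"rollnumber": 98200175, "name": "A. Student"}
# Later, three more columns are added to the (hypothetical) table.
new_row = dict(old_row, cpi=8.2, hostel=5, email="student@example.com")

print(row_to_xml(new_row))
print(old_reader(row_to_xml(old_row)))  # ('98200175', 'A. Student')
print(old_reader(row_to_xml(new_row)))  # same result: extra elements are ignored
```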
That's the power of XML, and XML has reached this status rather quickly. Let's see a little more about XML elements; we now go into the details of what an XML document can look like. First of all, every element can have sub-elements, which are called child elements; so we have the parent element and the child element. These sub-elements must be correctly nested within their parent element. For example, if I have a root, below that root a child, and below that child a sub-child, then the sub-child must end before the child ends, and only after the child ends can the root end. This is common sense. It is just like nesting parentheses, where you must match opening and closing parentheses. It is as simple as that. Here is an example of XML elements. The first line says <book>. What does that mean? This is an XML document called a book document, just as we had a note document. Can you interpret the next line? This document has been written to be self-explanatory, so see whether you can understand it. The next line is a comment. A comment begins with a less-than sign, an exclamation mark and two hyphens, <!--, and ends with two hyphens and a greater-than sign, -->. Whatever is written in between does not indicate any tag or element; it is a comment, and comments are ignored by the XML interpreter. What does the next element say? <title>My First XML</title>. Title is an element, and it is one of the elements of a thing called book. Next comes prod, with an id; prod is short for product. This is another element, but it is a rather peculiar one. It says <prod id="143" media="paper">, then there is no information, and then </prod> immediately. So this looks like an element.
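A short sketch of the two rules just described, assuming Python's standard ElementTree: a parser rejects improperly nested elements, and it skips comments when building the tree. The helper is_well_formed and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

GOOD = "<root><child><subchild>x</subchild></child></root>"
BAD = "<root><child><subchild>x</child></subchild></root>"  # subchild closes too late
WITH_COMMENT = "<book><!-- This is a comment --><title>My First XML</title></book>"

def is_well_formed(xml_text):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False

book = ET.fromstring(WITH_COMMENT)
tags = [child.tag for child in book]
print(tags)  # ['title']: the default tree builder drops the comment
```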
But unlike the title element, where I have given an actual title string between <title> and </title>, here there is no value for this element; its content is empty. However, there are qualifiers within the start tag of the element itself. The element is called prod, and I qualify it by saying that this prod has an id, the string 143, and that the media for this product is paper. This is a special feature: XML permits you to qualify elements. Next, consider <chapter>Introduction to XML. I would have expected </chapter> next, because an element begins, has a value and ends. But instead it says <para>. What does it mean? Para is a child element. So <para>What is HTML</para>; that is one paragraph, and it ends. Then another para element: <para>What is XML</para>. Then </chapter>. So you understand that an element called chapter started, it had two child elements, and then that element ended. Then another element called chapter: <chapter>XML Syntax, <para>Elements must have a closing tag</para>, <para>Elements must be properly nested</para>, </chapter>, and finally </book>. You can understand this XML document easily; it is self-explanatory. What is important to note is that it conveys metadata and data together, and it shows how elements can be nested internally. Someone asked whether these are predefined tags. No, there are no predefined tags in XML; that is why it is called extensible. I am defining these tags, and you can define them in any way you like. But if I don't know the way you have defined them, then my interpretation will be a problem. That is a good question. So how are tags agreed upon? Take the case we considered: a program which stores student records. There I call the fields roll number and student name, whereas another database may call them, say, sroll and sname.
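A sketch that walks the book document, assuming Python's standard ElementTree: it reads the title, the attributes of the empty prod element, and the chapter name sitting before each chapter's para children.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="143" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)
title = book.findtext("title")
prod = book.find("prod")

outline = []
for chapter in book.findall("chapter"):
    name = chapter.text.strip()  # text that appears before the first <para>
    paras = [p.text for p in chapter.findall("para")]
    outline.append((name, paras))

print(title)        # My First XML
print(prod.attrib)  # {'id': '143', 'media': 'paper'}
for name, paras in outline:
    print(name, paras)
```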
Whether that element and this element are the same is something these two applications have to agree on. What XML permits you to do is separate out these elements, give the metadata and data, and send them across. In short, XML permits you to send schema and data together for each record. How do you ordinarily envisage a database table? It has one schema and then there are rows of data. Now imagine the table contains only one row. Then the table is fully described if I describe the metadata along with that row, element by element: element name and value, element name and value, and so on. That is what XML is. Whether the name of an element makes sense to some other application depends on the common understanding between the two applications. That is why, as we shall see later, there is the notion of an XML schema. I can send an XML schema which is understood by the interpreting program at the other end, and then whenever I send elements in the context of that schema, the other side can make sense of them. That is how you can connect two different applications easily. Now, in the context of the simple book document I showed, I am explaining what this document contains. First of all, each element may have content, and there are different content types: an element can have element content, mixed content, simple content or empty content. An element can also have attributes. That is interesting. We thought an element has a name and a value: roll number 940145; to Pratibha; from Deepak. So, an element name and a value. But now we see that there can be other things. Each element's content can be element content, mixed content, simple content, or empty content (no content at all). Additionally, an element can have attributes. We are now interpreting that book document.
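The four content types can be sketched as a small classifier, assuming Python's standard ElementTree. The helper content_type is hypothetical, and treating whitespace used only for indentation as "no text" is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def content_type(elem):
    """Classify an element's content as element, mixed, simple or empty."""
    has_children = len(elem) > 0
    # Text may sit before the first child (elem.text) or after a child (child.tail).
    pieces = [elem.text] + [child.tail for child in elem]
    has_text = any(p and p.strip() for p in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="143" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)
kinds = {"book": content_type(book)}
for tag in ["title", "prod", "chapter"]:
    kinds[tag] = content_type(book.find(tag))
kinds["para"] = content_type(book.find("chapter/para"))
print(kinds)
# {'book': 'element', 'title': 'simple', 'prod': 'empty', 'chapter': 'mixed', 'para': 'simple'}
```

Note that prod comes out as empty even though it has attributes: attributes are not content.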
I am sorry, I don't have the document on this slide, so let me just go back two slides. Try to keep this in mind: it has book, title, prod with some attributes, chapter, para /para, para /para, /chapter, again chapter, para /para, para /para, /chapter, and /book. This is the structure. Let's look at this structure in terms of interpretation on slide number 14. The book, which is the XML document itself, has element content because book, as an element, contains only elements. Chapter is another element, and it has mixed content because it contains both text and other elements: there is a chapter name, and there are para elements. Para has simple content, or text content, because it contains only text. Prod has empty content: there is no value between its tags, so prod carries no information as content. However, prod has attributes. XML elements can have attributes in the start tag, just like HTML elements. Attributes are used to provide additional information about elements. Consider <course type="elective">IT640</course>. IT640 is the name of the course, and <course> ... </course> is the element. But there is an attribute for this course, type="elective", which means it is an elective, not a core course. This is the kind of information I can provide using XML tags. The prod element had attributes, as we saw: id="143" and media="paper". The attribute named id has the value 143, and the attribute named media has the value paper. So I am prescribing not only the attributes but also their values. Now, this element may have ten attributes, or it may have three. But as long as there is a common understanding of the kinds of attributes that can appear, a particular XML document containing only three attributes will have those three correctly interpreted, independent of their order.
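A sketch of reading attributes, assuming Python's standard ElementTree. The credits attribute is hypothetical and is used only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

course = ET.fromstring('<course type="elective">IT640</course>')
print(course.text)            # IT640      (the data)
print(course.get("type"))     # elective   (information about the data)
print(course.get("credits"))  # None       (attribute not present)

# Attribute order does not matter: both prod elements carry the same attributes.
a = ET.fromstring('<prod id="143" media="paper"></prod>')
b = ET.fromstring('<prod media="paper" id="143"></prod>')
print(a.attrib == b.attrib)   # True
```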
If another document gives five attributes, of which three are the same, those three will still be interpreted the same way, and the other two will be interpreted according to whatever understanding exists for them. XML thus lets us carry the semantics, the metadata, together with the associated data. As a rule, when some information is about the data rather than part of the data itself, I will try to provide it through attributes. For example, here is <file type="rtf">tutorial.rtf</file>. There is an element called file; between <file> and </file> is the name of the file, tutorial.rtf; and in the start tag there is an attribute called type whose value is rtf. Ordinarily the name of the file is sufficient, so why am I introducing this attribute? The attribute matters because my word processing software may want to know the type of the file so that it can open it appropriately. The type could be doc, rtf, pdf, and so on. This semantic information is vital, because the interpreter on the other side will take this file name and open that file using the right program, depending on the type of file I have given. Of course, you have to build a lot of intelligence into the application on the other side, but that's the whole point. These days you make your applications what you may call XML aware: the application should be able to receive an XML document, interpret it meaningfully, and carry out whatever activities it is supposed to perform to process that information. A student asks: where do we define the path of the file? Well, this is only an example, and this example alone will not be sufficient. If this is all I communicate, then there is a default location that the other party must know about. For example, my application might write files into a pre-assigned directory, and the other party is supposed to look in that directory. If I don't want that, I can send the entire path.
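A sketch of an "XML aware" receiver that uses the type attribute to decide how to open a file, assuming Python's standard ElementTree. The OPENERS table, DEFAULT_DIR and plan_open are hypothetical stand-ins for the agreement between sender and receiver; the sketch only returns a plan and does not actually launch any program.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreement between sender and receiver:
# which program handles each file type, and where bare file names live.
OPENERS = {"rtf": "word processor", "doc": "word processor", "pdf": "PDF viewer"}
DEFAULT_DIR = "/shared/incoming"

def plan_open(xml_text):
    """Decide how to open the file named in a <file type="..."> element."""
    elem = ET.fromstring(xml_text)
    name = elem.text.strip()
    path = name if "/" in name else f"{DEFAULT_DIR}/{name}"  # use full path if sent
    program = OPENERS.get(elem.get("type"), "unknown program")
    return program, path

print(plan_open('<file type="rtf">tutorial.rtf</file>'))
# ('word processor', '/shared/incoming/tutorial.rtf')
print(plan_open('<file type="pdf">/home/ta/notes.pdf</file>'))
# ('PDF viewer', '/home/ta/notes.pdf')
```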
So that depends on the understanding between me and the other party. It is just like the communication protocols we discussed: a protocol between two parties must be established before they can communicate anything. What XML does is permit part of the protocol to be defined on the fly, but there still has to be a basic understanding between the two parties for each to interpret correctly the values sent by the other. So you are quite right: this is just a limited example to illustrate what XML is capable of, and there may be much more information that needs to be conveyed. This is how you give attribute values. Take the prod element again, and let's say I am a publisher who produces books, say 20 books. Each book has a name, which will be the value. Most of these books may be paper books, but a few may be e-books, and for those I will say media="CD". So an attribute is a generic qualifier that can apply to many data values together. Now here is a problem. Suppose I am sending a message element, and the message is "if salary < 1000, then ..." followed by some action: fire that fellow or hire that fellow, whatever. The point being illustrated here is not about salary. The point is that if the message itself contains a less-than sign, you have a problem, because a less-than sign normally marks the beginning of a new tag or of a child element. So there is a standard convention: you replace the less-than character with what is known as an entity reference. &lt; represents the less-than sign. What represents the greater-than sign? &gt;. This is common sense.
It is like the escape characters you use in any string representation. So we have &lt;, &gt;, and &amp; for the ampersand itself; &apos; means an apostrophe, and &quot; means a double quotation mark. These are standard notations used to represent these particular characters. Entity references always start with an ampersand and end with a semicolon. Notice this: &lt; ends with a semicolon, and so does &gt;. These are conventions the language designers chose somewhat arbitrarily. Why a semicolon? Why not a colon, a comma or a full stop? The semicolon was considered something that will not ordinarily occur in such a string. The next question I will ask is: suppose the string &amp; itself occurs as a value in my data. How do I represent it? It sounds like a chicken-and-egg story, but the scheme handles it: since every ampersand is written as &amp;, the literal text &amp; is written as &amp;amp;, and the parser turns it back into &amp;. Still, escaping every special character one by one becomes tedious when there are many of them, so XML also provides a separate section called CDATA. Where would text contain a lot of less-than signs and ampersands? That is not ordinary data for an information system. But suppose you are transporting code itself: C program code, COBOL program code, SQL code and so on. That code itself is your data.
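A sketch of entity references in practice, assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The message text is illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

message = "if salary < 1000 & grade > 2"

# Escaping by hand: &, < and > become entity references.
escaped = escape(message)
print(escaped)  # if salary &lt; 1000 &amp; grade &gt; 2

# ElementTree escapes automatically when it writes text out...
elem = ET.Element("message")
elem.text = message
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)  # <message>if salary &lt; 1000 &amp; grade &gt; 2</message>

# ...and the parser turns entity references back into characters.
round_trip = ET.fromstring(xml_text).text
all_five = ET.fromstring("<m>&lt;&gt;&amp;&apos;&quot;</m>").text
literal_amp = ET.fromstring("<m>&amp;amp;</m>").text
print(all_five)     # <>&'"
print(literal_amp)  # &amp;

# An unescaped less-than sign in the text is not well-formed XML.
try:
    ET.fromstring("<message>if salary < 1000</message>")
    rejected = False
except ET.ParseError:
    rejected = True
print("unescaped version rejected:", rejected)  # True
```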
Now, how will you transport that data, when every less-than sign, greater-than sign or ampersand causes a problem? So XML says: I will give you a separate section called a CDATA section. The parser does not interpret anything inside a CDATA section as markup. A CDATA section is typically used to carry large blocks of code, such as a C program or an SQL program. It is not XML's job to understand that code. XML's job is to take the CDATA section and deliver it to a C compiler or a COBOL compiler or whatever. So the XML parser, or the application that extracts elements from XML, will say: oh, a CDATA section, I will not look inside it. It is now clear how you could include lots of less-than and greater-than symbols. However, how do you make the CDATA section itself begin and end unambiguously, so that its delimiters cannot be confused with something else? For that there is a very special notation for starting and ending a CDATA section. A CDATA section starts with a less-than sign, an exclamation mark, an opening square bracket, the word CDATA, and another opening square bracket: <![CDATA[. When the parser sees this sequence, it knows that what follows is a CDATA section. The section ends with two closing square brackets and a greater-than sign: ]]>. So look at this. A less-than sign typically means what? The beginning of an element, right? After that, an exclamation mark signals special markup, such as a comment, rather than an ordinary element. What is the purpose of all this? In a sense, you are telling the XML parser that what you are writing is not ordinary markup.
But remember that there is somebody else at the other end waiting for these contents, which are actually a C program or an SQL program or something similar, and that party should be able to recognize them as a program. If this were just a comment, it would be ignored like all comments, and the code would be lost. You don't want XML to ignore it completely. You want XML not to interpret the data, but to recognize that there is a CDATA section, pick up its contents, and hand them over quietly to the other party. So when the parser sees an opening square bracket followed by CDATA, it understands that a CDATA section is starting. Then there is another opening square bracket, and whatever you write after it is the program code, which will be delivered separately. At the end, the first closing bracket matches the second opening bracket and the second closing bracket matches the first. I leave it to you to think about why the syntax is an opening bracket, CDATA, an opening bracket, the contents, then two closing brackets and a greater-than sign. But people have put a lot of work into designing this so that you can write code in almost any programming language as part of a CDATA section and transport it across different systems. Here is an example. Some of you know C programs, and here is some C-like program code: function matchwo(a, b); if a < b and a < 0, then return 1, else return 0. Notice what I have written: <script> and </script>; this is actually the element. Within <script> and </script> I have a CDATA section. I have written the start of the CDATA section, then the entire code, and then the end of the section. So this whole thing is a script. In fact, I can give the script an attribute saying the programming language is C, and the other system can interpret that attribute to know that what it is getting is C code, which it can compile and run for whatever purpose it wants.
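A sketch of a receiver pulling code out of a CDATA section, assuming Python's standard ElementTree, which delivers CDATA contents as ordinary element text. The language attribute is a hypothetical convention agreed with the receiver, as suggested above.

```python
import xml.etree.ElementTree as ET

# The language attribute is a hypothetical convention agreed with the receiver.
SCRIPT_XML = """<script language="c"><![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
]]></script>"""

script = ET.fromstring(SCRIPT_XML)
language = script.get("language")
code = script.text  # the CDATA contents, with < and && left untouched
print(language)
print(code)

# The same code without a CDATA section is rejected by the parser.
try:
    ET.fromstring("<script>if (a < b && a < 0)</script>")
    rejected = False
except ET.ParseError:
    rejected = True
print("plain version rejected:", rejected)  # True
```

The one sequence that cannot appear inside a CDATA section is ]]> itself, since that is what ends the section.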
Somebody will ask whether the document should start with something declaring that this is XML. Yes, indeed, that is the case. In fact, the first line usually tells you what we call the character encoding: whether it is ASCII (7-bit or 8-bit), an ISO standard, Japanese, Hebrew characters, whatever. This character encoding is described in the first line, which begins with a less-than sign and a question mark. Remember, the exclamation mark is used for comments. The question mark looks similar, but this line is not to be ignored. It is interpreted and understood by the XML parser for other purposes, which are indicated here. This one says version. Just as we had IPv4 and IPv6, XML itself has versions, and different versions might have different interpretations. So you are permitted to give the version. And the encoding can be specified: ISO-8859-1 is one type of character encoding. This is standard, and it is used by the XML parser to make sense of the values that are ultimately given as text strings and so on. Now here is the problem. Somebody here asked the question: are these tags standard? And I said no, of course not; they are user-defined tags. But if they are user-defined, confusion is possible. For example, I have written one document in which I have used some tag name, let us say XYZ, to mean something. You might have written another XML document where you have used the same tag XYZ to mean something else. Now there could be a lot of problems. Since two different documents can use the same name to describe two different types of elements, I must have a mechanism to avoid this confusion. Here is one example of a table in an XML document: <table>, <tr>, <td>Apples</td>, <td>Bananas</td>, </tr>, </table>. Can you recognize these tr, td kinds of things? What tags are these? These are HTML tags, right?
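That the parser really reads the encoding from the first line can be checked with a minimal Python sketch. The `<note>` element and the word "Café" are illustrative assumptions. The byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own, so the same bytes fail to parse once the declaration is removed and the parser falls back to its UTF-8 default.

```python
import xml.etree.ElementTree as ET

# First line is the XML declaration with version and encoding.
# Byte 0xE9 means "é" in ISO-8859-1.
data = b'<?xml version="1.0" encoding="ISO-8859-1"?><note>Caf\xe9</note>'

note = ET.fromstring(data)
print(note.text)  # Café

# Without the declaration the parser assumes UTF-8, where 0xE9 followed
# by "<" is an invalid byte sequence, so parsing fails.
try:
    ET.fromstring(b"<note>Caf\xe9</note>")
    declaration_needed = False
except ET.ParseError:
    declaration_needed = True
print(declaration_needed)  # True
```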
So I am writing a table in the sense of a table drawn on the page. The table has one row with two cells: apples and bananas. And then the end of the table. This is one example. Now consider this one: <table>, <name>Lunch Table</name>, and so on. Now I am actually describing a piece of furniture. So this is a document whose root element is table. There is an element called name, which is Lunch Table. There is an element called width, which is 3 feet. There is an element called length, which is 6 feet, and then </table>. Now I have a problem. This is a table which has apples and bananas. That is also a table, the Lunch Table, 3 feet by 6 feet. You have used table for apples and bananas; I have used it for a lunch table. Isn't there a confusion? How do you avoid it? What we may do is use a prefix. Instead of saying table, I can say f:table. f now becomes my prefix: <f:table>, <f:name>Lunch Table</f:name>, <f:width>3</f:width>. So what happens now? Although I have used the word table, this qualifying prefix makes it unique. Provided, of course, that I use f and you use g. If you also use f, then there is confusion again, the same confusion in fact: your f:table has apples and bananas, my f:table is a lunch table. In short, I like this notion of uniquely specifying an element name by putting a prefix, but I want to ensure that the prefix is unique: different for you, different for me, different for somebody else. What is the easiest way of ensuring that the prefix is different? Yes? A number? But your number is 35, and as luck would have it, I also think of 35. We are sunk. Sorry? Prime numbers? But the same prime number can be thought of by two different individuals. So in the internet world, what is likely to be unique for you and unique for me? IP address. The IP address is one. Something better? Your website.
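The name clash above can be seen directly in a parser. In this minimal Python sketch (element contents taken from the two examples, everything else assumed), both documents produce a root element with the identical tag `table`, so a program looking only at tag names cannot tell them apart. It also shows that a prefix on its own is not accepted: Python's parser rejects `f:table` until `f` has been bound to a namespace.

```python
import xml.etree.ElementTree as ET

html_table = "<table><tr><td>Apples</td><td>Bananas</td></tr></table>"
furniture = ("<table><name>Lunch Table</name>"
             "<width>3</width><length>6</length></table>")

a = ET.fromstring(html_table)
b = ET.fromstring(furniture)

# Both roots have the identical tag, so code that looks only at the tag
# cannot tell a data table from a piece of furniture.
print(a.tag, b.tag, a.tag == b.tag)  # table table True

# A prefix on its own is not accepted: the parser requires "f" to be
# bound to a namespace (an xmlns:f declaration) first.
try:
    ET.fromstring("<f:table><f:name>Lunch Table</f:name></f:table>")
    bare_prefix_ok = True
except ET.ParseError:
    bare_prefix_ok = False
print(bare_prefix_ok)  # False
```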
My website is www.cac.iitb.ac.in/dbp. Another colleague's website in our department ends in /nls. His website is different from mine. In fact, forget individuals. Suppose a group of programmers in the computer science department are preparing XML documents. Then at least amongst themselves they should agree not to use the same names. If they use a common qualifier, it will do for that group. Another group in the IIT Kanpur CS department, which might think of the same names, will use another common qualifier. It turns out that your website, or URL, works as an ideal qualifier, an ideal prefix. And here is the namespace attribute, which is placed in the start tag of an element and has this syntax: xmlns. Let's see what xmlns means. ns means namespace. What I am trying to do is create a unique prefix for myself. That means I have a space for all the names that I give, and this space is mine. You have a space for all the names that you give. It may contain the same names, but they should be treated differently. So you should have a namespace, and I should have a namespace. XML lets you define a namespace by writing xmlns:prefix="namespace". For example, xmlns:f="http://iitb.ac.in/furniture". So a namespace declaration binds a prefix to a name, and that name can be made unique, whether for an individual or for a group of people who choose names in some collaborative way. So you agree that now there cannot be any confusion between the names that I or my group choose and the names that you and your group choose. You may use the same name to mean something else, but the interpreter will understand the difference, because it will take your name as your namespace followed by the name, and my name as my namespace followed by the name. And the URL does not mean anything more. It is merely a name.
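Here is a minimal Python sketch of the `xmlns:prefix="..."` mechanism. The two URIs and the wrapping `<root>` element are illustrative assumptions. The parser replaces each prefix with its namespace URI, so the two `table` elements get different expanded names. It also shows that what identifies the namespace is the URI, not the letter: binding `g` to the same URI gives exactly the same name as `f`.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace names (assumptions); they are never fetched.
HTML_NS = "http://www.w3.org/TR/html4/"
FURN_NS = "http://example.org/furniture"

doc = f"""<root xmlns:h="{HTML_NS}" xmlns:f="{FURN_NS}">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>Lunch Table</f:name><f:width>3</f:width></f:table>
</root>"""

root = ET.fromstring(doc)
h_table, f_table = list(root)

# The parser replaces each prefix with its URI, in "{uri}local" form.
print(h_table.tag)                 # {http://www.w3.org/TR/html4/}table
print(f_table.tag)                 # {http://example.org/furniture}table
print(h_table.tag == f_table.tag)  # False

# The URI matters, not the letter: prefix g bound to the same URI
# yields exactly the same expanded name as f.
other = ET.fromstring(f'<g:table xmlns:g="{FURN_NS}"/>')
print(other.tag == f_table.tag)    # True

# Lookups take a prefix-to-URI map of your own choosing.
print(root.find("f:table/f:name", {"f": FURN_NS}).text)  # Lunch Table
```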
It is just to give a unique name. I am giving a URL, but it need not be a URL. As somebody suggested, instead of just a prime number, you could try to create a unique address, like a postal address. So every namespace created by IIT Bombay could say: IIT Bombay, Department of Computer Science and Engineering, third floor, School of IT building. Then everybody here using that prefix is covered. But instead of defining such arbitrary prefixes, it is best to use a URL. A URL is a simple way of naming a namespace. Namespaces are important because the world has to exchange XML documents written by absolutely every Tom, Dick and Harry and still make sense of them. So W3C, that great committee, has spent a lot of time on the nomenclature of namespaces. There is a W3C namespace specification which states that the namespace name should be a Uniform Resource Identifier, a URI. When a namespace is declared in the start tag of an element, all child elements with the same prefix are associated with the same namespace automatically. You don't have to declare the namespace again for each child. The address used to identify the namespace is not used by the parser to look up information. So the URL that I give has no other connotation; it is not used to do anything meaningful. It is just a unique tag. The only purpose is to give the namespace a unique name. However, very often companies use the namespace as a pointer to a real web page containing information about the namespace. Since the namespace can be a URL, a company may actually make that URL point to a page with more description of the namespace, maybe what you call style sheets or whatever. So I would request you to read more about this. It is very interesting, and extremely well organized.
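Both points, that one declaration covers all the children using the same prefix and that the URI is only a name, can be sketched in Python. The URI below is an assumption chosen under the reserved `.invalid` top-level domain, which never resolves; the document still parses, because this parser treats the URI purely as a string.

```python
import xml.etree.ElementTree as ET

# The namespace is declared once, on the start tag of <f:table>. The
# ".invalid" top-level domain is reserved and never resolves; parsing
# works anyway because the URI is used only as a name.
NS = "http://furniture.example.invalid/ns"
doc = f"""<f:table xmlns:f="{NS}">
  <f:name>Lunch Table</f:name>
  <f:width>3</f:width>
  <f:length>6</f:length>
</f:table>"""

table = ET.fromstring(doc)

# Split each child's expanded tag "{uri}local" into its two parts.
parts = [child.tag[1:].split("}") for child in table]
uris = {uri for uri, _ in parts}
local_names = [local for _, local in parts]

print(uris)         # {'http://furniture.example.invalid/ns'}
print(local_names)  # ['name', 'width', 'length']
```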
They have thought of practically every possibility, achieving uniqueness yet keeping interpretation simple, and ensuring that metadata and data can be composed, can travel, and can be interpreted and converted into whatever form. You can also define a default namespace for an element so that you don't have to use prefixes on all the child elements. This is a simple extension, not very significant. So here is one example: <table xmlns="http://www... etc.">, name Lunch Table, width 3, length 6, </table>. Now this table will always be associated with the furniture table because of this namespace. On the other hand, the other table could be in some other namespace, and it could have apples, bananas or whatever you wanted. Somebody asks: so the table that we have created is an empty table? No, no, no. Don't confuse this with a database table. This is not a database table. It is a dining table that is being described. Now here is the notion of an XML schema: XML namespaces and XML schema. Forget the DTD reference; those of you who are interested can look it up. Essentially it is about defining the legal building blocks of an XML document. An XML schema is used to define what the XML documents that follow will contain. Just as a database schema defines what data will come in the tables, an XML schema defines what kind of XML document will come later. So here is what an XML schema contains. It defines the elements that can appear in a document, the attributes that can appear in a document, the number and order of child elements, the data types of elements and attributes, and default and fixed values for elements and attributes. Now you can see XML is becoming powerful. Remember we said earlier that some XML document may have only three elements, another may have five, something else may have eight. If we agree on an XML schema, then I can define one that describes, say, 10 elements.
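The default-namespace form can be sketched the same way. The URI below is a placeholder assumption standing in for the elided "http://www..." on the slide. With `xmlns="..."` and no prefix, the unprefixed root and all its unprefixed children end up in that namespace.

```python
import xml.etree.ElementTree as ET

# Placeholder URI standing in for the elided "http://www..." on the slide.
FURN = "http://example.org/furniture"
doc = f"""<table xmlns="{FURN}">
  <name>Lunch Table</name>
  <width>3</width>
  <length>6</length>
</table>"""

table = ET.fromstring(doc)

# With a default namespace (xmlns="..." and no prefix), the unprefixed
# root and all unprefixed children are placed in that namespace.
print(table.tag)  # {http://example.org/furniture}table
print(all(child.tag.startswith("{" + FURN + "}") for child in table))  # True
print(table.find("{" + FURN + "}width").text)  # 3
```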
Which are the 10 elements, how many child elements each one has, in what order they will come: all of that I define in an XML schema. And I exchange that XML schema with you. Now any document adhering to that schema is commonly understood between you and me, because the XML schema says that it will have such-and-such elements, of this type, with these attributes. So an XML schema is a brilliant way of exchanging commonly understandable information between both sides. And as long as I keep sending documents which conform to that XML schema, the XML documents themselves can be parsed. Some documents may have fewer elements, some will have all the elements, some will have attributes, some will not. Here is an XML schema. <?xml version="1.0"?>, then xsd:schema. You remember the data definition language in SQL; XSD means XML Schema Definition. The xsd:schema element carries a namespace. The slide says HSTP; there is a spelling mistake there, it should be HTTP: http://www.w3.org/2000/10/XMLSchema. This again, as I said, is only a URL; it just gives a unique namespace. Then xsd:element name="cardinfo", whose type is an xsd:complexType containing an xsd:sequence: xsd:element name="name" type="xsd:string", xsd:element name="title" type="xsd:string", then phone, string, and email, string. Then the sequence, the complex type, the element and the schema are closed. I would like you to make sense out of this. First of all, there is an element named cardinfo. What kind of card information is it? A visiting card. So I am describing a schema for a visiting card. And I am saying that the card info is complex information. What is the complexity? The card info itself has a sequence of four elements. The first in the sequence is name, and its value is a string. So name here could be D. B. Phatak.
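To make the schema concrete, here is a sketch that writes the cardinfo schema out as XML and reads back the declared sequence with Python's `ElementTree`. The element and type names follow the description above; the namespace is the one on the slide (the current W3C Recommendation uses `http://www.w3.org/2001/XMLSchema`). This does not validate anything: the Python standard library has no XSD validator (third-party `lxml` provides `XMLSchema` for that). It only shows how the schema encodes the four fields and their order.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2000/10/XMLSchema"  # namespace as on the slide

schema_text = f"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="cardinfo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="phone" type="xsd:string"/>
        <xsd:element name="email" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

ns = {"xsd": XSD_NS}
schema = ET.fromstring(schema_text)

# Find the cardinfo declaration and list its sequence, in order.
card = schema.find("xsd:element[@name='cardinfo']", ns)
fields = [(e.get("name"), e.get("type"))
          for e in card.findall("xsd:complexType/xsd:sequence/xsd:element", ns)]
print(fields)
```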
The next would be title, whose type is again string. So it could be Subrao Nilekani Chair Professor, or lab superintendent, or administrative officer, whatever. Phone, type again string: 25767747; somebody else's number would be different. And then email: dbp at IITB, whatever; somebody else's would be different. Observe that once I describe this XML schema, the data for 2,000 cards like this can be given. And since it is known that every card now comes with these four elements in this order, interpretation becomes very simple. The XML document containing the card itself will have <name>value</name>, <title>value</title>, <phone>value</phone>, and so on. You understand that? That will be the XML document. But its meaning is now clear. And if this XML schema is understood by both parties, you can see that the data exchange will be very smooth. Frequently, when XML documents are exchanged, not as regular documents containing a CDATA section with C code or something, but as documents which actually contain data from a database, they are usually associated with an XML schema in this way. Here is an example of an XML document conforming to that schema: <cardinfo>, <name>D. B. Phatak</name>, <title>Chair Professor, IITB</title>, <phone>+91 ...</phone>, and then the email. The slide says dbp dot iitb; that is a syntax error, it should be dbp at iitb dot whatever. Then </email>, </cardinfo>. Now this is an XML document describing the contents of one card. I can send 2000 such elements, and the data for 2000 cards is transferred automatically. And as you can see, there is no confusion about what name means, what title means, what phone means. The XML schema shown earlier defines all of this. Can we now relate this to the basic notion of a database schema and data? It is something very similar here, and it is brilliantly used to convert everything into standard string forms and exchange data across systems.
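Because every card follows the same sequence, a receiver can check incoming cards mechanically. The sketch below is a hand-rolled stand-in for schema validation, not real XSD validation: the function `conforms` is hypothetical, the wrapping `<cards>` root (one document needs a single root to carry many cards) is an assumption, and all names, numbers and addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# Field order taken from the cardinfo schema: name, title, phone, email.
CARD_FIELDS = ["name", "title", "phone", "email"]


def conforms(card: ET.Element) -> bool:
    """Hand-rolled check (not real XSD validation): the element is
    <cardinfo> and its children are exactly the four fields, in order,
    each holding plain text with no nested elements."""
    if card.tag != "cardinfo":
        return False
    children = list(card)
    if [child.tag for child in children] != CARD_FIELDS:
        return False
    return all(len(child) == 0 for child in children)


# A batch of cards under an assumed <cards> root; values are placeholders.
batch = """<cards>
  <cardinfo><name>A. Person</name><title>Professor</title>
    <phone>0000000</phone><email>a.person@example.org</email></cardinfo>
  <cardinfo><title>Out of order</title><name>B. Person</name>
    <phone>0000000</phone><email>b.person@example.org</email></cardinfo>
</cards>"""

results = [conforms(card) for card in ET.fromstring(batch)]
print(results)  # [True, False]
```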
We have not discussed this, but it used to be unbelievably difficult to make different application systems in the world interact with each other. There used to be special applications called EDI, Electronic Data Interchange. A lot of programmers and programming companies made a lot of money ensuring that data from Boeing was understood by General Electric, data from General Electric was understood by somebody else, and so on, because everybody was putting data in their own peculiar format and there was no easy way of exchanging it. Files used to be transferred, the file metadata would be transferred by some other application, and exactly the same application had to run at the other end. If some other application ran, it could not figure out the data. XML makes it generically possible to represent both metadata and data. And the notion of an XML schema is a very powerful one which makes it easy to exchange data. It appears simple now, but that is why I am stressing that before XML, the world was a mess in terms of exchanging information easily. Automatic exchange of information could not happen.