Where we begin this course is to try and understand the motivation for why we want to move to a new architectural style. If we are all able to buy into the fact that SOA is going to give us something new, something that current technologies do not fully address, then I think we will have made a good beginning for the next five days. Without that buy-in, I think it's going to be very hard for you to sit there and swallow whatever else comes beyond this: "He's just talking of XML, he's talking of SOAP, he's talking of this, and I don't even understand why it is useful." I don't want that scenario. So I think it's useful to spend the first part of this course, especially the morning, understanding the motivation. Why SOA? What is it going to buy us? Where have we come from? What is it that SOA is changing in this scenario, and what is going to be new about it? That's what we're going to talk about first.

Then, since the course is also specifically going to focus on web services as an implementation platform for SOA, we'll talk about what exactly web services are, from an introductory perspective, this morning. After that come certain XML technologies that are at the heart of all the specifications we are going to explain over the next three days, so we will want to get familiar with XML Schema specifically. Many of you may have programmed using DTDs for XML, but DTDs are a thing of the past now and are being replaced by XML schemas. We won't discuss the detailed schedule right now; we'll discuss it a little later.

We'll start off with the motivation for why we think service-oriented architectures would be useful: what is it that they're going to bring to the table that doesn't exist today? In order to understand that, it would be good to first get a historical perspective on where we are coming from, and that's going to be this part of the talk. Even before that,
let's understand what the challenges in enterprise computing are. These are the standard challenges in enterprise computing that we see.

First, we want systems that are scalable. Take really large enterprises, Johnson & Johnson, Boeing, GM, and Ford: hundreds of thousands of employees spread over literally hundreds of locations in the world. This is true scalability that we're talking about. Or take websites or e-commerce sites that have to handle millions of users, with thousands of concurrent users hitting a particular site. So scalability is a concern.

The second thing is that in large enterprises it's very hard to pin down a particular standard. To say "everybody shall use Windows something-or-other," mandate it at the CIO level, and expect that to flow down is not very easy to do. There will be pockets which have different requirements, and therefore you should expect that a certain heterogeneity is going to come into the environment organically. So diverse environments are going to exist, at every level: from the hardware and the operating system onwards, all the way up the application stack.

The third challenge is that in today's age everything has to go faster, and we are not going to be able to slow it down. What used to be an 18-month release cycle for software has shrunk significantly. People are pushing for cycles of three months or less, even though that may not be very practical, and certainly the incremental mode of development, where features can be released quickly, is becoming the norm. For a while I think we software engineers ended up sitting and complaining: the quality of software is going to suffer if you do this; if you keep pushing for faster releases and more features in shorter amounts of time, you're going to end up with poorer software. But I think one of the things that we would all
realize by now is that there's no point complaining about this, because this is the way it's going to go. If anything, it is only going to get worse in the future. So time to market is going to be a continual issue that we have to deal with.

Certainly costs are again an issue, and cost can be looked at from two perspectives. There is the cost of development: what does it cost to actually put the software out initially? More importantly, it now turns out that the cost of development, of the initial acquisition of software, is a smaller and smaller part of the bigger picture. The bigger picture is that if I have to run the software for the next 10 years, the costs of operations and maintenance far outstrip the cost of initially deploying the software onto a particular platform. So this OA&M issue, as they call it, has kind of overtaken the cost of development in enterprises.

Those are some of the challenges in enterprise computing; let's keep them in the back of our minds. These first two slides are meant to give you an idea of what we want to keep in focus as we look at what the evolution of software platforms has looked like.

Enterprise applications themselves are typically made up of the components that you see here. There are three components that we commonly know about. The presentation logic, or the UI, is the way that people interact with the system. Then there's the business logic, which actually holds the code: the business-rules type of component, where the workflow may exist, and so on. And then there is the data access logic, because every enterprise system revolves around a lot of data that is typically stored in relational databases. Then there is a component that we don't often talk about and don't often see: these system services, as I call them. What are these system services? Things like security, things like concurrency management,
connection pooling: all of these things that we use quite a bit today. These are system services, and they did not always exist in the form that they exist in today, as an explicit part of the infrastructure. In the early days of software development they were very much embedded in the application, and that slowly started changing. So we have these four components: the UI, the business logic, the data access logic (and the data itself, of course, which I have not explicitly mentioned), and the system services.

Over the next 20 minutes to half an hour, let's take a look at how software methodologies and software systems have evolved over the last 15 to 20 years. I think it's useful to do this because it indicates a certain trend, and the logical next step in that trend is really what we should be seeking. What happens next? If we take a look at this trend, what would the next step be? If we can answer that the next step needs to have certain features, and those features are SOA for us, then that's really what we should be looking at. That's why I believe this historical perspective is useful.

Looking at this, there are actually two axes along which evolution has proceeded within software systems. The first axis is how many changes I can make, and how easily I can change the software platform. In other words, how flexible is the application platform that I'm deploying on? How much does it cost me to make a change? That is one axis of evolution. The second axis is where these system services that we talked about come from. Are they embedded? Are they explicit? Is there infrastructure for them, or is it something that is not standardized? Those are some of the issues there. So there are these two axes of evolution, and I think it will become clear what I'm talking about if you actually take a look at the different incarnations
or avatars that we have gone through. These are some of the standard avatars. We have seen the notion of single tier, two tier, and n tier; three tier is just a specific instantiation of that. Even within three tier, we first had remote procedure calls, and therefore RPC-based systems; then we moved to more distributed-object-based systems, and then to web-based systems. Finally we are at this notion of what I'm calling application servers, which is the technology used in the industry today to build applications. Initially there were proprietary application servers, examples being Tuxedo, DCOM, etc. Then things got standardized somewhere along the way: an open standards process came into being, and that's where we stand. We'll briefly discuss each one of these. You are probably familiar with this, so we'll go pretty fast here.

Single tier: everything is monolithic, and everything sits in one place. These are mainframe-based applications. All of these layers, the presentation logic, the business logic, the data access logic, and the database itself, sat on the mainframe, and many of you here might even have written applications for the mainframe. So there are two aspects to it. It is a very centralized model: there's no distribution involved, everything sits in one place, and there were typically dumb terminals accessing whatever applications ran on the mainframe. And it is also very monolithic: COBOL-type applications, maybe using CICS types of infrastructure.

The obvious disadvantages of something like this would be what? You've studied single-tier systems before. Maintenance is hard. There are two aspects to maintenance, though; I would say it's hard and it's easy. Why? The management aspect of it: I think if
you take maintenance to include the management aspect, management is much simpler in this scenario, because everything is centralized. It's in one location; things are not spread all over the place. Deployment means I just have to redeploy to that one machine. There is no client-side management: I have not spread the UI over several client machines, so I don't have to redeploy to them, and so on. And clearly the data is all sitting in one place. Data is not replicated, so I don't have to worry about consistency issues: if I change one copy of the data, I don't have to worry that another copy may go out of sync with it. Those are the advantages of this kind of model.

And the obvious cons: making a change is going to be much harder. That's the "maintenance is harder" issue. Everything is very intertwined in this kind of scenario, so making a change is expensive, because somebody has to know the whole code well, and the learning curve for that may be pretty high. Every time somebody leaves the company, you're in a whole lot of trouble. That is the disadvantage.

Then we moved to two-tier systems, where we had a little bit of distribution going on. We pulled some stuff out: we said, let the database sit separately. Everything else was wrapped in what we call a thick-client solution. It was a fat client which had the business logic, the UI, and the data access logic, and which went and talked to the database. The advantage of splitting this out was that it gave us the ability to move from one database vendor's product to another's, since the database was separate. It was not as simple as today, where we have a standard interface to write to a database and no matter which database you're
using, that interface is going to work (ODBC, JDBC, whatever). It wasn't quite so then, but certainly this gave us the independence to replace the database engine itself.

There were still a lot of problems with this scenario. The UI and the business logic were still very intertwined with each other, and it was difficult to make changes. Suddenly deployment was a nightmare, because there were thick clients: if you had a thousand clients deployed, every time you made a change you had to update those thousand clients. The client was also very tightly coupled to the data model, because all the logic was sitting on the client side. Every time I made a change in the schema of the data model I would be in a lot of trouble, because it would break all the clients that were out there, and I had to redeploy a change to them.

The other thing was scalability. Every one of these clients required an independent connection to the database, and the database engine can only support so many independent connections. You had to scale the database engine vertically; there was no affordable horizontal scalability on the database side. Today the notion of database clusters is coming into play (parallel databases, if you will), but that is the very latest in database software and it didn't exist at that point in time. So you had to throw more and more iron at the database engine in order to support scalability, which was not a very workable proposition.

The final issue was that typical applications would be chatty by nature. They would have to go to the database and come back, go to the database and come back, and if there's a network separating the two, you would be in trouble. So we said, okay, that's not going to work out so well. Let's distribute things even further; let's break things apart so that we can independently replace the
different piece parts without affecting the others. So we went to three tier. What we did in three tier was put the business logic in a middle tier, which then talked to the database. The data access and the business logic sat in one tier, and the UI, or the presentation logic, sat somewhere else. There was RPC from the UI to the business logic, and then SQL from the middle tier to the database, the databases typically being relational and accessed through SQL.

All these notions of system services now started coming into play, because the middle tier had to handle them. I had to handle concurrency: there were thousands of clients hitting the middle tier. The database engine used to handle concurrency, and now part of that responsibility got transferred to the middle tier. The database engine still does handle concurrency, but now concurrency management in the middle tier is also required. Authentication may be done by the middle tier. Then there is security: because there's a network involved, encryption now needs to come into play, and so on. With the monolithic mainframe solution this was not an issue, since there was no network really involved. So these kinds of issues come into play, and system services now have to be coded.

So what are the pros and cons of this? Here the notion was that I would completely insulate the client from the data model (the MVC paradigm, if you will). If my schema changes, my client itself doesn't have to change. The business logic may have to change, but I keep the clients completely safe from that perspective. Business logic could also change flexibly without affecting too many of the clients themselves; that was another advantage. There was an interface specification, and as long as the clients were hitting that interface
specification, it didn't matter: the implementation of the business logic could change down the road.

Now, it so happened that once I created this middle tier, it became a beast of its own. It became quite complex to manage the middle tier, and in those days there were no such things as app servers. Therefore the complexity that was being introduced in the middle tier had to be handled by the developers who wrote the applications themselves. Those were the very early versions of the three-tier RPC-based systems. Also, the client and the middle-tier server were a little more coupled to each other, because it was an RPC, or procedural, model as opposed to an object-oriented model. And the notion of being able to flexibly replace components, of having clear abstractions of functionality which could be taken out and replaced with some other component, did not exist in this model.

So that was the next natural evolution: we moved to remote objects, or distributed objects, after that. The business logic and the data were captured in nicely encapsulated entities called objects, and these objects, or components as they're called, could essentially be pulled out and replaced with another equivalent object that satisfied the same interface specification but could do things differently. Remember what we started out with: we said that evolution happened along two axes, and one was the axis of how much flexibility I got out of the system. Clearly, with objects versus procedures there's a lot more flexibility, because you have these encapsulations that you can replace at will. That was the promise, at least; not necessarily that it was fully realized, but that was the promise. The notion of reuse could also come into play, where a single abstraction that was created could be advertised and reused by many people. So that was the object-oriented paradigm. Distributed
objects: examples here are RMI and DCOM. I'm going to distinguish between RMI and J2EE, because J2EE is actually the next natural evolution along the RMI paradigm. RMI came earlier, and RMI simply gave you a way of remotely accessing Java objects over a network. There was some interface language in which we described these objects; in the case of RMI, the interface language was Java itself, and there was no separate interface language.

The pros here: it was better than the RPC model from the coupling perspective, and it afforded a lot more flexibility. There were all these data abstractions that were created, abstractions could be reused, and therefore the reuse levels went up. And you could pull out certain abstractions and replace them with other equivalent abstractions. But this notion of middle-tier complexity did not go away. If you have an RMI application, you still have to manage everything yourself. Nobody is going to manage connection pooling for you; connection pooling is not a part of RMI. It is something that you have written yourself because you find it useful: I want to multiplex database connections amongst multiple users who want a database connection. That is the notion of connection pooling.

Actually, before we address that: what happened next was that we were still with thick clients, but then the browser became very popular. So we started saying, wait a minute. Instead of having thick clients on the client side, with heavy-duty boxes, machines, and PCs that have to be maintained, operating systems upgraded every so often, virus containment, and all these issues, let me keep a thin client that can just run a browser. That is my minimalist thin client. It probably boots off a network operating system somewhere, so that it is not affected by viruses, and all it runs is a browser. That's the only capability that
I need of a client. If I have this, I will be able to deliver all my standard enterprise applications out of a browser. So that was the web-based three tier, or n tier; the "three" is just a metaphor for however many tiers you want to have. But the client was just a browser. HTML was the standard, the browser talked HTTP to the server, and even the UI component actually sat on the server. This is your standard JSP kind of model: the UI component sits on the server and is served from the server, but is simply displayed by a browser.

The pros were that the browser is ubiquitous. It doesn't matter what operating system you have or what kind of client you have, whether it's a PC or a thin client, whether it's Linux or Windows. You have a browser, and that's all that is needed. The browser talks a standard protocol to the back end, which is HTTP, and it understands how to display HTML, which is also a standard. So that gives you ubiquitous client types. There was no client to manage: unlike with a thick client, I didn't have to deploy software on the client side. Any change I made to the UI was deployed once on the server and you were done. The next time the client reloaded that page, he got the latest changes.

(An audience member points out that deployment and sizing also play a role here, and that the complexity of load balancing and all those issues started to appear.)

Yes, exactly. In fact, one of the issues is that the more we pushed towards the server in the middle tier, the higher the complexity of the middle-tier infrastructure services became. Now we had to manage all the issues that you brought up: we had to load balance, we had to deal with security, and so on. That's exactly the point I was going to make: a con of this approach is still that the complexity in the middle tier has not gone away. So that's an issue that we have not solved. So then we said that
along the evolutionary path... well, this is the summary of the evolution that has taken place up to this point. We went from single tier to multi-tier: we wanted to distribute things, and we wanted to insulate one layer from another. It's kind of like layered software, but at the horizontal level. You're creating many horizontal layers, and the leftmost layer is insulated from the rightmost layer by the intervening layers sitting in between. That was one thing that we did. We moved from what was obviously very monolithic software to clear, small abstractions that work together; object orientation is what we call it. This gave us a lot of flexibility and helped us improve reuse, which were the two important things. And then we started moving from thick clients, application clients, to more HTML-based, web-based, browser-based front ends. So far, this is what has happened.

This is again just a summary of going from single tier to multi-tier: what was the difference? These are the two ends of the spectrum, and we've seen this already. And going from monolithic to more object based, what is the difference? With monolithic software there was a single file. Every time you made the smallest change to the application, you had to recompile the entire application and put it out there. That was the notion of monolithic software. In the case of object-based or component-based software, you have all these small parts, and you can simply recompile each part and drop it out there. In fact, in the case of Java, you can now even recompile a class while the Java application is running and deploy that class onto the running application. It will just reload the class and run with it, so you never even have to bring down the existing system.

The outstanding issue that we have not solved so far, however, which is what our interest is in, is that the middle-tier complexity
still was very high. In fact it got worse and worse, because the middle tier was a beast that we created along the way; it didn't exist earlier. This actually started getting uncontrollable, and people were inventing a lot of the necessary system services on an ad hoc basis. How do I solve the concurrency control problem? Every application developer came up with his own solution. How do I do connection pooling? Again, everybody came up with his own solution, and so on. This is obviously something that we didn't want. So we said, let's create a common server for this. These are problems commonly faced by every enterprise application, so let's put all of this into some kind of infrastructure software. And we call this the app server, the application server.

This was actually the notion of a container. It came to be called a container a little later, but that's essentially what it was. Let me just draw it: it actually looked like a bucket, a container, and the container gave you certain system services. These are the APIs that the container afforded for the application programs to call, the container APIs, and into the container you drop application components or objects. So this is what it looked like, and this container was simply an abstraction for the different system services that were previously being duplicated by various people. That was the notion.

We then had several containers come up. There was CORBA, which was one of the earliest forms of the container logic. So we moved from DCOM and RMI to a J2EE / CORBA / .NET kind of notion. The container could obviously be either proprietary or open standards based. Anything can be either proprietary or open standards; same thing with the
container. The contract between the container itself and the application components that resided within it was well defined. It was defined, for instance, by the vendor of the software. If Microsoft gave you the software and it was .NET that you were buying, then you had a particular way of interacting with the .NET API; if it was J2EE, there was some way of interacting with that, and the API is quite well specified. Some of the earliest containers were Tuxedo, which was basically a transaction manager, and CICS, which was another transaction manager used here, and so on.

The obvious problem with this solution is vendor lock-in. Suppose I go to Microsoft; I'm stuck with Microsoft once I start building my applications. It doesn't interact very well with others. I can build bridges from Microsoft solutions to other solutions, but these would be custom bridges that I would have to code; there is no standard way of interacting with other applications. So I could not very easily have solutions that had TIBCO and Tuxedo talking with Microsoft .NET. It was not something that I could easily buy.

So then we said, we want an open-standards-based solution. In fact, the open-standards-based solution was one of the first things to come up. How many of you remember CORBA in this space, or have even worked with CORBA before? That's one person. CORBA is a standard called the Common Object Request Broker Architecture, and it came out of a standards body called the OMG, the same people who put UML out as a standard. They weren't in charge of UML at first; earlier, they started with this distributed object standard called CORBA, which was nothing but an application server for distributed objects. But their focus was a little different: their focus was actually inter-language interoperability, so that you could have clients written in, say, C++ talking
to servers written in Java or whatever other language. The common meeting point was some kind of interface definition language, a canonical representation of the interfaces that both the client side and the server side could understand. That was the notion. It's the same container and component model that we discussed with the proprietary solution, except that here the contract between the component and the container is an open standard. Anybody can implement such a container, and if there is a standards-compliant container, I can obviously drop my objects or components into it without having to worry about whether it's going to work correctly or not. It will; that is the expectation, at least. That's the notion of a standard.

The other advantage of open-standards-based solutions is obviously that I can influence the particular standard. With J2EE or with web services today, anybody can essentially go to the standardization committee and say, "I want a standard for this; there is such a requirement." If everybody sees this requirement as a common one, then the standardization process kicks off for that particular requirement.

So here are some of the standards. Some people claim that J2EE is a Sun proprietary solution. It's not. There is something called the Java Community Process, or JCP, which allows anybody to float something called a JSR, a Java Specification Request, that will handle the standardization of some particular Java feature. Anybody can become a member of the JCP. It's a free membership from what I know (maybe companies have to pay a little bit), but the point is that it is open, so anybody will be able to influence the standard. An advantage on the Java side, of course, is that there is portability of code, because everything is Java based, and so on. But you don't have to stick to that; that's just a happenstance. So this kind of completes a historical look
at where we stand today. This is roughly the scenario: we have this notion of application servers, and there are open-standards-based container-component contracts that are defined. You can write enterprise applications based on these kinds of standards and deploy them; in fact, most enterprise applications written today are based on them. The system services have been studied well enough, and we have solutions out there that address the problems of scalability and of heterogeneity. To some extent, if you use Java, the issue of portability is also solved for you.

In fact, one of the advantages of an open container is that things are portable. Say you have a CORBA-based application and some CORBA vendor, or J2EE for that matter. Suppose IBM is my J2EE vendor, with WebSphere, so I have a WebSphere-based solution. IBM makes me mad for whatever reason (nobody from IBM here, right?), and so I can tell IBM, "Get lost, I don't want to deal with you. I am going to switch to WebLogic." WebLogic is also a J2EE-compliant container, as a result of which I can take my application that was running on WebSphere, drop it into the other J2EE container, and everything will work when I go from vendor to vendor. That's one of the advantages; that's the aspect of portability that will exist for you.

All right. So what have we not solved? That is the question that comes up. Remember what we started out with: the challenges in enterprise computing. These were three of the challenges, but there were some more. So what did we not solve? Do you remember what the others were, or have you forgotten? (An audience member suggests cost of maintenance.) Yes, the cost of deployment, ownership, and maintenance was a major issue. The middle-tier logic is expensive, but it has kind of solved some of the problems. So we have the middle-tier logic, which is the
application server logic that we have, and it has solved some problems. It has solved the scalability problem: you have enterprise applications deployed today that have hundreds of thousands of users accessing them. eBay, for example, runs on J2EE. eBay is a huge company in terms of the number of people hitting it, and all their back-end enterprise applications are written using J2EE. So the issues of scalability, heterogeneity, etc. have been solved with the middle-tier logic.

One thing that we did not solve was the time-to-market issue. We kind of made things much more flexible than they were before. But what happens today when you develop an application? Let's say that you're going to reuse a lot of stuff, which is great; that is the best possible scenario. You bring the components in house, and now you have to write glue code, or integration code, for these components to work together. For example, let's say you have a standard enterprise setup: it has billing, it has order management, it has trouble ticketing. This is stuff that you should be seeing in every enterprise. So you buy a billing solution off the shelf, a trouble-ticketing solution off the shelf, some order management solution off the shelf, an inventory solution off the shelf. But how these things are going to be made to work together is still an open issue. It's not true that everybody is going to be writing J2EE applications, so that you can get a billing J2EE application, an inventory J2EE application, and so on. You may have SAP, you may have Portal for billing, etc. You bring all of these in house, and you still have to write code, which I call glue code or integration code, to integrate these applications. That's why there is a whole breed of people called systems integrators. These are the people who sit and write integration code; they design integration
strategies for these things, and it's all point-to-point integration. Every time I buy a solution, I will have to build bridges between this solution and whatever else I already have in house. In a large enterprise, which may have hundreds of applications, this quickly gets out of hand. In fact, there was a survey done at Boeing a couple of years ago where they found they had something like 1,700 enterprise applications deployed in various sectors within Boeing. A lot of these were talking to each other through point-to-point interconnections. Just imagine what the picture is going to look like; it's a nightmare to manage. It's like a fully connected graph, hardly something that we want. We want a bus instead of a graph, where everybody can talk to the bus.

The point I'm trying to make is that this time-to-market issue has still not been resolved, because we are sitting and integrating stuff every time. Even though we have these nicely encapsulated components that we can reuse, we still have to integrate them. That's not going to come for free, and it will affect time to market. That's why I believe we have not solved this issue.

You'll notice that I've removed the cost-of-development aspect here, because we've come a long way in server-side component models today. We don't have to code a lot of the system services; they come coded for us. Purely from a functionality perspective, it's pretty easy to write a J2EE application. However, the cost of operation and maintenance still remains. I have to deploy all these applications myself, which means I need to figure out all the hardware sizing requirements, blah blah blah, myself. There's capacity management, which is a huge issue if I'm going to scale.
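To put a rough number on the point-to-point picture above, here is a minimal Python sketch. It assumes the worst case, where every application is linked directly to every other one (a fully connected graph), and compares that with one connection per application to a shared bus. The function names are made up for illustration, and the 1,700 figure is the Boeing number quoted in the talk; the talk only says "a lot" of those applications were interconnected, so treat the first result as an upper bound, not a measurement.

```python
def point_to_point_links(n_apps: int) -> int:
    """Upper bound on integration links: every pair of applications
    gets its own bridge, i.e. the edge count of a complete graph."""
    return n_apps * (n_apps - 1) // 2


def bus_connections(n_apps: int) -> int:
    """With a shared bus, each application needs one connection to the bus."""
    return n_apps


if __name__ == "__main__":
    apps = 1700  # figure quoted for Boeing in the talk
    print("point-to-point (worst case):", point_to_point_links(apps))
    print("bus:", bus_connections(apps))
```

The worst case grows with the square of the number of applications, while the bus grows linearly, which is the reason for preferring a bus over a graph.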
And I have to host all these things. Every time an upgrade of a database takes place, I have to get the upgrade, I have to install it, I have to shut the whole thing down. It's kind of a mess; it's not that easy to deal with. In fact, most enterprises you talk to today should reflect this picture. Especially those of you who are out there in industry, and I see a couple of people from Wipro here, this picture should be what you're seeing. If it's not, then let's talk about it, because it's important for me to understand this as well. So this is an issue that's not solved.

Above all, there is something else going on that has changed the way we do business, and that current application architectures have not taken into account. Remember we talked about the notion of B2B earlier. B2C and B2B, you've all heard these terms: business-to-consumer and business-to-business. Now, business-to-consumer we've kind of solved. We have web-based applications out there, we have web-based content, and any consumer connected to the internet can reach it. Great, fantastic. But business-to-business is not as simple to solve; it's a different proposition. Now you need machine-to-machine interaction where each side can understand what is going on. I need to be able to call on some functionality that is sitting beyond my enterprise boundary. There are different enterprise boundaries, and my business process is now going to cut across them; it is no longer confined to my enterprise.

So the question is, if we are going to collaborate, are the solutions we have today able to take advantage of the internet? Are they even able to work over the internet? That would be the next question to ask ourselves. I don't believe so, and we will discuss why. These are some of the issues that have not yet been addressed. So it leads us to the next
natural evolution of this notion of distributed-object-based app servers, or component-based app servers. What is it going to look like? It has to solve these problems. In a nutshell, that is what SOA is; for me, at least, that's what SOA is in a nutshell. In the industry you may hear various claims: oh, SOA is not a technology, it is not a methodology, it's some vague cloud that's out there and you've got to be doing it. How do I do it? "Hire us and we'll help you solve the problem" is the typical response you get. What we're going to try to do here over the next five days is debunk that myth, so to speak. It is something specific; otherwise there's no point in talking about it except to make money, which is okay, nothing wrong with doing that, but you shouldn't be making money and trying to scare people