Throughout the history of the web, the two most popular web servers have been, first, the Apache web server, formally called the Apache HTTP Server, and second, Microsoft's server, called IIS, which stands for Internet Information Services. Apache has about a 60% market share these days, actually a bit higher than that, and Microsoft is somewhere around 15%. The success of the Apache web server, aside from its quality, is undoubtedly due to its being open source. And whereas Apache runs on basically every platform, from Linux to Windows to Mac and a bunch of others, IIS only runs on Windows. In fact, it's not really marketed as a separate program: it's built into Windows itself as a Windows feature, at least as Microsoft presents it. In most versions of Windows, IIS comes included, though in some of the cheaper versions, like the Home Basic versions, you'll have to get it as a separate download. Either way, it's free if you already have Windows. Generally though, if you're running a real website with a serious amount of traffic and you choose IIS, you'd likely want to run it on the version of Windows called Windows Server.
When it comes down to a choice between the two, Apache and IIS, I would generally pick Apache. If for whatever reason you use a bunch of other proprietary Microsoft stuff, like Active Directory and other Windows Server things, then it makes sense to use IIS, since you've already committed to the whole Microsoft path. Otherwise, I find it hard to justify using IIS instead of Apache.
To be clear about the Apache web server: while we commonly just say Apache to mean the Apache web server, Apache is actually a whole organization with numerous open source software projects. The web server is just one of those. It was the first one and it's still the most popular, but it's one among many.
So Apache, strictly speaking, refers to an organization, the Apache Software Foundation, and to the particular open source license which they use for their software.
As for other open source web servers, there's one called nginx and another called lighttpd. I'm not really sure how to pronounce that second one. Anyway, both of these have become pretty popular in recent years: nginx has something like 8% to 10% of the market now, and lighttpd maybe 5%. Both have become popular mainly because they're simple, stripped-down web servers that focus on serving many, many concurrent requests quickly, so they're very high-performance for those kinds of use cases. The trade-off is that they lack some of the features of Apache. Despite lacking these features, or arguably because of it, both of these web servers are popularly used as what's called a proxy, either a forward proxy or, more often, a reverse proxy.
So what are forward and reverse proxies? Well, in what we might think of as the normal case, a web server receives a request from some user agent, some web browser, and returns an HTTP response to that user agent. When acting as a proxy, however, the web server takes the request and passes it on to something else. It doesn't process the request itself. It forwards the request, and once it gets back the response to the request it sent along, it returns that response to the original requester. The difference between a forward and a reverse proxy, then, is where the proxy is located in relation to the requesting browser and the responding web server. The distinction is not always hard and fast, but we usually think of a forward proxy as positioned in the immediate network of the requesting user agent itself.
So say you work at a company, and that company has its own local network that's connected to the internet, but they configure the network such that all of the computers on it have to use the forward proxy in the network to go out to the web. If you look in your operating system's network configuration, you'll see there are options for specifying a proxy, and that's where you configure the system to use a forward proxy. Now, what's the point of using a forward proxy? Well, there are many different reasons. In an organization like a business, you may not want your users to go to certain websites, so you use the proxy effectively as a filter on what sites they can visit. That's one reason. It can also provide some extra security in certain circumstances. Or maybe you just want to keep a record, a log, of everything your workers do on the web. That's something a forward proxy can do. So that's the gist of why we might use a forward proxy: it's mainly about security and controlling access.
With what's called a reverse proxy, the proxy is placed next to a web server and sits between the web server and the requests coming in from the internet. This too can be useful for various purposes, and those purposes are, again, partly about security, but more commonly about performance. For example, very often a reverse proxy is used as a load balancer. That is, it's a single machine that takes in all the initial requests and then farms those requests out to multiple other web servers. It balances the load between those other web servers, and those web servers are the ones that actually take the requests and generate the real responses. Another way in which a reverse proxy might improve performance is by acting as a cache: when certain requests come in frequently and repetitively, there's no need for the proper web server, the one that normally generates the responses, to constantly regenerate the same responses.
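As a rough sketch of the load-balancing role described above, here is a round-robin balancer in Python. There's no real networking here: the backends are plain functions standing in for web servers, and the names `RoundRobinBalancer`, `make_backend`, `web1`, and `web2` are made up for illustration. Round-robin is also just one possible policy, assumed here because it's the simplest.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next backend in turn."""

    def __init__(self, backends):
        # cycle() repeats the list of backends endlessly, in order.
        self._backends = cycle(backends)

    def handle(self, request):
        backend = next(self._backends)
        # The balancer does not build the response itself; it passes the
        # request on and relays whatever the backend returns.
        return backend(request)


def make_backend(name):
    """Hypothetical stand-in for a real web server behind the proxy."""
    def backend(request):
        return f"{name} handled {request}"
    return backend


balancer = RoundRobinBalancer([make_backend("web1"), make_backend("web2")])
responses = [balancer.handle(f"GET /page{i}") for i in range(4)]
```

A real load balancer would typically also notice when a backend is down or overloaded, which this sketch ignores.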
When a reverse proxy acts as a cache, it intelligently stores responses to frequent requests, such that for many future requests the proper web server doesn't have to handle those requests at all. It doesn't even see them. The reverse proxy itself can simply send back a cached response.
Now, in the simplest possible setup, the web server is simply configured to read static files from the file system and serve those files back as responses verbatim. Usually the way this works is that the web server is pointed to some particular directory, and then for each request the web server interprets the path of the URL as a file path relative to that directory. So, for example, if the path simply reads index.html, the web server will serve back the file called index.html in the directory our web server is configured to use. This assumes, of course, that the file exists. If it doesn't, the request fails, typically with a 404 Not Found response.
Now, while this sort of configuration was fairly common in the early days of the web, today it's not really common at all. Almost all web servers these days are configured to take the incoming request and pass it along to a script, a separate program that is, and then that script generates a response which it hands back to the web server, and the web server then returns the actual HTTP response. To facilitate this pattern, a standard was developed specifying how exactly the web server and the script should communicate. This standard is called CGI, which has nothing to do with computer-generated imagery. It stands for Common Gateway Interface. How CGI works is really very simple. For every single request, the web server runs the script as a new, separate process. So every request is a separate script process. When the web server runs a script, it passes the headers and GET variables of the request to it in the environment, which, recall, is a feature of UNIX processes.
The environment is a data area in a process which is handed down from a parent process to its child processes. While this concept originates with UNIX, it's also been imitated on Windows and other operating systems. As for POST requests, they include more than just a bunch of headers: there's also a body, and that gets passed to the script via standard input. So the script reads the POST body by reading its standard input, and when the script generates a response, it writes it out to standard output, which the web server has configured to be a file which the web server itself can read. Usually this file is just a pipe, so the script writes to the pipe, and then the web server reads from that same pipe to get the data.
So that's really all there is to CGI, and it was the dominant mechanism for a number of years. The problem is that it's inherently inefficient, mainly because of the overhead of launching a whole new process for every single request. To fix this problem, a couple of alternatives to CGI were introduced, including one called FastCGI. In FastCGI, the server keeps a pool of script processes which it reuses over and over for the requests that come in. So it doesn't have to constantly spawn new processes. It just keeps this pool of existing processes, like 10 or so, or whatever number is appropriate for your level of traffic. And because FastCGI maintains the same processes and constantly recycles them, it can't use the environment as a mechanism to pass data, because that only works at the start of a process. The parent's environment gets inherited by the child, but once the child exists, the parent can't affect the child's environment anymore.
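To make the plain CGI conventions described above concrete (request details in the environment, the POST body on standard input, the response on standard output), here is a minimal sketch of a CGI-style script in Python. `QUERY_STRING`, `REQUEST_METHOD`, and `CONTENT_LENGTH` are the standard CGI variable names. The function `handle_request`, the `name` parameter, and the greeting are invented for illustration, and the last few lines only simulate what a web server would do rather than involving a real one.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ, stdin, stdout):
    # GET variables arrive in the QUERY_STRING environment variable.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]

    # A POST body arrives on standard input. CONTENT_LENGTH says how much
    # to read (treated as a character count here, which is only right
    # for ASCII bodies).
    body = ""
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = stdin.read(length)

    # The response goes to standard output: headers, a blank line, the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"Hello, {name}! You posted: {body!r}\n")


def main():
    # What a real CGI script would do when launched by the web server.
    handle_request(os.environ, sys.stdin, sys.stdout)


# Simulate the web server's side for a POST to /script?name=Ada.
demo_env = {"REQUEST_METHOD": "POST", "QUERY_STRING": "name=Ada",
            "CONTENT_LENGTH": "5"}
demo_out = io.StringIO()
handle_request(demo_env, io.StringIO("a=1&b"), demo_out)
```

Because a new process is launched per request, all of the script's startup cost is paid on every single request, which is the inefficiency mentioned above.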
What FastCGI does instead is have the web server send the request data to the script process via a socket, just an ordinary networking socket, and the script then returns the response over that same socket. Frankly, that's the more obvious thing to do, and it's kind of strange that CGI didn't do that to begin with.
FastCGI has been one fairly popular solution to the problem of CGI's inefficiency, but a more commonly used solution has been to take the interpreter for some commonly used scripting language, like Python or Perl, and simply embed it in the web server itself. In the case of Apache, support for CGI, FastCGI, and these embedded Perl and Python interpreters comes in the form of Apache modules. The Apache web server is written in a modular style, where the core of the web server is very, very simple, with very few features, and you add in most of the features through the optional use of modules. A number of these modules, like mod_cgi, are official parts of the Apache web server project itself. They come as separate, optional modules, but they're still part of the same software project. Some others, like mod_python, were originally created outside of the Apache project, though in some cases these third-party modules become official modules and get integrated into the web server project.
Now, in the case of using Python with Apache, mod_python was the preferred means for a number of years, though in more recent years Python has developed a standard which it calls WSGI, which stands for Web Server Gateway Interface. The idea is not necessarily Python-specific, though the standard itself is defined for Python, and that's where it's used. Basically, like CGI, it's another standard convention for communication between the web server and the code that it invokes. I won't describe its details, but understand that it's now generally the preferred means for server-side programming with Python.
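Without going into those details, a minimal sketch gives the flavor of WSGI: the server calls a Python callable with the request data (`environ`) and a `start_response` function, and the callable returns the response body. The callable's name `application` is conventional rather than required, and the `/hello` path and response text are made up. The second half calls the application by hand, the way a server would, using the standard library's `wsgiref` helpers.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # The request arrives as a dictionary, much like the CGI environment.
    path = environ.get("PATH_INFO", "/")
    body = f"You asked for {path}\n".encode("utf-8")

    # Status line and headers go to the server through start_response.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The body is returned as an iterable of byte strings.
    return [body]


# Call the application by hand, standing in for the web server.
environ = {"PATH_INFO": "/hello"}
setup_testing_defaults(environ)  # fills in the other required keys
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


chunks = application(environ, fake_start_response)
```

To try it in a browser, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` would serve it locally. That server is meant for development, not production.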
And to use WSGI with Apache, we would use the module mod_wsgi rather than mod_python.
The term framework in programming essentially refers to a kind of library, but one in which, rather than you invoking a bunch of functions and instantiating classes provided by the library, there's a bit of an inversion of that: you write code which is then itself invoked by the framework. So normally with a library we invoke the library's code, but with a framework, the framework invokes the code that we write. Or, more accurately, it's a mix of the two: a framework also provides stuff which we invoke from our own code. So what's called a web application framework is a framework for writing web apps, that is, for writing the scripts that are the server-side code run by a web server. When it comes to writing server-side scripts, there are a lot of common tasks, whatever our particular application is supposed to do. So it makes sense to use a web application framework to handle all that common work, that busy work, for us, so we don't have to do it over again each time.
So what is the functionality provided by a web framework? Well, first off there's abstraction over the request and the response. Say I'm in Python and I want to get the information about the current request. It's easiest and most convenient if I get that in the form of a Python object, rather than as a bunch of headers which I have to parse myself. It makes sense for the web framework to do that for me. Similarly, when I construct my response, it makes sense for the web framework to create the headers for me, because 99 times out of 100 I just want the usual defaults.
Another common feature of web frameworks is the preservation and tracking of session state. What is a session? Well, recall that HTTP itself is stateless. That is, each request is independent of every other request.
There's no inherent mechanism in HTTP that knows that this request is coming from the same person who made that request. They're all independent. The idea of a session is that it's effectively the set of requests from a single user agent, a single browser. While HTTP itself has no notion of sessions, there are tricks we can use such that the web server can track which requests are coming from the same user agent. The most common technique is to use cookies, but some user agents may have cookies disabled because some users are paranoid about their privacy. Failing that, there are some other techniques that are clumsy, but they still work. Either way, this is something the web framework can do for us. In the code that we write, we just get back the session state, all conveniently bundled for us as, say, a Python object, assuming we're using Python.
If the web framework is tracking sessions, if it's keeping track of which requests are coming from which users, it then makes sense for the web framework to implement user authentication and authorization. Authentication is the process of determining that users are who they claim to be, and authorization is the process of granting authenticated users access to some kind of resource. Authentication is about identifying users. Authorization is about granting privileges to identified users.
Virtually all web frameworks also have a mechanism for templating, by which we stub out the outline of a page and then programmatically insert values into certain places in the page. If you go to almost any website, you'll notice that every page has basically the same look and a lot of common elements, like, say, a navigation bar and a logo at the top and a footer at the bottom. Well, those things are part of the template, and specific pages are generated by filling in that common template.
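As a small illustration of templating, here is a sketch using `string.Template` from Python's standard library rather than any particular framework's template engine. The page layout, the function name `render_page`, and the "Example Site" footer are all invented for the example.

```python
from html import escape
from string import Template

# The shared outline of every page: navigation bar, content area, footer.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<nav>Home | About</nav>
<main>$content</main>
<footer>$footer</footer>
</body>
</html>""")


def render_page(title, content):
    # Escape the inserted values so that user-supplied text cannot
    # inject markup into the page.
    return PAGE.substitute(
        title=escape(title),
        content=escape(content),
        footer="Example Site",
    )


orders_page = render_page("Orders", "3 items <new>")
```

Template engines in real frameworks usually go further, with loops, conditionals, and templates that extend other templates, but the basic idea is the same placeholder filling.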
URL mapping refers to the process by which the URLs of incoming requests get mapped to particular responses. It's basically the scheme by which we interpret the URLs. Ultimately, what a particular path means for your site is totally under your control. It's up to the logic of your code. Frameworks typically include some conveniences that make implementing that logic easier. Frameworks also typically include mechanisms for accessing a database, because in most websites there's some kind of user-generated information, whether that's users placing orders or users leaving comments. That kind of data is stuff that we store in a database, and so our server-side code needs to access a database. Many web frameworks also include mechanisms that make it easier to keep your site secure, and most frameworks provide mechanisms for caching, which in some circumstances can significantly improve performance.
Now, there are dozens if not hundreds of different web frameworks out there, but these five here represent what I would say are probably the most popular for their respective languages. Zend, for example, is the most popular framework for PHP. Ruby on Rails is particularly notable because, first off, it's extremely dominant within Ruby, it's virtually the framework used in Ruby, and also because it's highly influential. Django here, for Python, is heavily influenced by a lot of ideas from Ruby on Rails. You also see the influence of Rails in ASP.NET, where there are now almost two different frameworks, or rather a framework within a framework. There's the normal, so to speak, ASP.NET, and then more recently there's what's called ASP.NET MVC, MVC here standing for Model View Controller, which is the name of a particular architectural pattern which Ruby on Rails did a lot to popularize in web frameworks. ASP.NET MVC is still, under the hood at least, regular ASP.NET, but it abstracts over that with what I would call a more modern web framework.
ASP.NET is actually one of the older frameworks still around. I've had some experience using regular ASP.NET, and I did not care for it, because it's quite heavy-handed and unnecessarily complicated, I would say. I much prefer the more modern, more Rails-like frameworks like, say, Django, though I've heard good things about ASP.NET MVC. So if you're programming in C# or Visual Basic .NET, I would strongly suggest looking into using ASP.NET MVC over plain regular ASP.NET.
Now, what exactly is this architectural pattern called MVC? Well, MVC stands for Model, View, and Controller, and the idea simply is that we separate our application into three components. First you have the controller, which in a sense orchestrates everything. Then you have the view, which represents the presentational side of your application, like the generation of HTML. And then you have the model component, the model layer, which represents access to the database. You put all the logic for data access and data storage in the model, you put the HTML generation in the view layer, and the controller ties them together. In a typical page in a web application, we take data from the database and plug it into some view, some page template, and then the controller takes that generated response and actually returns it to the client. Now, when it comes to the details of how exactly to structure the code, there are some competing interpretations of what this architecture means precisely, but in broad outline it's very simple. You have the view representing the user interface, the model representing the data, and the controller, which bridges the gap: it orchestrates the use of the view and the model and combines them into one end product. So MVC is the architectural pattern which most frameworks adhere to, especially those which take after Ruby on Rails.
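Here is a toy sketch of that broad outline in Python, with a simple URL mapping on top to pick the controller method. It follows one of the possible interpretations of MVC, not the definitive one, and it is not how any particular framework does it. The in-memory dictionary stands in for a database, and every name in it (`CommentModel`, `render_comment`, `CommentController`, `ROUTES`, `dispatch`, the `/comments/` path) is invented for illustration.

```python
import re
from string import Template


# Model: all data access lives here (a dict stands in for the database).
class CommentModel:
    def __init__(self):
        self._comments = {1: "First!", 2: "Nice article."}

    def get(self, comment_id):
        return self._comments.get(comment_id)


# View: turns data into HTML and knows nothing about where data comes from.
# (A real view would also escape the text, as it could be user-supplied.)
COMMENT_VIEW = Template("<p>Comment $id: $text</p>")


def render_comment(comment_id, text):
    return COMMENT_VIEW.substitute(id=comment_id, text=text)


# Controller: fetches from the model, hands the data to the view, and
# returns the finished response (status plus body).
class CommentController:
    def __init__(self, model):
        self.model = model

    def show(self, comment_id):
        text = self.model.get(int(comment_id))
        if text is None:
            return "404 Not Found", "<p>No such comment</p>"
        return "200 OK", render_comment(comment_id, text)


# URL mapping: which controller method handles which path.
controller = CommentController(CommentModel())
ROUTES = [(re.compile(r"^/comments/(\d+)$"), controller.show)]


def dispatch(path):
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(*match.groups())
    return "404 Not Found", "<p>Not found</p>"
```

For example, `dispatch("/comments/2")` goes through the controller to the model and the view, while a path that matches no route never reaches the controller at all.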