Good morning everybody. I am Deepak Phatak from the Department of Computer Science, IIT Bombay, and I am very happy to welcome you all to this second subject of the ISTE workshops, on database management systems. We have started these workshops in order to empower our technical teachers so that they can in turn empower the students whom they teach. I would like to use this opportunity to share with you some background on the conceptualization of these workshops and how we are proceeding with the execution. A long time ago we realized that the conventional ISTE or QIP workshops, which are conducted for 30 or 35 teachers at a time, serve an extremely useful purpose but are unable to cater to the large number of engineering teachers that we have in the country today. Further, we also noticed that while these two-week courses are useful, no subsequent engagement is envisaged as part of the workshop activity, so teachers who benefit from these workshops go back and are not necessarily able to use the knowledge gained in their regular teaching. It is in this context that we conceptualized our model. The physical workshop that you are attending for two weeks is only the beginning. Our objective is to engage you for a longer term so that you interact not only with us but also with each other, through a portal which we shall be launching within six months of completion of this workshop. That will be an open source portal. It will contain not only the recorded audio-video lectures of this course, all slides, and all the material for laboratories and assignments, but it could eventually have a whole lot more contributed by all of you and by many others in the country. Hopefully this will become the database knowledge portal for all the learners in this country in coming years. That is the long-term hope. We are thankful to MHRD, which has approved the inclusion of this model in the National Mission on Education through ICT.
And I am also glad to report that this is the first time the actual registration for the workshop has crossed a thousand teachers, which was our ambition right from the beginning. A few more things about the model. Ordinarily an ISTE or QIP workshop is held on a specific subject and it typically embodies the views of the expert faculty on the subject matter. Since the teachers coming for such a workshop have to teach the courses in the conventional way in their respective colleges, we decided that when we run our model we will try to use the syllabi that prevail in most universities, including the examination pattern that is seen in most of the colleges. Therefore we have decided to cater to the common syllabus that is used in the Indian universities for teaching. We believe that course material designed in this fashion will permit participating teachers to go back and use it directly in their own teaching in their respective colleges. Of course we would like to make a difference, because in the IIT system, as you know, and in several other autonomous institutions, the syllabus is not necessarily fixed. There is a written syllabus which acts as a guideline, and the expert faculty member is free to further explore the new developments that are taking place in the field. More importantly, the examination system that prevails in IIT-like institutions does not have a conventional straitjacketed makeup. We never have questions like solving six out of ten problems. We do not necessarily say that there will be at least one question from each portion, and so on.
Of course you would be unable to change the examination pattern in your colleges immediately after you go back, but during this workshop we would like to give you a glimpse of how the questions are set and how difficult questions are useful in challenging the minds of the participants. Hopefully most of you will be encouraged, when you conduct your examinations, at least as part of the in-semester evaluation, to use harder problems as sample problems, and more difficult lab assignments and tutorial assignments which will challenge your students. In this workshop we hope to give you a glimpse of all of this. Coming to the subject matter of this workshop, database management systems: when we did the survey of the prevalent syllabus we found that in most places the book used on database management systems was the one written by Korth, Silberschatz and Sudarshan. We are very fortunate that one of the authors of that book himself, Professor S. Sudarshan, who has been my colleague in the department of computer science for more than two decades now, well, one and a half decades, has kindly agreed to be the instructor in charge of this course. I am therefore very pleased to see that the known database guru in the country is personally available to deliver lectures and interact with you. I would request you to use this opportunity to the hilt: interact with him, ask questions. The course coordinators at the respective remote centers were assembled at IIT for a week and they have personally interacted with Professor Sudarshan, imbibing the philosophy of this workshop and the lab sessions and tutorial sessions that have been planned. They will in turn guide all of you at your respective remote centers in conducting your labs and tutorials in the afternoon. So please use this opportunity to get as much as possible from this workshop, so that what you take away will greatly empower you when you teach these courses later.
In conclusion, I would like to add one more thing. Since the objective of this workshop is a long-term engagement, I hope you will remember that the workshop certificates will not be distributed at the end of two weeks when the workshop closes. Instead, participants at the respective remote centers will be divided into teams, and each team will be required to make a specific contribution in terms of solving an assignment within two weeks of completion of this workshop. It is only on receipt of the assignment, which will be gauged by the respective coordinators at the remote centers, that the certificates will be issued. The objective is not to force anything, but to request you to see that this kind of engagement model, where you do not forget what has happened in the course but revisit it extensively over the next two weeks by working on the assignments, will hopefully prompt you to remain engaged with us in our efforts to increase the quality of our teaching in database management systems. With these words, I would like to introduce my colleague, Professor Sudarshan, the database guru of the country. So, welcome Sudarshan. Thank you. Thank you so much for agreeing to engage these teachers. Let me tell you, now I speak on behalf of these thousand teachers. If I am one of them, I represent a band of about 150,000 engineering teachers in the country. We work at small colleges. We do not have the kind of opportunities that teachers at places like IIT have. We learn on our own mostly, and therefore we look forward to this opportunity to learn from you. All yours. Thank you so much.
Thank you very much, Professor Phatak. It is a bit of a stretch to call me the database guru. There are many more database gurus here. I am merely one of many. So, in any case, welcome to this workshop. We have 10 days of work ahead of us and hopefully all of you will enjoy the workshop as much as I enjoy teaching it.
So, let me put up a few slides on the goals of this workshop and on how we are going to structure it before I get into actual technical details. I hope you are able to see the slides on A-VIEW now, and you should be able to see the cover slide. So, here is how we will be running the course. We have lectures in the morning for 3 hours, as you have probably found out by now, followed by lab sessions in the afternoon where all of you have to get hands-on with computers in the lab. The lab component of this course is, if anything, even more important than the theory component, because the theory component is something which all of you can pick up by reading a textbook or by reading the slides. Though I do hope that when I cover it here, we will include a little more insight than you would pick up just by reading a book or the slides. But in the lab component, we give you exposure to actual software which you will in turn be using to teach your students. Feedback which I have heard from several people is that the lab component of many courses is inadequate, and many faculty have expressed the need for better support in terms of labs. I hope that this course will play a significant role in that, and we will have assignments every day which are either lab or tutorial. Thanks to the efforts of Professor Phatak and his team, we also have a new gadget, the clicker, which we are going to use in the course over the next 2 weeks. Now, I hope all of you have clickers. I believe that due to the overwhelming response to this course, some of you will be sharing clickers, which is fine. You can take turns. So, we are not going to be using the clickers to evaluate your performance directly, although the results will be indicative.
But what we are going to use them for is to get broader feedback on how people are finding the course, how they are understanding it, and whether there are parts which they have or have not followed, so that we can tailor the course appropriately. So, what I would like you to do, if everyone is ready, is to test your clickers at this point. Just click on any one number. So, we will find out which is the most popular number among 1 to 4. So, please use the clickers now and pick any random number from 1 to 4 to make sure the clickers are working fine. We will have more meaningful questions coming up in a couple of minutes. So, while we wait for the clickers to come online, let me continue on to the course structure. So, the course is for 10 working days, from today through the 23rd. Let me give you a quick outline of what we will be covering in this course. The basic curriculum, as Professor Phatak said, covers what most of the courses across the country cover, which also happens in this case to be what we cover in the Database System Concepts textbook. We will essentially be going sequentially through the chapters of this book, with a few minor changes in the chapters that we cover. First of all, we will be giving a lot of importance to the relational model and SQL, and we are going to devote several lectures to this topic, for the simple reason that pretty much everyone who goes out of this course into the industry will be using SQL. A very large fraction will be using SQL on a day-to-day basis. So it is very important to understand how to write SQL queries. SQL is also a rather confusing language for someone who has only seen imperative languages such as C++, C or Java earlier. Therefore, it is important to take time to explain the concepts and how SQL is different, and to give enough examples so that students are able to learn how to program effectively using SQL. So, that is going to be one of the first focuses of this course.
So, right after an introduction to SQL, we will move on to how to design relational database schemas. Now, certain books, including ours earlier on, used to cover schema design first, followed by SQL. We realized while teaching our courses that this causes a problem. We are not able to conduct lab exercises in parallel with the course for a while, because we have not covered material which is required for the lab. As a result, we switched to having SQL first, followed by database design. It turned out that this was also a wise move for another reason, which is that when students first see relations, they really want to understand what you can do with the relational model and what kind of queries you can write. Only after that do they have the level of maturity which is required to do schema design. So, it serves a double purpose to cover SQL first, followed by database design. As part of database design, we will be covering the ER model followed by normalization. Now, both the ER model and normalization have been around for many years. However, in the ER model, the notation has not been fully standardized. There are in fact three broad schools of notation. One is the standard Chen ER notation, which came first. The second is the notation which is, or was, widely used in many industries, called IDEF1X. And the third, which is more recent, is to use UML for modeling rather than ER modeling. In the industry a mixture of all of these is used, but UML has now come to dominate modeling activities. In this edition of the book, we decided to go with UML modeling. However, UML has a lot of other things associated with it which are not relevant to a database course. Therefore, we chose a subset of UML and then added a few more features which are not there in UML but have been there in traditional ER notation.
And as a result, we have something which is almost completely a subset of UML with a few extra features which we will use as the basis for our ER modeling. We will also show you how to use certain tools to do this modeling as part of the lab. After that, we will have a little bit of theory and more of lab on how to build web applications using databases as the back end. Now, this has become the standard mode for deployment of most web applications. Almost all of them have a database behind because they all need to store data. And it is very important for students to learn how to build these. And as a result, we cover them in the database course although the web is not specific to database systems. And finally, we are going to cover database internals. Again, 10, 15 years ago, most courses did not cover internals and externals in one course. There were certain places which covered mostly internals and there were many more places which covered mostly database external, external meaning SQL and schema design and so on. These days, many courses across the country have combined these two aspects and they cover both the SQL and building applications that is externals as well as the internals to some extent to the extent that it is important for programmers who use SQL to understand what is going on behind the scenes in the database. And it is very important that students know what is happening to some extent at least even though to understand fully what is going on requires a lot more time and energy. So, in this course, we will be giving an overview of database internals and hopefully, we will cover all the material which a practicing programmer will find useful. In this course, we are actually covering a little bit more than most of you will probably be able to cover in a course in your college or university. 
But I decided to go ahead with it because I believe that this is a topic which some of the instructors may not have covered in detail earlier and you may benefit from knowing a little bit more than what you will be teaching your students. And we will cover a few advanced topics in the last day of the course. So, that is the summary of what we will cover. I will also mention what we will not be covering. These include a few topics which used to be a standard part of database course, but we believe that they are not as important as they used to be anymore for a basic course, although the topics are certainly still important. So, two of these topics are the relational calculus, the domain relational calculus and the tuple relational calculus and object based databases which include object relational and object oriented databases. These are appropriate today for advanced courses, but probably not for basic courses. However, there is a small twist in the game these days where a technology called object relational mapping which has some connections to object oriented databases and some connections to object relational databases is used in the industry. So, if time permits we will give a short introduction to that although we will not cover the other aspects. There are many more topics which would be nice to cover in a first course, which have traditionally been in a second course in most places and therefore, we will not be covering them. These include parallel and distributed databases and more detailed coverage of object oriented and object relational databases. So, maybe in some future course those can be discussed. So, that was a quick summary of what we will be doing over the next few days. So, now let us move on to the actual technical content. So, let us start the technical part with quick overview of what database systems are meant to do and why they are important. 
I have quite a few slides here in this chapter, but I am not going to cover all of them in detail, although it is pretty useful to go and read them offline. So, all of you have been teaching databases, I would assume. Maybe a few of you are here to learn about databases before you start teaching them, but all of you, I am sure, have used database systems whether you knew it or not. Pretty much everything you do on the web today stores data somewhere, and I am sure all of you have used many, many applications on the web. In all likelihood you have also used applications in your university for various tasks, such as admitting students first of all, registering them, collecting their fees, maybe entering their grades on a computer, and so on and so forth. All of these use databases as a back end, and they are applications built on top of databases. These of course date back quite a long time, with some of the early applications including airline reservations. In India, one of the early uses of computers for applications with a database as the back end was the Indian Railway Reservation System, which has been so useful to us across the country. There are many, many more and I will not bore you with all of them. So in the context of a university, a database-backed application, as I said, does many things, starting from admitting students, to running courses, to saying what courses are running and which students are registered for which courses, what marks or grades they receive, and so on. In fact we will be using this as a running example for our database queries, and we will be introducing you to a particular schema which models a university database. This is going to model only a small part of a university, because a real database can be very large. A typical large-scale database today has thousands and thousands of relations, and it is obviously not feasible to teach a course with a schema containing so many relations.
So we will use a small subset of it to introduce SQL and the relational model, and how to write queries. I should also mention that in the early days database applications were built directly on file systems, but there were many drawbacks which people realized as time went on. In the original era of mainframe computers, when they were first introduced, there was no such thing as a database. Data resided in files, and people had to deal with file formats and so on. They had to actually write out the files, which were on tape in those days, and then read data back from tapes. Now this was a very low level of abstraction, meaning that programmers had to deal with every little aspect of how data was stored, how it was to be retrieved and so forth. This seemed like a good thing for programmers because it gave them job security, or so it seemed, but it turned out to be a bad idea for enterprises, because doing everything by hand was very expensive, and certainly small enterprises could not afford any of these things. So a database system really gives a much higher-level view of data and provides many features which are important both for ease of access to data and to ensure data is robust in the face of all kinds of failures. So, as an example, some of the problems which a properly designed database system can help to avoid are data redundancy and inconsistency. We will see later what these terms mean, but inconsistency should be clear to you. If there are two parts of the system which give two different addresses for you, then whoever has to send you a letter is going to be confused about which one to use. So that is an example of inconsistency, which can be caused by redundancy. I already mentioned the difficulty in accessing data if you have to write a complicated program to do even simple tasks.
A third problem is data isolation, which really means the multiple different file formats which get created if you store data in files. Different people will use different formats, and building an application which accesses two different parts of the database is then complicated by the fact that they use different formats. A single internal representation in a relational database system avoids these problems. Of integrity problems there are several kinds; inconsistency between copies is one kind. Another, more common kind, which is familiar to many of you, is foreign key constraints. If I store the name of a degree program, that had better be a valid program. I do not want to have a student who is registered for the degree x, y, z which is not valid. They should be registered for a meaningful degree. There should be other constraints like this, such as that the department name should be valid, the roll number should be a valid roll number, and so on. Now at the systems level, and this aspect is what we will be covering in the second half of the course on database internals, are issues such as atomicity of updates. If you have a transaction doing multiple updates, what if there is a failure in the middle? Is it going to leave the database in an inconsistent state? This was a question which database people answered long ago, and which it turned out people in other areas ignored for a long time. In particular, file systems people ignored this for a while, till they realized what problems there were because of failures that happened in the middle of some update. Today this idea of atomicity is used in many, many things even outside of database systems. A second issue at the internal level is concurrent access by multiple users. What if two users go and update the same data at the same time and cause a mess? How do we deal with it? We will be seeing that later in the course. A third aspect is security. Who can be allowed to do what to which pieces of data?
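To make the atomicity question above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the department names, the budget figures, the `transfer_budget` helper and the simulated failure are all hypothetical and not taken from the lecture slides. The sketch performs two updates as one transaction and shows that a failure between them leaves neither update in place.

```python
import sqlite3

def transfer_budget(conn, src, dst, amount, fail_midway=False):
    """Move `amount` between two department budgets as a single transaction."""
    # `with conn:` commits if the block completes and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE department SET budget = budget - ? WHERE dept_name = ?",
            (amount, src),
        )
        if fail_midway:
            # Hypothetical failure injected between the two updates.
            raise RuntimeError("simulated crash in the middle of the transaction")
        conn.execute(
            "UPDATE department SET budget = budget + ? WHERE dept_name = ?",
            (amount, dst),
        )

def budgets(conn):
    """Return the current budgets as a {dept_name: budget} dictionary."""
    return dict(conn.execute("SELECT dept_name, budget FROM department"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO department VALUES (?, ?)",
        [("Physics", 70000), ("Music", 80000)],  # illustrative data only
    )

# A failure in the middle: the first update must not survive on its own.
try:
    transfer_budget(conn, "Physics", "Music", 10000, fail_midway=True)
except RuntimeError:
    pass
after_failure = budgets(conn)

# The same transfer without a failure: both updates take effect together.
transfer_budget(conn, "Physics", "Music", 10000)
after_success = budgets(conn)

print(after_failure)
print(after_success)
```

The point of the sketch is only the all-or-nothing behaviour; how a real database system achieves it across genuine crashes is part of the internals covered later in the course.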
Security, again, we will be covering later. So database systems of course have to provide solutions to all of these. Any database system has to abstract away from the lowest level of representation, which is basically bits and bytes sitting on a disk or in memory somewhere, a level which most programmers will not be able to deal with, and give a much higher-level view of the data. So the highest level is the view level. In this slide the levels are shown upside down, with the physical level at the top, but it is actually the lowest level. The logical level is what is stored in terms of the schema design, and the view level is what can be made visible to specific applications or programmers, and we will see this later in the context of SQL. So this figure shows the same things the right way up. As you can see, there has to be one physical representation and one logical representation, but there can be many view representations depending on who needs the data. We have a notion of a schema, which describes the information stored, not in terms of the actual roll numbers or actual names, but at a higher level: what are the relations or the entities or relationships which we represent, what are their attributes, what are the types of their attributes, and so on. So this is the schema. Then there is a notion of an instance, which is the actual data sitting in there: who are the specific instructors, who are the specific students, what are their names and so on. That is the instance of the data. And there is a notion of physical data independence, which says that the lowest level of the schema, the physical level, should be decoupled as far as possible from the logical level, so that we can make changes at the lowest level without affecting the higher levels. So this is called physical data independence. Why is this important?
It lets us for example add indices to improve performance and to change the file representation from something to something else if it improves performance and so on. We will see a little bit of this later. There is a concept called a data model which is a high level abstraction or collection of tools for describing data including the relationship between data, the semantics, the constraints and so on. So there are multiple data models which have been proposed and for the bulk of this course we are going to focus on the relational model although we will talk a little bit about, in fact we will talk a fair deal about the entity relationship model which is widely used for modeling data initially and we will talk a little bit about a few of the other things including the object oriented or object relational data model and a little bit about semi-structured data models such as XML. There are older data models also which are used in some very old legacy applications but we would not be covering them here. The relational model which all of you can intuitively see what it is even if you have not seen it formally before stores data in the form of tables such as the instructor table shown on this slide. As you can see there is a bunch of instructors one per row and each instructor has a number of attributes. What are these attributes? There is an ID which is a unique identifier. There is a name, there is a department name and there is a salary. Now of course this is a toy example. In reality any organization would have to store a lot more information about instructors or for that matter students or anything else. But to keep our example simple we are just modeling this much. A little bit of notation if you are not familiar with it. The individual rows of this relation are referred to either as rows or as tuples. The term tuple comes from the theory folk who view this as a n tuple representing the data. I will be using these two terms interchangeably. 
So row and tuple mean the same thing. Similarly this table has multiple columns. Each column is referred to as an attribute of the relation. So again I will be using the term column and attribute interchangeably. And you will find in the systems world people tend to call it rows and columns. In the theory world people tend to call it attributes and tuples but they mean the same thing. Now here is a very very tiny sample database consisting of two relations. The instructor relation and the department relation. You can see intuitively that each instructor has a department name and the department relation also has a department name attribute. And it should be clear that the department name in the instructor relation refers to a particular department which ought to be present in the department relation. If it is not then there is a problem. So we will see later how to ensure that such a row will be present for every department name which is there in instructor. Now a little bit about the languages used to access databases. As I have told you already SQL is the most widely used language. Although in the early days there were many other languages which were proposed but most of them have fallen by the wayside because not enough people use them. But the concepts behind a language are common and even today new languages keep getting generated. In particular variants of SQL are many but there are languages which are SQL-ish but not quite SQL which have been proposed for XML or other semi-structured data models and so forth. And some of these concepts are common across all the languages. So there is a data manipulation language and then there is a data definition language which we will see in the next slide. So data manipulation language is a language which is used for accessing and updating data in the database. Now there are two classes of data manipulation languages. 
There are the procedural languages such as C, Java or C++, which could be used, in theory at least, to access databases, although in practice today the most widely used language is SQL. SQL is a more declarative language, where you do not specify exactly how to carry out the computation but you say what you want, and it is the job of the system to figure out how to do it. It turns out this idea of declarativeness has a lot of applications across many areas of computer science. If you specify exactly what is to be done, the system is limited by how cleverly or how foolishly you said it is to be done. If on the other hand you can say what you want declaratively, and you have a clever implementation inside which can figure out the best way of doing things, that can be a lot faster than what even a good programmer would have written by way of detailed instructions. So this paradigm has found enormous success in the relational database area, and has begun to appear in many other areas as well. A data definition language is the language which we use to specify the schema. Again, SQL has a data definition language component, which is the most widely used. So we have a small example on this slide, where we have a SQL statement to create a table called instructor with the four attributes which we saw earlier: ID, name, department name and salary. In addition, for each of these four attributes a type has been specified, as you can see. The first is char(5), which is a fixed-length character string. The second is varchar(20), which is a variable-length character string of up to 20 characters. The third is again a varchar(20), and the last one, salary, is numeric(8,2), which means 8 digits in total, of which 2 are after the decimal point. That is how you specify the schema of a relation. There is also a lot more which goes into the data definition language; we will see that in detail later. These are all stored in a part of the database called the data dictionary.
The data dictionary includes both the types you have seen and integrity constraints such as primary key and foreign key constraints, which we will see later on, as well as authorization information. Now the SQL data manipulation language is what we are going to use extensively. Here is a small sample of it, which does the following. It says: find the name of the instructor with a given ID, 22222. Now how do you do this? To understand how to write the query, we always have to go back to the schema and see which relations contain the information that we need to access. In this case our life is simple, because the instructor relation, as we saw already, has an ID as well as a name attribute. So all the information we need is contained in this one relation. In general, as we will see later, we may have to connect information from multiple relations to answer a query. In this case it is all in one relation, so the SQL query is simple. It basically looks like: select name from instructor where instructor.ID = '22222'. That is going to look through the rows in the instructor table, find which rows have this particular ID value, and print the name for all those rows. Now if we declared ID as a primary key for the table instructor, as we ought to have done, then there can be only one row with a particular ID value. So we will find the unique name of that instructor. Now there is another example right after this which connects information from two relations. So what is this doing? It is finding instructors in departments whose budget is greater than a specified amount. First of all we have to look at the department relation to find which departments have a budget larger than 95000. Then we have to connect this information with information from the instructor relation; what we are doing here is called a join. We will see this in more detail, and from that we extract the information which we want.
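The table definition and the two queries described above can be tried out with a short sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run SQL; the slides themselves are not reproduced here, so the exact statement text, the `department` columns, the foreign key declaration and all sample rows are assumptions made for illustration. SQLite also treats `char(5)`, `varchar(20)` and `numeric(8,2)` more loosely than most database systems do, so the types are indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Data definition: the schema, with primary key and foreign key constraints.
conn.executescript("""
CREATE TABLE department (
    dept_name VARCHAR(20) PRIMARY KEY,
    building  VARCHAR(15),
    budget    NUMERIC(12,2)
);
CREATE TABLE instructor (
    ID        CHAR(5) PRIMARY KEY,
    name      VARCHAR(20),
    dept_name VARCHAR(20) REFERENCES department(dept_name),
    salary    NUMERIC(8,2)
);
""")

# Illustrative sample data (not from the lecture slides).
with conn:
    conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
        ("Comp. Sci.", "Taylor", 100000),
        ("Finance", "Painter", 120000),
        ("Physics", "Watson", 70000),
    ])
    conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("12121", "Wu", "Finance", 90000),
        ("22222", "Einstein", "Physics", 95000),
    ])

# Query 1: the name of the instructor with a given ID (one relation suffices).
name_22222 = conn.execute(
    "SELECT name FROM instructor WHERE instructor.ID = ?", ("22222",)
).fetchone()[0]

# Query 2: a join of instructor and department, keeping departments
# whose budget is greater than 95000.
high_budget = conn.execute("""
    SELECT instructor.ID, department.dept_name
    FROM instructor, department
    WHERE instructor.dept_name = department.dept_name
      AND department.budget > 95000
    ORDER BY instructor.ID
""").fetchall()

# The foreign key rejects an instructor whose department does not exist.
try:
    conn.execute("INSERT INTO instructor VALUES ('99999', 'Nobody', 'Astrology', 1000)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(name_22222, high_budget, fk_rejected)
```

Note that neither query says how the rows are to be located or in what order the two tables are to be scanned; that choice is left to the database system, which is the declarative style discussed earlier.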
So if you are not familiar with this, we will cover it in detail shortly. SQL is a language used to talk to the database, but application programs are typically written in some other language such as C++, Java, the .NET family of languages, or PHP, Python, Perl and so on. There are many, many languages used to build application programs. Now all of these need to talk to databases, and the way they do it is to construct SQL queries and ship them to a database, which executes the queries and sends the results back, which the application program consumes. So we are going to see one of these interfaces, called JDBC, which is an API. ODBC is an earlier one for another family of languages, JDBC is for Java, and there are other similar ones. But the one we will see illustrates the basic features which all of them have. The second part of the course, after SQL, will be on database design, and again there are two levels of database design. There is the logical level, which is going to be our focus. There is also the physical level, which is best done only after understanding what is going on inside a database. So we are not going to cover that initially, but we will discuss it as we cover database internals. As a small example of database design gone wrong, to motivate why we need to be careful about it, here is a relation which has combined information from instructors and departments. Now we showed you these two separately, which is what a good designer would have done. But let us say we had a designer who did not quite understand how to do design. And they said here is the instructor ID, here is the name, here is the salary, here is the department name. These are the same as what we had. But in addition they said, well, we really want to know which building their department is in and what the budget of their department is. So they put all of these as attributes in one relation, and here you have a relation with all of this.
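To make the combined relation concrete, here is a small sketch of what such a table might look like. The table name inst_dept, the column names and every row are hypothetical and are not the rows on the slide. The query at the end counts how many rows carry the same department, building and budget.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One relation holding both instructor and department attributes
# (the design the lecture warns against). All names here are hypothetical.
conn.execute("""
    create table inst_dept (
        ID        char(5),
        name      varchar(20),
        salary    numeric(8,2),
        dept_name varchar(20),
        building  varchar(15),
        budget    numeric(12,2)
    )
""")

# Made-up rows: two instructors share the Comp. Sci. department.
rows = [
    ("11111", "Asha",  90000, "Comp. Sci.", "Main",  100000),
    ("22222", "Ravi",  60000, "History",    "Annex", 50000),
    ("33333", "Meera", 85000, "Comp. Sci.", "Main",  100000),
]
conn.executemany("insert into inst_dept values (?, ?, ?, ?, ?, ?)", rows)

# For each department, count how many rows store its building and budget.
repeated = conn.execute("""
    select dept_name, building, budget, count(*)
    from inst_dept
    group by dept_name, building, budget
    having count(*) > 1
""").fetchall()

print(repeated)
```

In this made-up data the building and budget of one department are stored once per instructor, so changing that department's budget would mean updating every one of those rows consistently.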
Now if you look at this relation carefully and see the rows in here, there is a computer science department down here. I do not know if you can see the cursor, but it is the fourth row, and the computer science department also appears in the seventh row. As you will notice, a department's building and budget have to be unique in our schema. In reality some departments have multiple buildings. In our simplified world the department has one building and one budget. So if you have two instructors in that department, the building name and the budget get repeated. So this is an example of redundancy, which should be avoided. It is intuitive that this is a mistake. Now you can treat design as an art and let people figure out how to do it by being artistic or clever. But that does not give the best results. What you need is a theory for understanding which designs are good and which are bad, and a set of practices which help you come up with a good design. That is what we are going to cover in the database design segment. Now here is a quiz question. Let me ask you to think about this question anyway. So the question says, what is the problem with this particular relation? Is it missing information, repeated information, everything is fine, or perhaps the instructors' salaries are too low? I am sure all of us would love to have higher salaries, but that obviously is not the answer. It is one of the other three, and if you have been awake at all you will know that the answer is repeated information. So as I mentioned in the topic section, there are two broad approaches which are used in conjunction for database design. One is normalization theory and the other is the entity relationship model. We will see this in more detail later, including the diagrammatic notation; I am going to skip it here. And as I mentioned earlier, there are other data models. For lack of time I am going to skip this, and you are welcome to read these slides later.
Now here is the schematic picture of the internal level of a database system. The letters are rather small, and even I have difficulty reading it. But if you can read it, you will see that the top part is a query processor, which includes something which understands or parses SQL, the data definition language, or whatever other language they may use, and then figures out how to execute any particular query. It comes up with a query plan and gives it to a query evaluation engine, which then executes the query. What does it execute the query on? It executes the query on the actual data which is there in the database. Now the database may be stored on disk, or these days maybe on flash storage, but to operate on data you have to bring it into memory. So there is another component called the storage manager, which abstracts away the details of the underlying data storage and gives you a somewhat higher level view which the query evaluation engine will use. We are going to see this in detail later in the course. If you submit a query, it actually goes through multiple steps, including conversion to a simpler relational algebra notation or some variant thereof, followed by an optimizer which figures out how to run that query, followed by actual execution of that query. We will again see this later. The second part of our internals is on transactions, and what is a transaction? It is basically a collection of operations that form a single logical step. So if you go to a bank and deposit money or withdraw money, that is a transaction as far as you are concerned. As far as the database is concerned, a transaction consists of operations inside the database. To you, receiving cash is part of the transaction. The database does not know that you are receiving cash or whatever; to it, the transaction is just a set of updates on the database. So what if there is a failure in between? These are issues which need to be tackled.
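The question of a failure in between can be illustrated with a small sketch. The account table, the transfer function and the fail_midway switch below are all hypothetical, and a transfer between two accounts is used instead of the deposit or withdrawal mentioned above simply because it needs two updates. Python's sqlite3 module, with its default settings, opens a transaction before the first update, so a rollback undoes whatever was done before the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (acct_no text primary key, balance integer)")
conn.executemany("insert into account values (?, ?)", [("A", 500), ("B", 100)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction: both updates or neither."""
    try:
        conn.execute(
            "update account set balance = balance - ? where acct_no = ?",
            (amount, src),
        )
        if fail_midway:
            # Stand-in for a crash or error between the two updates.
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "update account set balance = balance + ? where acct_no = ?",
            (amount, dst),
        )
        conn.commit()      # make both updates permanent together
    except Exception:
        conn.rollback()    # undo the partial work, restoring a consistent state
        raise


# A transfer that fails halfway leaves the balances untouched.
try:
    transfer(conn, "A", "B", 50, fail_midway=True)
except RuntimeError:
    pass
after_failure = dict(conn.execute("select acct_no, balance from account"))

# A transfer that completes changes both balances.
transfer(conn, "A", "B", 50)
after_success = dict(conn.execute("select acct_no, balance from account"))

print(after_failure)
print(after_success)
```

Without the rollback, the failed attempt would have taken 50 out of account A without adding it to account B, which is exactly the kind of inconsistent state that needs to be prevented.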
The transaction management component ensures that the database remains in a consistent state even if there are failures. There is also a concurrency control manager, which controls the interaction between concurrent transactions. So let me quickly wrap up the rest of this chapter. There are several different database architectures, including centralized, client-server, parallel and distributed ones, and we will see them briefly later on. And there are many different kinds of users of databases, starting from those who have no idea they are using a database because they are only talking to an application program, moving on to application developers, and on to data analysts, who basically look at the data and try to help the business make decisions on what to do, like which products to manufacture because they are selling well. Should we offer a discount? Should we run an ad campaign? Will the ad campaign work? All these decisions can be studied based on data which has been collected. So it is very important these days for every organization to have good data analysts. And finally there are database administrators, who deal with the database itself and make sure it is operating properly. There are a few more slides on history which I am going to skip for the moment, except to note that electronic computers were born in the early 50s, and very soon after they were born they were used for data management applications. There was a pressing need for it. In the 50s we had magnetic tapes, and subsequently a lot of database systems have been driven by the underlying technology. When people moved from tapes to hard disks, that completely changed the design of database systems. They were basically redesigned, and the relational model could come about to a large extent because of this new technology. And over each decade there have been significant trends.
Now anything which was a trend in the 1990s, when the web was born, is probably something which looks very old to many of your students, although to us it still seems new. For students it is something that was there when they were young kids, so it is old stuff for them. So what is happening today in the context of database systems? Well, in the last 10 years or so XML and other semi-structured data models have become more important. There was also work on automating database administration to reduce the dependence on DBAs. But most recently, in the last 5 years or so, there has been an explosion of growth of really, really big database systems, on a scale which was unimaginable some time back. And these are the database systems that power the web applications which all of us use and take for granted. We take for granted that you can have an application with hundreds of millions of users. This is a scale that was unimaginable some years ago. How do they do it? Well, unfortunately we do not really have time in this course to cover that, but there is a lot of information on the web on how people are building very highly parallel database systems to handle this. It turns out that if you want to have really high levels of parallelism, it is easier to do it if you sacrifice certain features, such as certain kinds of atomicity, which a bank would not at all be happy sacrificing. So those trade-offs are being studied in the research community and industry today. But again it is beyond the scope of this course. So that is the end of chapter 1.