 SQL as a programming language. This is more like the discussion on Fortran or C programming. So, you will find similar structures and so on. What we have seen earlier were simple examples just to illustrate the power of SQL, but this is formal SQL. First of all just like any other programming language you can define attributes in terms of the type and range of values which it will hold. The range of values that a attribute can hold is actually the domain. SQL provides for intrinsic domains needed for information systems and additional domains can be defined. Here are the SQL domains or SQL data types. Numeric. Numeric can be integer or it can be written as even int. It can be also written as small int. Integer and int as a small int are the two categories which are exactly same as the integer of your conventional programming language. So, when you say integer it is either a 32-bit number or a 64-bit number or a 16-bit number which is machine specific implementation etc. But if you define the domain as numeric then it permits decimal arithmetic properly and exactly and numeric is defined as precision comma scale. So, numeric 4 comma 2 would mean a four digit number with an implied decimal point before the two digits on the right. So, consequently you can represent numbers like 38.52 minus 69.13. A sign comes by default with any numeric. But you cannot store 176.2. Exactly. You cannot store 9.254. It would be stored as 9.254 will be locked. This is all common sense but numeric is a decimal data type everywhere. There is an approximate numeric domain. The approximate numeric domain consists of real numbers as we call them in conventional programming floating point. So, these can be defined using the term real where the precision is implementation defined. You can say double precision typically for example in a 32-bit machine the double precision floating point representation will be 64 bits and that would be a floating point double precision. Again it is implementation defined. But if you say float followed by precision it allows you to define precision. Please also note let's go back to the previous slide. Suppose I say numeric 25 comma 2. So, I am talking about 25 digit numbers or 112 comma 2. This implementation is done by software. So, this can actually be much larger than the limited capability of the machine. Similarly, when you define float and you define a much larger precision than what is ordinarily available in the machine is the job of SQL engine to convert. Depending upon the particular SQL product it might say that the maximum limit for this machine is this etc. etc. But these are the features which are available. Character strings we talked about them. You can say either character or care which means X is the maximum length of the string and if there are smaller strings as values going into one thing they will be paired with blanks. Character varying or where care or smaller strings these are stored as is meaning three character string will be stored as three character. In actual practice just as in C a string is stored which is a null terminated string. So, if there are five characters actually require six bytes for store something similar is done by the database but that is internal we need not worry about it. Okay. There is a thing called national character. National character in brackets something something does not generally translate two bytes. Ordinary byte is eight bits is a representation for a character but a national character could be 16 bits. For example Japanese character said you would require 16 bit representation. Okay. You can also define bit strings. Bit strings means 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, large amount. What is this bit string? Bit X or bit varying X where X is the maximum length of the bit string. Where do we use bit strings? Typically we use them to store very special data as a column in the table. Digitized fingerprint of an employee. Digitized voice of a person. Digitized maps. A digitized voice or a video film or digitized video film. You will notice that each one of these actually are a sequence of bits. Okay. For example, for a digital image, okay, there will be pixels or picture points and each pixel will be represented by grayscales or colors or whatever and each pixel will have either 8 bits or 16 bits or 24 bits 32 bits associated with it and there will be as many pixels in a row as there are place and then so many colors but ultimately if you go row wise you will have a huge sequence of bits. You take a video film depending upon the video encoding that you use, different bits will mean something different. Some bits will mean pixel values, some bit will mean indexing thing. Essentially the bit stream. There is no programming language which can interpret these bit streams meaningfully. So there has to be some other program which interpret this but if you want to store them, ordinary ways to store them is a fine. So for example you will say digitized voice of Fatak, digitized voice of Prasada, digitized voice of Prasad Humej Belore, digitized voice of Zubin. If there are 5000 people you will have 5000 files but look at the way you will work with them. Suppose you have an application in which you want to the moment that fellow's name is flashed or you type in something the voice is heard. Now what you have to do when the name is flashed, if you store all 5000 files separately at most you will store the file names inside the columns. So from SQL you will get only a file name then you will have to go somewhere else, start another program, open the file name, open the file, do something. Instead you can integrate all of this. You can integrate that special interpreting program as a part of your application and from the column itself you can extract the bit stream and directly give it to that special application. So these are some of the specialty things which are available. Date time is a very special SQL domain. It does not exist in any programming language. It permits you to store date and time values and more importantly it permits you to do date and time arithmetic which is not possible without writing special programs anywhere else. For example you can have an attribute of the type date. It will store the year, month and day. So date is 2006-03-27. This is one format, 27 March 2006. Internally the date is not stored like that. DD, YY, MM, OK, DD, MM, YY, YY, YY, YY, YY, YY, YY, YY, MM, DD, et cetera, external format. Internally the dates are stored in a very special form. This is a sequence of bits. The SQL engine has the ability to interpret what that bit stream means. The SQL engine has the ability to add, subtract, et cetera, et cetera. As far as you are concerned you will get the correct date arithmetic. Time is another thing. Time can be written also as time p. If you just write time it will have a time in hour, minutes and seconds but if you write a p it defines the number of fractured digits for seconds. For example if you say time is 14, 35, 05, .385. This is an external representation in SQL of the time. This is the string you write. It means 14 hours, 35 minutes, 0.385 seconds. You can prescribe p to be any number but ultimately the maximum that you will get will be the ability of the computer's clock to give so many fractional digits after that second, for that second. This is extremely important in most information system applications as you can guess easily. The ability to store date and time as special types and the ability to do arithmetic. There is another domain called time stamp. Time stamp is a combination of date and time. So time stamp externally denoted like this. So 16, 1, 2006, 14, 35, 05, .385 is a value. This is date and this is time. Where would a time stamp type attribute will be useful? Alarms, logs. When did this transaction happen? Why would this be very valuable? Particularly up to milliseconds and all. Are you familiar with the notion of trading on the stock exchange? Day traders, buy sale, buy sale, buy sale. Thousands of people are doing it. The back end system in the stock exchange is required to process these orders exactly in the sequence in which they come. Every time a transaction data comes, the first thing that the server does is associates a time stamp. Even if a few milliseconds earlier something has come that is first, this is second. Railway reservation system, time stamp would be important and as someone said auditing and so on. You can add interval to date type and complete calendar logic is implemented by SQF. This is the characteristic of SQF. Null values. You are familiar with null values. We briefly discussed them. There are two flavors. Value is not known or value is not applicable. We discussed this also. Some of you may not remember because you did not attend the first session when you see the audio visual for that session. You will find that out. For the benefit of those, I will mention this again. Any attribute is normally expected to have value. For example, a name will always have some string. Roll number will always have some string. But CPI will have a value only when you obtain a CPI after your first exam. During the first semester, most students don't have any CPI. Would it be alright to insert a value zero there? No, because zero is not the CPI. So, at that time, the value is said to be null. That means it is not known. It could also be not applicable. The example I had given is that suppose you are putting an employee database and in that employee database, you say one attribute which says married or unmarried. So, M or U, one character. And then there is another attribute which says number of children. Now, you can't say unknown really. Mostly in India, at least you will say that is not applicable. Because if somebody is unmarried, not supposed to have children. The solution to represent null values is to use special values which are normally not used. So, you use zero, use minus one, you use star, star, star, etcetera, etcetera. So, in a name string, for example, in a file, if you store names of people, somebody's name is not known at this time. You don't have that information. So, you put stars. But now, any program which looks at that data will have to remember that if I see stars, that means the fellow's name is not stars. Fellow's name is not known. This is silly because different programmers may choose different things to represent null value. Consequently, instead of making different choices and complicate the life and again requiring some special programs to be to interpret this, a scale said I shall have a special way of handling. The solution is to use a flag with every data item. So, if you have 5000 students, if you have 10 attributes for each student and if out of those 10 say 7 are nullable, that means they could have null values, then the space required to store each of those 7 attributes will be the normal space required by the specification in teacher, numeric character and an additional flag. And depending on whether the flag is 0 or 1, it will mean that the value is null or non-null. So, if the flag is 0, say it's null, no matter what that storage space in the Desk or memory contains, it is to be interpreted as null. Is Ravi's CPI 9.3? If it is null, what is the result? This is a problem with null values. Any comparison that you make ordinarily is CPI greater than something. Is his marks equal to those marks? All these if statements result in true or false. All of you are familiar with that logic? Any programming language will say that. You know a decision box, condition. True, you do this. False, you do this. What do you do? You don't know whether it's true or false. If you are making a comparison with a value and another value and if one of the value is null, the comparison does not result in true or false. The comparison is unknown. This is a third logical value that is being now generated because this peculiar thing. You are familiar with the truth tables and or not truth tables. So, true and true is true. True and false is false. True and unknown is what? False or unknown is what? Because in a secure statement you will write conditions and if these conditions involve one fellow who is a null, then you have a square has to know what is to be done. This is the basic query structure. Every SQL query, query as such will always start with the word select. It will list columns, then it will have word from, then it will list tables and then it will have the word where and it will have a predicate or a condition or a complex condition and it will end with a semicolon. As I said the other day, every SQL statement is a complete program in itself does certain functionality. Of course you can have multiple SQL statements combined to form a transaction or whatever. But usually a SQL statement and as we shall see later when we discuss subqueries could have variety of things inside. Select a1, a2, something a, from r1, r2, something, where p is a typical query. And this is equivalent to saying form a cross product of r1, r2, r3, rm. Do a selection sigma over the predicate p and finally project a1, a2, a3, a n, whatever I mentioned. This is understood, this is very clear. The cross product is implied here. There is a sigma p. You will now appreciate by the previous lecture that this p must include some joint conditions. Otherwise it will not make sense. Otherwise you are talking about so many cross product. I mean whatever number of tables that you have mentioned, you will have a problem. But this is the general formal format of an SQL statement. The predicate p is basically of the form some value expression and operator and a value expression. The operator is all the comparison operators, greater than, less than, greater than equal to, less than equal to, equal to, not equal to standard comparison operator. And value on this side, value on this side, value expression on either. Value expression means a multiplication b divided by c or 3.8 into something, whatever. It is a expression. You are all familiar with normal expressions and comparisons. So, SCPI greater than 9.3 is a valid expression. Rage dot s roll equal to student dot s roll is a valid expression. You are testing equality of these two. And all combinations of and or not, etc. are available. Consequently, the predicate can be as complex as you want. The only requirement is that whatever are the attributes that appear inside the predicate, they must be meaningful in the context of the relations from which you are doing any selection. There could be complex predicates which are written using and all or not. So, CPI greater than 7.5 and s hostile equal to 8 is a valid predicate. So, you can build complex predicates. We have seen this example last time. There is a word distinct which eliminates duplicates. You will recall our discussion on projection where we said that projection in SQL, if there are 5000 rows in a table and you project hostels, you will actually get 5000 values when there are only 12, 13 distinct hostels. But SQL provides for the word distinct. So, it's a select distinct SH from student will give you only distinct value of it. In short, this is exact equivalent of a projection in terms of mathematical relations, etc. This is something which is special. Most of the time when you select something from the table, you don't want to display that result in an arbitrary form. You would like to order that result in some form so that it is readable and more meaningful for people who are reading that table. Consequently, there is a statement, there is a clause which says order by. Order by effectively shorts the resultant row. For example, select hostel name and scpi from student will give you 5000 rows containing name of the student, his or her hostel and his or her CPI. However, if arbitrarily that data is presented, you can't really search easy. Our mind is tuned to binary search. Any list should be sorted list. So, if you say order by SH ascending scpi descent, what it would mean is that all the roles will be first ordered on hostel. So, all people from one hostel will appear first since we are saying ascending, then hostel 2, then hostel 3, then hostel 4. Within one hostel, this is called the secondary key of sorting, is the first key of sorting or main key of sorting. So, first all hostels within one hostel, hostel 1, all students within descending CPI. So, the topper of hostel 1, then next, next, next, next, then hostel 2, topper, etc. Of course, this is an example. Wherever it is meaning, you'll all agree that sorting is absolutely required. Any idea how do you do that in C or Fortran? You'll have to either call a package which does that sorting. Cobol is the only programming language in which short statement as a part of the language exists. SQL obviously permits that because it is meaningful for doing anything that you wish to do. There are correlation names. These may not make sense to you. Select distinct s, sh from student as s, where s dot s name is Ravi. This looks like a funny query. Imagine if I don't use s. Suppose I said select distinct sh from student where s name is equal to Ravi, the query will still be valid. This query is artificially written to illustrate that I can associate an arbitrarily chosen name with any table and use that name instead of the original table to qualify the attributes of that. So when I say from student, I could have said student dot s name. I could have said student dot sh that is standard. But if I say from student as s, what I'm telling the SQL engineers, look, I mean the student table. However, in my query wherever I use the word s, take it to be same as student for this query. So I'm correlating s with student. That means anywhere else I say s dot something, s dot whatever, it is same as student dot. Why would I do such a silly thing? I can use the main name itself. The purpose is not abbreviation. The purpose is something different. It is useful if the same table plays two different roles in a query. Later on we'll see an example in one tutorial where you have to do what is called self-joined. You have to join a table with itself. Now how do you join a table with itself? Can you say select range, range, then make sense? There you will say select range as a comma range as b. Mathematically, the SQL engine treats a and b to be two separate copies of the same range table and does whatever operations you say. We shall see later why such things are useful. But there is a provision. The word as is optional. So I can just say from student s, would it take these two as two different tables? No. If there are multiple tables, they have to be separated by a comma. Go back to the previous slide and see this. If there are multiple tables, they have to be separated by a comma. But if I say from r1, a, r2, b, that means a is a correlation name for r1, b is a correlation name for r2, etc. Set functions. They are another favorite requirement of information system builders like us. We not only want to arrange data, we also want to calculate aggregate functions. We want to calculate averages. We want to calculate sum. We want to calculate maximum, minimum. All these functions are called aggregate functions and are provided for in a square. Count is one aggregate function. Count just counts the number of rows selected. Please note that SQL already gives you a set. And you would like to know how many elements are there in that set. 16, 15, 5000, 10,000. That will be given by count. Max mean average sum, etc. are the aggregate functions which are convention. All these functions are always evaluated over a set and they always result in a single value. So count will work on a set of whatever, 10,000 tuples, the value will be 10,000. Sum or max operates upon 5,000 tuples. What will be the result is the maximum value of individual elements of all those 5,000. Common sense, the way you use them is as follows. Count the number of students taking CS634. Select count star. This is the way you specify. Count star means all the elements which are selected, that is all the rows, which form the set of result. From student, age, where student dot s roll equal to rage dot s roll and c call is equal to CS634. So this will give you the number of students who are taking this. Number of hostels. Select count, distinct s h from student. Notice the use. I am selecting from student. I am doing actually a project. I want to count only the distinct hostels. The parameter for count is distinct s h. So I will get the number as 13 provided of course that at least one student from each hostel is there. That is obvious to you, right? Highest CPI is scored by any student. Select max s CPI from student. Obvious usage. So aggregate functions are very straightforward and simple concept. Mean is like max. Curiously you can use min and max in a single query. Find highest and lowest CPI. So you don't have to write two different queries. You can say select max s CPI comma min s CPI from student. You can actually also write comma AVG CPI or average CPI. Because in a single scan all of these things are done anyway. You can write a program to find maximum as well as minimum in a single scan of data. Same logic applies here. For illustration of some of the queries we define an additional table. We have defined student table, course table and regs table which is a association table, right? We now define a faculty table. Create table faculty. Faculty name work are 40, faculty age, numeric 2, faculty salary, numeric 7 comma 2 and department number, numeric 3. We artificially say that in our academic environment departments are not named but their number. 1 2 3 4 5 6 85 whatever. And we imagine that we will see that. Is there any problem, logical problem in this data definition to find something amiss, something not very good actually. If these are the attributes which is that age. What is wrong with age? Age is a derived attribute. Age is not a natural value. Age may be correct at the time when it is entered but one year later it will definitely be wrong. One year ago it would not have been this. Such derived values are normally not designed as a part of my table design. This is just an example but it illustrates the point. Anyway with this table now we will try to form with this and all other tables we will try to form some query. Here is another table department. Teachers belong to department. Department offers courses, students belong to department etc. In this particular example curiously our department has a department number as I mentioned earlier. Go back. Here is a department number. Department number is numeric we said. This is an attribute of table faculty. This is not primary key. Primary key of this table is F name. Here is another table whose primary key is obviously department number. One electrical, one arrow, two civill, three chemicals, whatever, whatever, whatever. Department name and department location. Imagine that all the departments, the hypothetical academic institution where all the departments are located on floors. There is a huge 40 floor building. So department number 34 may be located on floor 7. Department number 58 may also be located on floor 7. Some department is located on 41 floor etc. etc. And these are the departments in which faculty members stay and work because faculty members have an attribute called department. In this sense faculty members are related to the department. The department number appearing as an attribute in faculty is actually the key for this table. So in fact in a way this table is actually an association with a table called faculty. We have not created a separate association by removing the department number to become. We have put the department number in that attribute itself. Why? Because we are saying a faculty member can belong only to one department. So there is no many to many relationships. So we don't need that diamond for representing that anyway. So you might want to just note down these three attributes and some of these attributes because the next queries are based on that. So F name, F age, F salary, department number and department number, department name, department location as the three parameters. Now we discuss further. Some values in a specified column. Some of the salaries of the faculty select some in bracket F salary from faculty. So this aggregate function will calculate the sum of all salary. Suppose you want to calculate the average salary. You can say select AVG F salary followed by that. Please remember that if for some reason some teacher salary is null. That means that let's say the faculty members promotion is under discussion. So in either case the old salary is null the new salary. Actually bad for faculty member, but whatever. If it is null then what I'm trying to say is it should not be counter towards any averaging or summing. Find the average CPI of students registered for the course CS 634 from hostel 8. Select average SCPI from student the comma range where student dot s-roll is equal to range dot s-roll and C core is equal to CS 634 and S hostel equal to it. Notice that I need a join because the hostel number of the student is in the student table. The course code is in range table. The composite information I will get only if I do a natural join. This is the natural join condition. And additionally I want to have the course code to be CS 634 is part of this. I can also say select average SCPI from student natural join range where C code is this and S hostel equal to it. Notice that the join condition which I put in the where clause in the previous query go back to the previous thing where student dot s-roll equal to range dot s-roll. This condition as part of the general predicate here is subsumed within this query when we say natural join. That equality is automatically implied. We now come to an extremely important part called sub queries. A sub query is a query which is first executed and gives the result of that sub query to a main query. It's like a sub routine. You have a main program. You suddenly want to do something. You go to that sub program do it. Come back with that result. Sub program will always send back some result. So program sub program is like the query sub query. The sub queries are queries within a query. These are also called nested queries. A simple comparison predicate can involve a sub query. You can write sub query as a completely independent thing. So that means in a query is a select something from this where now in where you are writing one or more conditions. In any condition inside a condition any one of the two parts of value expression can be a sub query. That means sub query whatever values it returns one or more whatever we shall see what how to deal with more value that replaces that value expression and then comparison happens. Let's take some examples. So value expression operator sub query is the format of writing the sub query. We answer we try to answer a question. Find names of students from hostel 8 who have CPI that equals maximum CPI in the institute. What we want to find out names of students from hostel 8. Select s name from student where s hostel is equal to 8. Where s CPI equal to why can't we write s CPI equal to max s CPI because max s CPI is not a value expression. Max is a function which is permitted to be used only in the select statement. It is not a mathematical function in that sense. It's an aggregate function. So consequently max something can be evaluated only as a part of select something. So what I do now is I write here select max s CPI from student and put the whole thing in bracket. This becomes a sub query. The way SQL engine will handle this is it will look at this and recognize it's a sub query and it will execute as if it's a standalone query. So it will select max s CPI from student. It will get what one value in 9.78. Now it will replace that 9.78 here and then execute the larger query. So it will again go through the students where hostel is 8 and s CPI equal to 9.87 water. That's the purpose of sub query. Find the names and CPI of students registered for CS 634. How do we get this? We know that the range table contains the registration course code CS 634. We also know that the student table contains name and CPI. One way is to use the join. So select name comma s name comma s CPI from student comma range where student dot s roll equal to range dot s roll and s code equal to CS 634. Alternately because we have the notion of sub query we can do the following thing. First get roll numbers of all students registered for CS 634 from range. So don't do the join of two tables at all. I know that range has 30,000 registration but CS 634 has 85 students. So I first get out the list of these 85 and list of roll numbers of 85. I don't have names here. Now once all these list of roll numbers of 85 I have a set on this side. Somehow I want to be able to say that now go through all the student table if any roll number is equal to any one of these that is if any roll number from that student is in this set print the name and CPI. You get the point. How will I write this query? Select s name comma s CPI from student where s roll in select s roll from range where c code equal to CS 634. Let's look at the sub query first. What is the sub query doing? It selects roll number from range where c code is equal to CS 634. Observe that this will not give a single value. Not natural. In fact it may give null, no value or it may give 85 values depending upon how many students are registered for it. Please remember that this query is now giving you a set. Therefore if I use the conventional comparison operators equal to grade, less than, they don't make sense. Those comparison operators will always compare one value and one value. Here on the left hand side I have one value where s roll. I can't say equal to this. There is a special word by sql in sql called in. Where s roll in this? So how this query will run? First this query will be executed. A set of 85, 60, 75 whatever those roll numbers will be collected by sql and now it will go through the main query execution. It will look at the rows of the student and if that particular one roll number is in this set then the name and CPI will be selected. This is not easy to explain from a relation algebra perspective because in relation algebra we did not have comparison of a value with a set. There is a relational calculus and tuple calculus which we are not discussing which handles sets with the algebraic operation. So in is one of those. In operator is used instead of equal to when we have a set of values on the right hand side. In a nutshell then I can write some queries as part of an expression. The way you write it is inside a predicted your value expression comparison operator value expression normally instead you can say value expression comparison operator and subquery. If the subquery gives you a single value then you can use any one of those normal operators. If the subquery is going to give you a set then you have to use in and you can use lot in etc. So please listen to this carefully. I am first defining the project one. This obviously is given in the context of IIT Bombay so that you are all familiar with IIT Bombay. IIT Bombay runs several sponsored and consultancy projects on these projects. Project staff is appointed. This could be project engineers, project managers, administrative staff, whatever whatever. Many a times students are appointed on honorarium basis on the project. They are not called appointed they are called associate. They work for 3 months summer, winter, whatever. There are times when the student is passing out of the institute and is waiting for either joining a new job or taking up some doctoral thing or joining something else and says that I have finished my this thing so I am a degree holder but I would like to work as an employee only for 2 months, 3 months, whatever. Sometimes there is a project engineer working in project A that project engineer has worked for 2 years. Within 2 years the project engineer has learned something new and now aspires to be appointed on some more prestigious project B. The information about this project engineer was collected by the principal investigator of the project at the time of his appointment. Afterwards what this fellow has achieved is not known because there is no mechanism there is no system. See when you appoint people you ask for application. Physical applications come you scrutinize them and then you find out who are the eligible you call them for interview, you select. Once you select the dean are in this office will religiously keep that bio data in a file and forget about. 2 years later what has happened to that human resource is not known except to the human. But there are 50 other projects who are probably looking for people with 1 year experience, 2 years experience with this skill set and without knowing that there are 3 such individuals in the same institute and 2 students who qualify with those specifications. This principal investigator will spend 50,000 rupees in an advertisement. So we would like to build a system in which the evolving information about prospective and existing employees on projects is maintained. And we would like this information to be maintained in a distributed way meaning I don't put clerical staff etc. to key information update information. I let everyone update one's own information. Just like you do course registration. Earlier the registration forms would be filled they will be collected head will sign them academic office will enter them. Today you all enter registration data online. What happens if you try to enter some bunk home data is supposed to be a faculty advisor who checks it or something. So you can put some validation process. Now what we wish to do is to build a system and information system. We don't know how to build that by the way we have not discussed the analysis and design and other things we will do that later. But the purpose here is to build a system wherein hundreds of prospective and existing I would not say staff human resources should be able to describe themselves in forms of parameters which can be queried upon it. The idea is that if I am a principal investigator of a project and if I urgently need someone with this specification then I should be able to put the parameters of those specifications in a query in a screen. I want a fellow with 5 years experience of fellow who knows Java programming who has done this who has done this etc. Get me that. So this system will involve first of all the database schema design how many tables you will have. We will see how to do analysis and design. Then describing all the screens that you will put for querying. You will design all the reports that must be generated. When I make a query I want a printed report which says the following 6 candidates their location here, here, here are available. I might then want a system another program when I run this query it will automatically generate an email to all the 6 fellows saying father go on people with this attribute you seem to be so meet him 30 on this and please realize that in business information system it is not just the collection or extraction of information we are talking about but we are also talking about optimizing the workflow in any organization. Imagine that if you give me a software which will extract these names and I will get these 6 names. Now either I will have to type the email looking at all the email addresses from there or I will send physical notices to whichever hostels or wherever they are. Instead if the information system itself automatically generates for example every time you make a submission on moodle as a teacher I get an email saying this is my submission one of the reasons why I am going crazy with my mailbox is I am getting 250 emails every day so I don't know what to do with that and suddenly between 10 emails here and 13 emails there there will be an important mail from Ajay Chaudhary who is the CEO of HCL I just saw it I would have missed it otherwise or I would have tested him for assignment. Now so this is roughly the thing now what I want you to do is you have to choose only one of the two by the way and what you choose here may have no bearing on what the project you ultimately this is an exercise I am giving you on thinking about the requirement of an information system I have given you a broad parameter now you apply your mind you know the environment go talk to some project engineers or project staff members that you know if you just walk into TANSA you will see hundreds of them ok and or in your department or whatever and find out like what was the process what were the what were the parameter attributes they defined when they were called for interview so at least you can say that ok typically somebody wants to look at the following things so like here we said name, CPI etc. so name, address, telephone number, contact number qualifications etc etc etc. make a list and then we will form teams when the teams will compare these lists what will this give you this will give you an exhaustive list of all attributes that are likely to be important for this then we will design tables you can see where we are going now so I want you to do an exercise at this time you are not doing a business analysis you are not doing a system analysis but you are doing the problem formulation what is the problem you may think of variety of how will you collect the data so what kind of screens will be required what kind of reports will be required ok what kind of logistics should be arranged as I said automatic generation of e-mail what kind of security will be required what I suggest is write a generic thing and write something special that it occurs to you either in security or in workflow or in data collection mechanism or verification mechanism or from a principal investigator's point of view etc. right that's the task do not spend too much time on writing this but I would expect each one of you to spend about 1 hour thinking and 1 hour writing assuming that you can write about 2 to 3 pages type pages in 1 hour 1 hour of thinking and 1 hour writing is good enough the purpose is to think and for God's sake don't copy because it is meaningless a we will catch it immediately and b it doesn't serve any purpose the purpose is that you think yourself doesn't matter if instead of 4 pages you give only one page I am willing to by the way a single line submission he says sorry I did not get time to think or sorry I cannot think ok we will figure out what to do with you when your CPI comes but do it yourself is important ok second project now this is an e-learning tool which will permit a teacher to create synchronous interactive learning material using multimedia resources let us take this example itself what are you seeing on the screen you are seeing the slide now imagine you are not here you are in Jharsugula in Jharsugula you cannot see me here you can see me and the slide the Jharsugula fellow is able to see only the slide but what apparently what appears on the screen is what that fellow away is seeing and you can see the problem that person sitting here physically can see me can see this but person sitting in Jharsugula can only see either one of these ok second suppose I record all these lectures what will be recorded today actually what is going out onto the visa is getting recorded what is going out to visa either me or the slide and the intelligence of what is more relevant at any point is that fellow he decides what the person in Jharsugula sees not I have no control not Jharsugula fellow also he has no control if he decides that the fellow should see me he will see me although he wants to look at the slide and when he is taking down from the slide suddenly he sees me that is not good right second so ideally I would like to see both the slide and the teacher that is the correct simulation of a live experience that alone is not a live experience there is something else since I have digital technology suppose now as we are doing now there is actually the mixing that is happening but there is a camera which is actually recording all of me so the entire video shoot of the teacher for the entire duration is available separately the entire slate of slides is also available separately the entire set of slides and the teacher talking is available in a synchronous manner because the audio is getting recorded now imagine the following can I switch to this suppose this is the screen and the Jharsugula fellow is sitting in front of this screen what he sees is very interesting I want him to see so this is the screen in front of which a student at a remote place is sitting ideally what that student should see is on this large screen there should be a small window which is a video window constantly the entire video of the teacher is appearing here perpetually the audio is flowing along with the video the student in Jharsugula should perhaps see a slide don't you think this will be better than seeing either me or the slide now imagine such recording has been made so now I have multimedia resources multimedia means multiple media text voice audio I have a slide resource which is a text resource I have a video resource which is audio video resource and I have the synchronization between slide and video based on when I push the next slide this information is available let's say to my tm now I tell him I want you to do the following you have attended my lecture you have also the composite video so you know which slide comes when I am talking what go back take all the collection of slides separately take this audio video separately audio video has a timeline time t equal to 0 time 1 hour half an hour etc. now create an association between one slide and the corresponding video supposing this slide I start discussing 2 minutes 35 seconds after that at 3 minutes 10 seconds I change the slide now I say prepare an index 1 2 3 4 etc. each slide if I click on slide 4 directly when I start the session ordinarily I will start seeing from 0.1 but I want to go directly to slide 4 I will click on 4 not only the 4th slide should come here the video should advance to the point when I started discussing the 4th I want a further addition I want a modularization because ordinarily I won't even know what the hell was going on in slide number 48 or 34 so suppose I say that ordinarily there will be some logical grouping of slides in a lecture so let's say I am discussing a square I am discussing subqueries I am discussing application architecture 3 to 5 to 3 to 7 slides may form a group so I will say this is group 1 this is group 2 group 1 comprises slide 1 to 7 group 2 comprises a slide 8 to something when I click on group the first slide of that group should come and the corresponding video should come in short I want actually him to define a table which will say group number then it will say whatever text to be written here then it will say start time and say video index or some such thing whatever you can imagine basically I want to correlate would you not agree that this would be a very useful learning material for a student even after the class is over I want you to develop a system which will permit a teacher to construct this given the slide resources and video resources and the requirement that this is correlated at this point that's the task you can make it more functional by saying that I have ok now you can see I have let's say 50 courses so I want to prepare a database in which all 50 courses are identified and for each of the course I have 30 lectures for each of the lecture I have these modules what is the advantage it is possible that course number 5 given by professor Fatak or information system wishes to use a module of 5 slides which professor Sudarshan is using in course number 37 so I don't have to redo anything I pick up those slides but the voice and video is mine so I don't have to store the resources of these 5 slides separately then when I release things in open source I have a huge indexing capability tomorrow instead of CDs and DVDs containing let's say anywhere between 780 MB to 4 GB if I have 200 GB optical disk then I can put on to that 200 GB all the audio video resources all the database tables all the SQL queries and the entire software representing that information system give him that CD to insert and the CD also boots it has an operating system like Ubuntu and the student can look at anything that he wants why is this important you see we are trying to create an equivalent of a classroom please understand the limitation of the technologies technology might appear very jazzy to you TV cameras 2 3 cameras etc etc and ability to show this that that whatever do you know that in a classroom ordinary classroom without modern technology how many equivalent cameras are working if there are 50 students there are 50 cameras because each student has eyes second the focusing ability of individual human eye is same as the focusing ability of camera camera can zoom we saw how it destroyed our attempt to write something meaningful there camera can zoom in camera can zoom out suppose I am attending a course okay so one hour lecture I am attending is it not very obvious that it is impossible for me for that entire one hour to be focused either on teacher or on slides or even combine both suddenly sometimes when I am looking at a teacher I may start getting interested in the pen so my eyes will focus and I will immediately zoom the pen it may happen for 3 seconds okay there will be occasions when I will be looking at some colleague this is the wrong place in time but if there is nice greenery outside I will be looking outside even if basically my eyes are focused on the teacher I might be looking through him and each pair is controlled independently by independent mind there is no way you can simulate that if you really want to give an experience of a remote person the remote person should decide that now I want to look at the pen just as you can he can't please understand the difference what we are attempting to do is forming a basis of a very rudimentary system which will try to come as close as a experience of a real life thing 50 years later there would actually be 50 cameras in fact today also there are technology demonstrations where we call immersive visualization that means you immerse yourself into the environment and depending on what you want to see with what get zooming you can actually do that because there are multiple cameras multiple projection systems we are not talking about that here we are talking about creating an e-learning tool which will permit a teacher to create synchronous interactive learning material using multimedia resources in the way I described on this slide additionally if suppose Fatakh lecture for whatever is created but if it is to be taught let us say in Bangalore university where Bangalore university wants to emphasize sub-queries more than what Fatakh has done then that teacher in Bangalore should be able to sit in front of a webcam shoot himself not shoot like this but I mean capture his image or video permit him to have his own slides and permit him to integrate his slides and his videos by replacing some slides and some video from me from this course material which has been given then that course material can be said to have been adopted for Bangalore university don't you agree that just giving a recorded course material of IIT course is not as useful as permitting people to modify this course material as per the suitability of different universities and different colleges because the emphasis will be different example somebody won't like to make different this will not be possible in the world full of intellectual property I said nice slides ok you can see them but you can't touch them that is why open source and that is why the equivalent of a general public license which in creative common says share a lie you are permitted to modify but whatever you modify should be pulled back so that that is also available to somebody imagine now if on a cooperative basis interested teachers from 2000 colleges come together and contribute these modules the kind of digital contents that we can create could actually make students working as a club as in some other players to benefit so much that's the purpose and towards this end we are starting this is actually the kernel of a open source large project for that you might want to see site from the rice university this is called connection but connections built with an S connection is a site where different people can contribute different modules they are not necessarily audio visual they could be different modules for in text for different subject segments so there could be a module on networking module on compiler's module on this these are all contributed by different people if I am a member of that consultium I can say I am editing a book and there is a tool we sell ok this is the book which modules you want this this this this this then it permission me to edit some modification I can put some modification there and then it actually prepares a printable book which can be sent to a printer and you can publish a regular book which will say edited by processor db file type and modules contributed by this contributed by this contributed by this etc that's an excellent tool for text kind of module here we are talking of multi-mean in this case also think about it this is not something which is of a common knowledge or experience to you that's why I took slightly longer time to explain now this is our imagination of the requirement I would like you to think again for one hour imagining further what else can be done and I would like to collect all 65 responses and maybe we will get you know 10 new brilliant ideas at least 10 new ideas maybe two of them are brilliant you know they say that when you out of 100 ideas only one is brilliant 99 have to be thrown out that may be right but I don't know which one so I have to examine all 100 I have to generate all 100 I am expecting that distributed generation will happen so choose any one of these two projects and submit it on the Moodle a text description of whichever you say just write the project number write then the description and we will get this okay and come prepared with the formation of groups of three to four students each and select a group coordinator will figure out a method of permitting the groups to be registered the group coordinator must also submit a choice of either project one or project two which groups will combine to form a two team shall be decided by me because in real life you don't get to choose if you are working in tobacco services I said I cannot say that I am interested in this project but I will not work with him I can only say I am interested in this project and whosoever else is working that's part of information system developed okay stop now thank you