Good morning. Thank you for coming early in the morning and giving up your sleep to listen to me. I hope you'll find it useful. I have been doing research in temporal databases since 1985. That's when my first article appeared at the ACM SIGMOD Management of Data conference. In those days, temporality was an academic curiosity, and it is really interesting and somewhat gratifying for me to see that temporality is moving into the mainstream of information technology. That's really an exciting development. Actually, my talk will be somewhat cursory, without going into details. I'll touch on many topics, from basic concepts to modeling temporal data, temporal operations, temporal query languages, the expressive power of temporal data models, integrity constraints, and designing temporal databases. Of course, covering all of that properly would be a half-day or day-long presentation, so I'll only briefly touch on them. Feel free to interrupt me at any time to ask your questions or to make your comments. Also, this is a very conceptual presentation, unlike the other Postgres presentations, so I promise you that you'll not see a single line of Postgres code. Well, why do we need temporal databases? Here you see various cases that I have cut and pasted from an IBM website that introduces temporal databases. For instance, in an internal audit, you need to see the transactions in the last five years. Or pending laws require that a hospital be able to show a patient's treatment before another treatment started. They all involve temporality. A client files a claim with an insurance company, and the claim needs to be processed according to the policy specifications that were valid when the incident occurred. That certainly requires temporality. And here are some other cases; I don't really need to go over them to convince you. I guess you all see the need. Here are some transactions that involve OLTP as well as OLAP. And of course, typically in a database environment, the database is geared to OLTP.
But of course, OLAP functionality is also highly desirable, even more so today than in the past. Well, what is temporal data? Our intuition tells us that data is something current. However, if we scratch it a little, data really has a temporal dimension; data has a temporal context. Nowadays, of course, big data is the hot topic, and what fuels big data is high volume, high velocity, and high variety. When we add the temporal dimension to data, it becomes long data, as suggested by Samuel Arbesman. And truly, long data is feasible today: we can store, retrieve, and massage it. What are the characteristics of temporal data? Clearly, it is ubiquitous; it is everywhere. With today's technology, we have the capacity to capture, store, and process it. And it is definitely essential for every aspect of human decision making, regardless of whether it is an economic or industrial activity, from prediction to analysis, strategy, accountability, and so on. Well, how do I see time? Actually, you can visualize time in different ways, and I will propose a simple visualization. Time starts at a relative origin, let's say zero, and it progresses as the clock ticks. The current time is represented as "now". We can use calendar values, as you see on the second line, or we can use integers, zero, one, two, and so on, because calendar times map nicely to integers. And of course, time granularities: we are well versed in them, and the conversion from one time granularity to another is very well explored. Now, how do we visualize time and data generation? Here is a simple example, as you see. At the top, you see the timeline. And to make things a little abstract, I don't know why I did it, instead of using zeros and ones I used time instant zero, time instant one. That doesn't really matter. Events take place as the clock ticks. As you see, event E1 took place at time instant two, and event E3 took place at time instant five.
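As a small sketch of the calendar-to-integer mapping just mentioned (the month granularity and the 1/2013 origin are hypothetical choices for illustration, not from the slides):

```python
from datetime import date

ORIGIN = date(2013, 1, 1)  # relative origin: time instant 0

def month_instant(d: date) -> int:
    """Map a calendar date to an integer time instant at month granularity."""
    return (d.year - ORIGIN.year) * 12 + (d.month - ORIGIN.month)

print(month_instant(date(2013, 1, 1)))  # 1/2013 maps to instant 0
print(month_instant(date(2014, 5, 1)))  # 5/2014 maps to instant 16
```

Converting between granularities (months to days, say) is then just a change of this mapping.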
Now events generate data, and we're able to capture that data. In capturing the data, we have to connect the data values with the relevant time, and there are three choices available. Our intuition tells us that events occur at time instants, like the first one here: at T2, event E1 took place; at T5, event E2 took place. As an example, let's say an employee started working in 1/2014 in the toy department. Of course, this is very intuitive; however, for capturing the temporal reality, it is incomplete. A better solution is using time intervals that are designated by a lower and an upper bound. If you look at the example, the employee in the toy department actually worked from 1/2013 to 5/2014. This is a very compact and complete representation. Still another possibility is using temporal elements, which were introduced by Shashi Gadia, a prominent researcher in temporal databases. Here, a temporal element, as you see at the bottom line, is nothing but a union of maximal intervals designating the time of the event. What are the data types that we are dealing with in a temporal database? In reality, we can identify three types of data. The first one is stepwise constant; an example is a salary. The salary value is valid for a certain period of time, and the changes are abrupt and stepwise. Another type of data is like sales, which are valid at a time instant. A third type is analog, continuous data, like voice. So in a temporal database, we have to deal with all of them. Now, when we consider temporal data, I will use the relational data model as a presentation medium. What you see at the front is a typical simple table, employee name and salary, right? The current snapshot is within the circle, as you see. And behind the current snapshot are the past values. So as you see, there is a third dimension. That is how the reality evolves.
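The temporal element just described, a union of maximal intervals, can be sketched as a small merge of overlapping or adjacent intervals (a hypothetical illustration using half-open integer intervals):

```python
def to_temporal_element(intervals):
    """Merge possibly overlapping or adjacent half-open intervals (lo, hi)
    into the union of maximal intervals, i.e. a temporal element."""
    result = []
    for lo, hi in sorted(intervals):
        if result and lo <= result[-1][1]:  # overlaps or touches the last one
            result[-1] = (result[-1][0], max(result[-1][1], hi))
        else:
            result.append((lo, hi))
    return result

# Two adjacent stints merge; the separate stint stays a second maximal interval
print(to_temporal_element([(1, 3), (3, 5), (8, 10)]))  # [(1, 5), (8, 10)]
```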
Actually, we typically see it as snapshots occurring one after another as the clock ticks. In any temporal data modeling, the challenge is how we represent this three-dimensional structure, whether it is an object-oriented database, a relational database, entity-relationship, whatever data model you are using. The time you see here is actually called valid time, which shows the validity period of the values. There are other names for it, like business time or logical time; in IBM DB2, for instance, it's called business time. Even effective time, if I remember correctly, is used. Now, let's look at the evolution of the data over time. At 1/1/14, this is the three-dimensional structure we have. Now, let's look at a transaction that happens at 6/1/14. Ann got a promotion; her salary moves from 12k to 13k. And Tom got a promotion; his salary goes from 15k to 16k. As you see, these are the current values, and if I had drawn my diagram correctly, the previous values should be behind those values, right? But I was not that prudent in drawing the diagram. Now, let's continue as the clock ticks. At 9/1/14 there is a correction. The correction says Ann's salary was 14k, not 13k. And similarly, Tom's salary was 17k, not 16k. As you see, that is shown in the current snapshot. What we now have in the database is a correct and complete history. Now, if you ask a question as of, let's say, 10/1/14 about the salaries of Ann and Tom, you'll get these correct values. But if you ask the question as of, say, 7/1/14, the values would be the previous, incorrect values. So what I am simply telling you is that in a temporal database there are two dimensions: on one axis you see the time that is the validity time.
And on the other dimension, which is called the transaction time, when the values are recorded or registered in the database, you have the complete history with all the changes, which allows retroactive changes, even post-active changes. A temporal database that has these two dimensions is called a bi-temporal database, as we will see in the next few slides. Well, feel free to ask questions, or if I am going too fast, please let me know. As I have mentioned, valid time shows the time when the values are effective and valid in reality; that's also called the business time. Transaction time denotes when the values are recorded in the database. And user-defined time is any time attribute that the user interprets. These three times were specified by Richard Snodgrass in the early 1980s. By using these time dimensions, he was also able to identify four different types of databases. The first type does not support time and is called a snapshot database; time is handled in an ad hoc manner. The second type of database is called a historical database, and it has the valid time dimension. So we store in the database not only the current snapshot but also the past snapshots. That is the historical database. The third type of database is called a rollback database, or a transaction-time database. Nowadays this is very popular; these are the immutable databases. As transactions occur, they are recorded in the database. There is no change in the database, only insertion of new transactions; that is an immutable database. Finally, the fourth one is a temporal database that includes... I'm sorry, I shouldn't really call it a temporal database; I use the term temporal loosely to mean any type of time support. What you see at the bottom is a bi-temporal database, because it has two time dimensions, the valid time and the transaction time.
And as you see, it is the most complete: it gives you the entire history with the trace of the changes, whatever those changes are. In a temporal database, I can categorize the types of queries into several groups. The first one is a snapshot query. That is the typical query as we know it in the database environment; there is no reference to time. A temporal query, actually, is a sequence of snapshot queries, meaning that the query is evaluated at different time points. The other type of query is well known: time travel. Meaning that we want to travel in time backward or forward and see the database accordingly. And for time travel to occur, you need what? Can I give you a little quiz? No? A TARDIS would be a step in that direction. Well, if you really want to have time travel, you need to have valid time and transaction time. Two times. Without having two times, you cannot do time travel. Probably this is the most valuable take-home message of this presentation. Why do I say this? Because if I go back, this is my bi-temporal database: this is my valid time, and this is my transaction time. And you see the evolution of the database over time with all the changes made to the database. Well, the next... Please. Yes. The next day I go in and I change his salary and say it's good to the end of May. So if you track both times, then you can say, today I thought his salary ended at the end of April, and tomorrow I say his salary ends at the end of May. Is that the idea? That's exactly the idea. And of course we observe the progression of the database as the clock ticks. Now we are at this point in time, and considering all the activities that took place in the past, by time travel I'll be able to answer the question of what was correct as it was known at any instant of time. That is time travel. Temporal integrity constraints are much like temporal queries: the integrity constraints are enforced at different time instants.
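The time-travel idea just described can be sketched with bitemporal rows, where each fact carries both a valid-time and a transaction-time interval (the names, salaries, and month indices below are hypothetical, echoing the slides' salary-correction example):

```python
NOW = float("inf")  # "until changed" upper bound for half-open intervals

# (name, salary, valid_from, valid_to, tx_from, tx_to)
facts = [
    ("Ann", "13k", 6, NOW, 6, 9),    # recorded at 6/1, superseded at 9/1
    ("Ann", "14k", 6, NOW, 9, NOW),  # retroactive correction recorded at 9/1
]

def as_of(facts, valid_at, tx_at):
    """Time travel: what did we believe at time tx_at about the value at valid_at?"""
    return [f for f in facts
            if f[2] <= valid_at < f[3] and f[4] <= tx_at < f[5]]

print(as_of(facts, valid_at=7, tx_at=7))   # the old, incorrect 13k
print(as_of(facts, valid_at=7, tx_at=10))  # the corrected 14k
```

With only one time dimension, the first query would be unanswerable, which is why time travel needs both times.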
Now, how can I model temporal data, that three-dimensional cube, in the relational data model? Typically a relation is made up of columns. Here I have name and salary, and I have a time for the salary, and I have a department attribute and a time for the department. Intuitively it should work, but this is not a good solution, right? It creates excessive data redundancy, and querying the data would not be easy either. We can improve it a little: we can combine the time of the salary and the department into one single column, right? Now, this is better than the previous one, but it is still not good unless the salary and department change at the same time. If they change at the same time, this would work; if they do not, it's not a good solution either. So what is my solution? There are two questions I need to answer. The first one is: where do I attach the time? Do I attach it to tuples in the relation, or do I attach it to attributes? Actually, this is a long-standing controversy in temporal databases, ever since the 1980s. There is a group of people, like Richard Snodgrass, who are in the category of tuple time stamping, and there are people like myself and Shashi Gadia who are in the camp of attribute time stamping, attaching time to attributes. The next question is: what type of relations do we use? Do we use flat relations, or do we use non-first-normal-form relations? Clearly, if you attach time stamps to tuples, you end up using flat, first-normal-form relations, and if you use attribute time stamping, you will probably end up with non-first-normal-form relations. How are we doing? Do I have... Good. So I guess, since I gave you the opportunity to ask questions, if I use all of the time you won't complain. Now, what are the possible base relations that I can use? Actually, let me fast forward and then come back again.
What I have done, actually, is develop a taxonomy of various types of temporal databases, by using attribute time stamping or tuple time stamping, whether we glue the time stamps to attribute values or not, and whether we use flat or non-flat relations. Some of the relations here are very interesting, and I will quickly go over them as possible base relations. This is a possible base relation: you're introducing a tuple at every time instant. Theoretically it is relevant, but it is not practical. The other alternative is adding two columns to a table, from and to. For instance, in this table with employee name and salary, you have two columns for the time interval; this is commonly used in tuple time stamping. If you look at Richard Snodgrass's book on temporal SQL, that's the data model used there. Still another possibility, one step ahead, is adding time as a range type, a composite data type, and that is available in Postgres; this is a much better alternative than the previous one. Now, still other candidate solutions: as you see, the time is attached to the salary, and the top one and the bottom one are essentially the same. You can disregard the bottom one; there, the interval is represented as a temporal set, just a set of time instants, and it's not really a viable alternative. But the top one is a good candidate. If you move one step further, you can put the salary history in as a set, and you end up with a non-first-normal-form relation, and that, I believe, is the best solution for a temporal database. I'd better not talk about this; you have seen the taxonomy, and now it makes sense, right? Here you see a properly designed application. There are employees, the departments for which the employees are working, and the salary of the employee. Since the department and the salary do not change at the same time, you add the time to each one of these tables. That would be the solution in tuple time stamping.
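The attribute-timestamped, non-first-normal-form alternative can be sketched as a record in which each attribute carries its own history (a hypothetical employee, with half-open intervals and None meaning "until now"):

```python
# Attribute time stamping in a non-first-normal-form record: each
# attribute holds a set of (value, interval) pairs, so salary and
# department can change independently within one record.
employee = {
    "name": "Tom",
    "salary": [("15k", (1, 6)), ("16k", (6, None))],  # changed at instant 6
    "department": [("toy", (1, None))],               # unchanged
}

def value_at(history, t):
    """Look up the attribute value valid at time instant t."""
    for value, (lo, hi) in history:
        if lo <= t and (hi is None or t < hi):
            return value
    return None

print(value_at(employee["salary"], 3))      # 15k
print(value_at(employee["department"], 3))  # toy
```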
In attribute time stamping, I would put all these tables into one single non-first-normal-form table, and that would be my temporal database. By the way, in all these examples I have only included the valid time, or you can look at it as the transaction time; if I want to make a bitemporal database, I need to add two more columns for the transaction time, which I did not do. So this is a good example of a properly designed temporal database using tuple time stamping. Any questions so far? Go ahead, please. It's a huge conversation, and actually the slides at the end of my presentation may shed some light on it as well. Now, temporal integrity constraints. An integrity constraint may be valid at one state, which is a single-state integrity constraint, or it may apply over many states. A synchronous multi-state integrity constraint means the same integrity constraint applies at different time instants. For instance, if you say that the salary should be larger than $100,000, that applies today, yesterday, the day before, and so on, at each state of the database. On the other hand, an asynchronous multi-state integrity constraint applies to several states of the database concurrently. A good example: if you say that the salary of an employee never decreases, that means the current salary should be at least as large as the previous salary. That is an asynchronous multi-state integrity constraint. An important issue: an object we keep data about in the database has a life, and the lifespan of the object that is captured within the database is a subset of the object's life. So, as an integrity constraint, the lifespan of an object, say an employee, that is stored in the database should be a subset of that object's entire life. And that is represented in this table.
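The asynchronous multi-state constraint just mentioned, a salary that never decreases, compares different states of the database with one another; a minimal sketch over a hypothetical salary history:

```python
def salary_never_decreases(history):
    """Asynchronous multi-state integrity constraint: each salary must be
    at least as large as the one in the preceding state.

    history is a list of (salary, (valid_from, valid_to)) rows.
    """
    ordered = [s for s, _ in sorted(history, key=lambda row: row[1][0])]
    return all(prev <= curr for prev, curr in zip(ordered, ordered[1:]))

print(salary_never_decreases([(12, (1, 6)), (13, (6, 9)), (14, (9, 99))]))  # True
print(salary_never_decreases([(12, (1, 6)), (11, (6, 9))]))                 # False
```

A single-state constraint like "salary > 100,000", by contrast, would only look at one state at a time.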
When you look at referential integrity, of course, referential integrity has to be applied synchronously over multiple states. In fact, when you look at the department, which is the referencing column, the time of the department should be a subset of the time in the department table. And that is nothing but the inclusion constraint that Jeff Davis mentioned the other day. What are the requirements for temporal data models? When we look at a temporal database, we expect certain capabilities from it. Here I will list some requirements that I think are important for a temporal database. Actually, I listed them in an article that appeared in 1997, long ago, in IEEE Transactions on Knowledge and Data Engineering; I had a paper on the expressive power of temporal query languages, and these requirements are from that paper. The first one says: modeling and querying the data at any time instant. So if DB stands for our database, I should be able to model and query that database at time point 0, time point 1, time point 10, whatever it is. The second requirement says I should be able to model and query the temporal database over a set of time instants. These are also called temporal queries. So I have my database db at time point t, db at time point t+1, db at time point t+2, and so on, and I should be able to model as well as query the database at these time points. It makes sense, right? Naturally, if we have the first requirement, we have to generalize it into the second requirement. Now, the third one is modeling and querying the database at two different time instants. So let's consider... okay, see, I changed my slide in the upper two items but I forgot to change it here. This should be db, right? That's a typo.
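The temporal referential-integrity check just described, that the time a tuple references a department must lie within the time the department itself exists, can be sketched as an interval-coverage test (hypothetical data, half-open intervals):

```python
def covered(ref_interval, target_intervals):
    """Temporal referential integrity (inclusion constraint): is
    ref_interval contained in the union of target_intervals?"""
    lo, hi = ref_interval
    for tlo, thi in sorted(target_intervals):
        if tlo > lo:
            return False       # a gap before coverage continues
        if thi >= hi:
            return True        # the rest is fully covered
        lo = max(lo, thi)      # advance past this covered piece
    return False

# Employee references "toy" over (2, 8); "toy" exists over (1, 5) and (5, 10)
print(covered((2, 8), [(1, 5), (5, 10)]))  # True
print(covered((2, 8), [(1, 5)]))           # False: department gone after 5
```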
So when you consider the database at two different times, t and t', which should be different, I should be able to query the database across them. The example I gave about the integrity constraint, the salary never decreases, is actually comparing the database at two different times: the current time and the previous time, right? The times are different, and the current salary should be larger. Now, the next requirement is time travel. A temporal database should be able to support time travel. Time travel involves the current view of the data, right? And it also involves rolling the database back to any time in the past and viewing the database as it was valid as of that time instant. Go ahead, please. It's application dependent; you can go as far back as the Big Bang if you want. Truly, the model is very general and can support it. A temporal query language should return the same type of objects. Well, this may sound a little abstract and theoretical, but it is one reason the relational data model is so popular: when SQL is applied on a relational database, the result is another relation, another table. That's why the languages, relational algebra and relational calculus, are closed: they work on the same type of objects and they create the same type of objects. SQL violates that a little by not eliminating duplicates, but we can live with it, right? Another issue is whether the tuples are homogeneous or non-homogeneous. It arises both in tuple time stamping and attribute time stamping. This concept of homogeneity was introduced by Shashi Gadia, and you'll encounter it in a temporal database as you do your operations. The idea behind homogeneity is this: if you have several temporal values in a table in different columns, can those different columns have different lifespans, or do they have to be defined over the same time period, the same lifespan? If they are defined over the same lifespan, then the relation is homogeneous.
If not, then it is non-homogeneous, and that brings its own issues in querying a temporal database. Ten minutes; I'm almost done, actually, so we will have time for your questions as well. Another requirement is regrouping the data: we should be able to regroup the data according to different criteria. Let me give you a little example. Consider a table with a department column and a manager column, and assume also that we keep the history, so a department has all its managers, right? Now, can I restructure this table into a new one where for each manager I include the department history? That may be a desirable operation; some use cases may require it. So, would your temporal database allow you to do that, and how would you do it? That brings us to two issues. The first is unique representation of relations, which means keeping the temporal data succinctly, without any duplication. The second is weak relations. Consider an employee salary table: there is the employee's social security number, the salary, and then from and to columns, right? Let's say the employee's salary from 1 to 5 is 50k. Now, I can represent this as one single row, or I can represent it as several rows: the employee's salary is 50k from 1 to 3, and 50k from 3 to 5. Clearly the first representation is better; it's more compact. But the second representation may occur in the database, and that is considered a weak relation. The next one is applicability... I'm sorry, go ahead, please. Tuple time stamping is simpler to implement, yes; the other one requires more complexity, absolutely. And of course, in a temporal database, since we're dealing with time, set-theoretic operations and comparisons should be available. The well-known example of this is Allen's predicates: considering two intervals, whether they overlap, whether one is a subset of the other, whether one is before the other, and so on. They all involve set operations. And finally, supporting multi-valued attributes.
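The regrouping operation mentioned above, turning a department-to-manager history into a manager-to-department history, can be sketched like this (departments, managers, and intervals are hypothetical):

```python
# Historical rows: (department, manager, valid interval)
rows = [
    ("toy",   "Ann", (1, 5)),
    ("toy",   "Tom", (5, 9)),
    ("books", "Ann", (5, 9)),
]

def regroup_by_manager(rows):
    """Regroup a department -> manager history into manager -> department history."""
    out = {}
    for dept, mgr, interval in rows:
        out.setdefault(mgr, []).append((dept, interval))
    return out

print(regroup_by_manager(rows))
# {'Ann': [('toy', (1, 5)), ('books', (5, 9))], 'Tom': [('toy', (5, 9))]}
```

In a model with a unique, coalesced representation, the regrouped result would then be coalesced again so each manager's history stays succinct.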
All along, up to this point, I have talked about single-valued attributes, and we want a temporal database to support multi-valued attributes as well. Along with weak relations there is the issue of coalescing, which I don't really want to go into, but on the left-hand side you see a compact, unique representation, and on the right-hand side, the middle one has weak tuples. You can go from weak tuples to the unique representation, but then you face the question of whether, in the process, you manufacture information that is not available in the database. But I shouldn't bore you with this. Let me talk briefly about the SQL:2011 standard, which includes temporal support and was heavily influenced by IBM; IBM implemented the 2011 standard's features in its DB2 temporal support. What is proposed is a conceptual period type. Here is how it is used. A table, employee, is being created: the employee number, an integer, clearly; the employee start date and employee end date, two columns for the beginning and end of the interval; and the employee department, an integer. In this table we are representing the department of the employee. And then PERIOD FOR e_period (e_start, e_end): PERIOD FOR is actually a conceptual designation that shows that e_start and e_end form a period. That is simulating the range type of Postgres in this standard. IBM claims that they selected this alternative to support backward compatibility. To a certain extent it makes sense, but of course it doesn't really bring the full potential of the range type. And as I said earlier, there is system time, which is the transaction time, and there is application time, business time, or valid time, in the SQL:2011 standard. And here you see the temporal primary key and foreign key; they can be supported with these two types. And also Allen's predicates, CONTAINS, OVERLAPS, and so on, are available to work on the period data types.
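The temporal primary key just mentioned means that the same key value must not have overlapping periods; a small language-neutral sketch of that check (hypothetical rows, half-open intervals):

```python
def temporal_pk_ok(rows):
    """Temporal primary key check: for each key, no two rows may have
    overlapping (lo, hi) periods. rows are (key, lo, hi) triples."""
    by_key = {}
    for key, lo, hi in rows:
        by_key.setdefault(key, []).append((lo, hi))
    for intervals in by_key.values():
        intervals.sort()
        for (_, h1), (l2, _) in zip(intervals, intervals[1:]):
            if l2 < h1:          # the next period starts before this one ends
                return False
    return True

print(temporal_pk_ok([(7, 1, 5), (7, 5, 9), (8, 1, 9)]))  # True: no overlap
print(temporal_pk_ok([(7, 1, 6), (7, 5, 9)]))             # False: key 7 overlaps
```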
Now, what IBM does in DB2 is actually an implementation of the 2011 standard, and that's available in the temporal extension of DB2 on z/OS. There is support for system time: as of system time, system time from ... to ..., and system time between ... and ... Two minutes, and I think this is the last slide. Remember, system time corresponds to transaction time, and since there is system time and there is business time, that definitely supports time travel. What I don't like, actually, is that the separation between system time and business time is not clear cut; it's somewhat confused, in my mind, in IBM's temporal extension. And of course, IBM implements a temporal table as two tables, a current version and then a history version; in fact, Magnus Hagander implemented that to a certain extent in his TARDIS presentation yesterday. And finally, what did I do? Okay, that was out of sequence; this is what I wanted to say at the end of my presentation about Postgres. Postgres has the range type; that's very valuable. Temporal keys can be implemented, temporal referential integrity can be implemented, and of course time travel needs to be added to Postgres. That's the challenge I pose. I'd like to stop at this point for your questions. Do we have one minute, Magnus, or are we out of time? Out of time. All right, we can continue the conversation afterwards if you prefer, or you can ask a few quick questions if you want. Yes. What's the primary key? Well, each table will have its own primary key, and of course, as part of the primary key you have to include part of the bitemporal timestamps, and then you have to do the join. The join would be massive, and most probably you would define the join operation so that the business time, or valid time, must overlap; if they don't overlap, those two tuples probably wouldn't qualify for the join.
So in a typical join we don't have this problem, but when we move into the temporal dimension, since we have time, time plays into the join, and you can define the join over the intersection of the times, the union of the times, or the difference of the times, depending upon the meaning of the query. It does make sense. So you would signify your business validity rules by using range types, with some sort of attribute on the column marking it as temporal, which is a huge project, but it would probably be the purest way to build this. That's true. That's true. Well, thank you very much.