There is no better place, I think, to start than with the question of what the interplay between logic and databases is really about. And perhaps the best way I can convey it is to quote my colleague Victor Vianu, who, in a 1997 paper, made the point that, among the areas of computer science, databases are a prime example of this interplay: database theory provides a concrete setting in which logic is put to work, and logic provides a framework for the study of databases. So what I want to do in these two lectures is to give substance to this claim, to explain why this interaction has been so fruitful. And there will be nothing really new here for the experts in the audience, at least not for the larger part, not until we run out of the standard material. For the past 40 years, there has been a very extensive and very productive interaction between logic and databases. Logic provides a unifying framework, as well as tools and methods, for attacking the problems that arise in databases. Now, to me, the interaction between logic and databases has always had two facets. It is a prime example of applications of logic to computer science. But, I believe, it is more than that. It is also an example, a very good example, of logic driven by the needs of computer science. What I want to cover first is by now really textbook material, but textbook material that simply did not exist before this interaction got under way. I want to talk about relational algebra and relational calculus, the textbook material that these 40 years have produced. From there, I want to move on to a very special fragment of first order logic called conjunctive queries and study their properties and the connections with homomorphisms. I understand it will be also a theme in Ben Rosman's presentation tomorrow. And from there, we will talk about the limitations of first order logic.
We will see how these limitations are overcome in the context of database theory by way of the language called Datalog, which gives us a fragment of fixed-point logic. Time permitting tomorrow, I hope we'll get into some more recent applications of this interaction of logic and databases, namely the foundations and applications of schema mappings. What do these topics have in common? The glue that holds them together, the unifying theme, is the interplay between databases, logic and computational complexity. How did it all get started? We owe it to this man, Edgar Codd, or Ted Codd as he was known to his colleagues and friends. And in some sense the history of databases is a history of scientific and technological revolution. What Codd did was to start the scientific revolution 40 years ago at the IBM San José Lab, which is now called the IBM Almaden Research Center. He did two things. One was to introduce the relational data model and, at the same time, to introduce two languages for asking queries against databases: relational calculus and relational algebra. That was the scientific revolution. Very quickly, within the next decade, we had the technological revolution, with the development of System R at IBM, Ingres at Berkeley, and very soon the Oracle Corporation getting into the picture with a product, and IBM following with DB2. And the rest is history. Today, relational database technology is a 17-18 billion dollar a year industry. So let me briefly remind you of what Ted Codd did. He formalized the relational data model by saying that relations, namely subsets of Cartesian products, are a good formal object for representing data. And the idea he had in mind, of course, was that we can think of a table as a way of storing records, and a table, formally, is really a subset of a Cartesian product of sets, that is to say, a relation in the sense that we have been seeing here today.
And then he introduced the notions of relational schema and relational database schema, and this is nothing else but what was called a vocabulary in the first talks this morning. In other words, we have relation symbols. They have specified arities, but the only difference here is that we give names to the various positions of a relation symbol, and we call them attributes. So we can think of a relation schema as a set of attributes, or a relation symbol with a fixed arity. And therefore this is a template, a blueprint, that represents relations of that particular arity, but also with names for the attributes. And then an instance of a relation schema is simply a relation that conforms to the schema. In an actual database management system you also have to have some matching data types, but I will suppress this for now. And then a relational database schema is a collection of such relation schemas, and a database, or a database instance, is simply a collection of relations that conform with the schemas that we have. So you may ask now, what is the difference between what we've been seeing all morning and databases? I think I can summarize the difference in this slide. A relational structure, as we saw before, is an object that has a universe, which we have made explicit, and a bunch of relations. A database is basically a relational structure in which the universe has not been made explicit. We only have the relations. This is an important difference which, as we will very soon see, is going to cause us some problems. But Codd had the idea that these are dynamic objects. New elements may come into the picture and populate the relations, so the universe may change. So he only made explicit the relations, not the universe. That's the only difference, in some sense, between relational structures and databases as we saw them today. So, as I said a few slides ago, Codd introduced two languages for asking queries against databases.
The first is a procedural language, and procedural here means that we specify a sequence of operations by whose execution we get the answer to the question we ask against the database. The other one was declarative. In other words, we use some high-level language, in this case first order logic, to specify what we want to retrieve, as opposed to how to retrieve it. And Codd proved a theorem that, in some sense, relational algebra and relational calculus have the same expressive power. This is not exactly true, and I want to explain in what sense it's not exactly true and in what sense it is true. So that's what I'd like to formalize. It's really textbook material; we just need to make it precise. Let me remind you of what Codd did by way of relational algebra. Basically, relational algebra is the set of expressions that you obtain by starting with a bunch of relations in your schema, your vocabulary, whatever you want to call it, and closing them under these five operations. The first three operations are perfectly general, from discrete mathematics: union, difference (set-theoretic difference; he just insisted that these be relations of the same arity), and Cartesian product. And then he had two special operations that are special because they are meaningful for relations. The first was projection and the second was selection. So what is projection? Intuitively, projection is the operation by which you suppress or hide some of the columns of your table. So for instance, if we have a table with banking information about accounts, and we want to suppress the information about the account number and the balance, then, if we only keep the customer name and the branch name, that's what we get. Formally speaking, the syntax of the projection is this: pi for projection, with subscripts i1, ..., im, where i1, ..., im are distinct integers from 1 up to k.
And the semantics is that it gives you back the set of all m-tuples such that there is a completion of the m-tuple to a k-tuple coming from your original relation. So this not only suppresses some columns but also allows for a rearrangement of the columns, changing their order. So this is an operation on the columns of the table. Codd also had the idea, and rightly so, that we need another operation that filters some rows, throws out some rows. And that's the selection operator. So the selection, again, is a family of operations, one for every condition. The condition is a Boolean test to which we subject every row. If it passes the test, we keep it in the result; otherwise we throw it out. And then the question is, what are we allowed to put in the condition? Well, in a language like SQL, the conditions are very elaborate, but in the case of Codd, the conditions were very simple. He allowed equality: equal, not equal; and, if you have a total order on the domain of the values of some attributes, then you also allow arithmetic comparisons, greater than, less than or equal to; and then you take the Boolean closure of these expressions. So you can talk about people whose balance in the checking account is more than 10,000, or who live in this locality and whose balance is less than 9,000, and so on and so forth. By the way, please feel free to interrupt me as we go along. You've been a very attentive but also very quiet audience. So now here is the formal syntax: a relational algebra expression is a string obtained from the basic relations by applying these operations. So that's the first language that Codd gave. Notice that each of these operations is very simple, but the strength of these operations comes when you combine them together. And then Codd went on, in his second paper, to give some non-trivial examples of new operations that you can derive from these basic operations, and perhaps the most basic and important one is the natural join. Is everyone familiar here with natural join? Yes?
Who is not? Alright, so let's quickly explain what a natural join is. Here is a motivating example. Let's say we have, in a university, the registrar's database, which has information about faculty who teach a course in a particular term, and information about enrollment: student, course and term, right? Because that's what happens. The department announces the teaching schedule, the students enroll in courses, right? And then, of course, at the beginning of the term we get a list that has the names of the students enrolled in the course. So how is this done? Well, we want to obtain the relation TaughtBy, with student, course, term and faculty name, and it turns out that this is the natural join, which is denoted by this bowtie symbol between the two relations. So formally, the definition of the natural join is the following. Suppose you have two relation schemas, R and S, and suppose that they have some attributes in common. Remember, the positions are named, right? So suppose they share some names, as the two relations we just saw share names. Then the natural join is a projection of a selection of a Cartesian product. Okay, so here is how it's carried out. You start with the Cartesian product of the two relations, and then you keep the tuples in this Cartesian product that have the property that, for every common attribute, the value in the first relation is the same as the value in the second relation. You get a subset of the Cartesian product in which the common attributes appear twice; from each such duplicate pair of columns you keep one of the two, and that's the natural join. So indeed, if you do this in the previous example, you get TaughtBy from Teaches and Enrolls. And there is, of course, a very naive algorithm that basically creates the Cartesian product and checks, for every pair of tuples, whether or not they match. Notice that in the case where the two relations have no attributes in common, the natural join becomes the Cartesian product, right?
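As a concrete sketch of the operations introduced so far, a relation can be modeled as a set of tuples, with a separate list of attribute names; the natural join then really is a projection of a selection of a Cartesian product. All identifiers here are our own, not from the lecture.

```python
# A relation is a set of tuples; attribute names are kept in a list.
def project(rel, indices):
    """pi_{i1,...,im}: keep, and possibly reorder, columns (0-based here)."""
    return {tuple(t[i] for i in indices) for t in rel}

def select(rel, condition):
    """sigma_theta: keep exactly the rows that pass the Boolean test."""
    return {t for t in rel if condition(t)}

def product(r, s):
    """Cartesian product of two relations."""
    return {t + u for t in r for u in s}

def natural_join(attrs_r, r, attrs_s, s):
    """Projection of a selection of a Cartesian product, as in the text."""
    common = [a for a in attrs_r if a in attrs_s]
    pr = [attrs_r.index(a) for a in common]          # positions in r
    ps = [attrs_s.index(a) for a in common]          # positions in s
    keep = [i for i, a in enumerate(attrs_s) if a not in attrs_r]
    # Selection: shared attributes must carry the same value.
    match = lambda t: all(t[i] == t[len(attrs_r) + j] for i, j in zip(pr, ps))
    joined = select(product(r, s), match)
    # Projection: drop the duplicated copies of the shared columns.
    return project(joined, list(range(len(attrs_r))) +
                           [len(attrs_r) + i for i in keep])

teaches = {("Vianu", "CSE200", "Fall")}
enrolls = {("Ann", "CSE200", "Fall"), ("Bob", "CSE105", "Winter")}
taught_by = natural_join(["student", "course", "term"], enrolls,
                         ["faculty", "course", "term"], teaches)
# taught_by == {("Ann", "CSE200", "Fall", "Vianu")}
```

Note that when the two relations share no attributes, the selection condition is vacuously true and the result degenerates to the Cartesian product, exactly as observed above.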
So in principle, it's as expensive to compute as the Cartesian product. Here is a second example. This also goes back to Codd, and it is a more complicated operation called the quotient, or the division. So what is the quotient or the division? You have two relations, R and S, but now you're going to assume that the arity r of R is bigger than the arity s of S. And then the quotient is a relation whose arity is the difference of the arities, r minus s. And it consists of all the tuples of length r minus s such that, no matter what tuple you take from S, when you append the two tuples, you end up in R. This sounds very strange, but it does something very useful. Let's look at an example to appreciate it. Victor Vianu happens to be a great instructor at UC San Diego, and you want to find the students who have taken every course that Victor Vianu has taught. How will you compute this if you only had relational algebra? Well, it's very simple. From Teaches, with faculty name, course and term, we can obtain the courses taught by Victor Vianu, right? That's easy. That's a projection of a selection of Teaches, where in the selection condition we use faculty name equal to Victor Vianu. Now we have a table with only one column that gives the courses of Victor Vianu. Now we want the students who have taken every course that Victor Vianu has taught, so if you follow the definition, this is nothing else but the quotient of Enrolls divided by the courses taught by Victor Vianu. Now, when you look at the definition of the quotient, it is not obvious right away that this is expressible in relational algebra. Yet it is, and that's a non-trivial exercise for undergraduate students. Let me illustrate how this is done by doing it concretely for a relation R of arity 5 and a relation S of arity 2. Therefore the quotient is going to have arity 3, and the idea here is that we have got to use the difference operator.
Remember before, in the example of the natural join, we saw we used projection, we used selection, we used the Cartesian product; the union, you can imagine situations where we can use the union; but here is a way to use the difference, and it goes like this. The quotient basically is a subset of the projection pi_{1,2,3}(R), right? Because it consists of all the triples such that, no matter what pair you append from S, you end up in R, right? So what we have to do, intuitively speaking, is take the projection pi_{1,2,3}(R) and throw away the tuples in the projection that don't make it into the quotient. In other words, the projection gives our candidates, and we want to throw away all the ones that don't make it, all the failed candidates, so to speak. Now let's consider this relational algebra expression. This is the Cartesian product of pi_{1,2,3}(R) with S, take away R. These are really the combinations that don't make it into R, and projecting them on the first three columns gives exactly the failed candidates. So, to get the quotient, what we need to do is take the difference again. So here we have a nested use of the difference, and this way we get the quotient as an expression in relational algebra. So, as I said, this goes back to Codd. He illustrated that you can do interesting things with these operations. Now there is a sort of basic language design question. Codd came up with these five operations, and I showed you how you can express interesting other operations. Do you need all of these operations or not? In other words, was there any redundancy in his language? Codd was a very precise man (I never had the honor to meet him, but that's what I hear from the people that knew him), and he was very careful: you actually can prove that none of these five operators can be expressed in terms of the other four. So this is the theorem. Each of the five relational algebra operations is independent of the other four. You cannot find an algebra expression that involves the other four and gives you the fifth one.
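The double-difference construction for the quotient described above can be written out concretely. This is a sketch with our own identifiers, for R of arity 5 and S of arity 2, using only projection, Cartesian product, and difference.

```python
def project(rel, indices):
    """Keep the listed columns (0-based) of every tuple."""
    return {tuple(t[i] for i in indices) for t in rel}

def product(r, s):
    """Cartesian product of two relations."""
    return {t + u for t in r for u in s}

def quotient(r, s, k):
    """All k-tuples t such that t + u is in r for every u in s."""
    candidates = project(r, range(k))                 # pi_{1..k}(R)
    # Candidate/S combinations that are NOT in R, projected back down:
    failed = project(product(candidates, s) - r, range(k))
    return candidates - failed                        # the nested difference

r = {(1, 2, 3, "a", "x"), (1, 2, 3, "b", "y"), (4, 5, 6, "a", "x")}
s = {("a", "x"), ("b", "y")}
# Only (1, 2, 3) survives: (4, 5, 6) is missing the ("b", "y") extension,
# so it shows up among the failed candidates and is thrown away.
```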
How do you prove something like this? Well, this is like a lower bound on expressive power. The idea here is to find, for each operation, a property that the operation has, but that no expression built from the other four has. So let me ask, how would you do it for the Cartesian product? It's very easy. What does the Cartesian product do to the arities? It increases the arities, right? The other four operations, whether it's projection or union or difference or selection: well, projection lowers the arity, union keeps it the same, difference keeps it the same, selection keeps it the same, right? So you find one property that the four have and the fifth doesn't have. I told you what to do with the Cartesian product, it increases the arity, and the projection decreases the arity. What will you do with the difference? Well, it goes back to some of the discussions we had in the morning here. The other four operations are monotone. If you put more into the arguments, you don't lose any tuples. The difference, in general, has the property that if you put more into the second argument of the difference, R minus S, you may decrease the outcome, right? So this monotonicity property is what tells them apart. It's trickier to do it for the union. It's an interesting exercise. So the bottom line is that Codd chose his five operations very, very carefully, right? There was no redundancy. There is no fat. It's a very lean language. Okay, that's all I want to say about the algebra. This had direct effect. [Audience question.] What's that? Yes, yes, but it's not as easy to see. That's a non-trivial exercise. No, no, it's just combinatorial. You find the right combinatorial property. I'm sure there's a proof via logic, but it's just a straight combinatorial argument. It is true, because the theorem is true, as I said before; it is one of the five basic operations, right? Okay. And you can see here the direct influence on the design of SQL and the semantics of SQL. SQL's main construct is SELECT ... FROM ... WHERE.
And, unfortunately, SELECT corresponds to projection, WHERE corresponds to selection, and FROM to the Cartesian product, right? So, to express the projection of the selection of the Cartesian product in SQL, you would write: SELECT R1.A1, ..., Rm.Am FROM these relations, from this Cartesian product, WHERE this condition is satisfied. So, direct influence of relational algebra on the design of SQL. Now, in addition, Codd introduced relational calculus, and relational calculus is a declarative language which is entirely based on first-order logic. There are two versions of the calculus. Actually, Codd introduced the tuple calculus, in which the first-class citizens, the variables, range over tuples of a fixed arity k. This is like making arrays first-class citizens, as opposed to the elements of arrays. Here, in logic, we use more the domain calculus, where the variables range over the elements of the tuples. We will focus on the domain calculus, and then we will discuss the relationship between the two. So I want to discuss a little bit now Codd's technical result, which was that calculus and algebra have the same expressive power. He called it the relational calculus; that's the term that he used, but it really stands for first-order logic, and indeed this is the syntax. We saw the syntax of first-order predicate logic early in the morning, and I'm not going to repeat it again. Of course, I'll skip the semantics. So Codd's idea was that now you can write expressions, formulas of first-order logic with k free variables, run them on a database, and get back a set of k-tuples, which consists of all the k-tuples over your database that satisfy the formula. So that goes back to the semantics of first-order logic that we saw in the morning.
So as an example, if you have, let's say, an edge relation in a graph (say you have connections, edges standing for non-stop flights), this gives you the pairs of nodes that are connected by a path of length 2, or, say, the destinations that you can reach with one stopover, in the case of an airline. And as an illustration of this, we saw how hard we had to work to get the quotient, right? That was a nested application of the difference, together with projection, selection, Cartesian products. We did all this work really for something very simple, because the quotient is easily expressible using the universal quantifier, right? This is the direct translation of the definition of the quotient into first-order logic. In this case, the set of all pairs such that, for every triple, if (x3, x4, x5) belongs to S, then the quintuple belongs to R, right? So we get the translation immediately. Much simpler than the relational algebra expression. This makes the case as to why it is superior to have a nice declarative language, as opposed to having a procedural language. So Codd's theorem informally says that algebra and calculus have the same expressive power, meaning whatever query you can express in one, you can express in the other. As I said, this is not entirely accurate, and I want to explain the sense in which it's not accurate, and then formulate a rigorous, correct version of this result and sketch the proof. Going from algebra to calculus is very straightforward. Okay, this is a translation, going from algebra to calculus. So, for every relational algebra expression, there is an equivalent relational calculus expression. The natural way to do this is induction on the context-free grammar that gives you the relational algebra. The first three parts are very straightforward. Of course, the union corresponds to the disjunction. You can hardly see it, but there's a disjunction here.
The difference is phi1 and not phi2. The Cartesian product becomes the conjunction, with disjoint sets of variables. What happens to the projection? The projection is an existential quantification. That's what you would have expected. The selection: well, basically you take your condition and translate it appropriately into first order logic, and then you take the conjunction of the condition theta together with phi, where instead of theta you use a translation of theta into a formula of the calculus. This is very straightforward. It just brings out the flavor of projection as existential quantification. That's really all there is to it. And the selection is a filter, where you add another condition to the formula. What about the converse? Well, it is simply not true, the way we have set up our definitions so far. It is simply not true that every relational calculus expression has an equivalent relational algebra expression. Let's see why. Take this very simple expression, the negation of an atomic formula. We have a problem here, and the problem is that we haven't made our universe explicit. Here we would like to take the complement, the difference with the universe, but we don't have the universe around. We have not made it explicit. So this is really the problem. Remember, in the very beginning I said there is a little difference between databases and relational structures, and the difference is not making the universe explicit. Now we pay a price for it. The price is that we lose this translation. But this blatant use of negation is not the only way things go wrong. There are other ways. Say we have departments in a university. Departments have chairs, and the administration keeps track of the departments and the names of the chairs. We write something like this: the pairs (x, y) such that there is a z such that z is the chair of department x and y is different from z. But what is y here? We have a problem. And there are other things like that.
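The failure can be made concrete with a small sketch of our own: the answer to the calculus expression "not R(x)" changes with the domain chosen for the variable.

```python
def eval_not_r(domain, r):
    """All x in the given domain with (x,) not in the unary relation r."""
    return {x for x in domain if (x,) not in r}

r = {(1,)}
small = eval_not_r({1, 2}, r)      # over {1, 2}   the answer is {2}
large = eval_not_r({1, 2, 3}, r)   # over {1, 2, 3} it is {2, 3}: different
```

Same formula, same database, two different answers; this is exactly the sense in which the expression is not domain independent.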
Or we look for the students who enroll in every course in every term. You can also see (I mean, this requires proof, but it's not hard to show) that there is no equivalent relational algebra expression for this one either. So it is not just negation. There are other reasons that make this translation fail. So let's take a closer look at this. As I hinted, the previous three relational calculus expressions are not translatable to algebra because they give different answers depending on the domain that we choose to interpret our variables over. Again, the price we pay for not making the universe explicit. So let's look at the simplest of these three examples. If our variables range over a domain D, then of course we can say that the semantics of this expression, not R, is D^k minus R, right? But as we change D, we're going to get different values. Intuitively, this means that the relational calculus expression not R is not domain independent. It depends on the domain over which we interpret the variables. So now we want to formalize this notion of domain independence. On the other hand, something like R minus S, the difference that Codd used in relational algebra, was domain independent. Because when you try to assign meaning to this expression, you already know that any tuple in the answer must be a tuple in R, right? So even if you consider a bigger domain, it doesn't make a difference. You still end up taking the difference between R and S. So we want to capture this distinction with a precise definition. And this brings in an important notion in databases called the active domain. The active domain comes in two parts, the active domain of a formula and the active domain of a database. The active domain of a formula is simply the set of all constants that appear in the formula. So this is a very simple thing.
You look at the formula; if it mentions some constants, you put them in the active domain. The active domain of a relational database is really the important thing: it is the set of all values that occur in the relations of the database. So you look at the database, think of it as a set of tables, you look at the individual values, and that's the active domain. Now, suppose we have a formula of the calculus and we want to give it semantics. Remember, we have a database, not a relational structure, so we have no universe around. We want to give rigorous semantics to this, and in some sense the problem is what to take as the domain. So suppose I have a domain, a universe, which is big enough to make the evaluation meaningful, and this means it contains the active domain of the formula and the active domain of the database. Then phi_D(I) is the result of evaluating the formula over this domain D and I. That is, all the variables and quantifiers range over D. It's not meaningful to go below the active domain, because then you would fail to take into account values that are in the database. And of course the relation symbols are interpreted by the relations in I. If we take D to be as small as is meaningfully possible, namely if D happens to be the union of the active domain of phi with the active domain of the database, then we write the result of evaluating phi on D and I as the evaluation of phi on the active domain. And now we can say that a relational calculus formula is domain independent if, no matter what domain you evaluate the formula over, as long as it's big enough to be meaningful, what you get is the same as evaluating the formula on the active domain. In other words, all that matters is the evaluation on the active domain. The formula is very stable. Let's look at some examples. Not R is not domain independent. We saw that before. We change D, we get different answers. We get D^k minus R. Something like the next example, on the other hand, is domain independent.
There exists y, R(x, y). That's easy to see to be domain independent. On the other hand, going back to: for all y, R(x, y). That is not domain independent. Think about it, because you may have a very simple database in which all you have is R(1,1). All you have is R(1,1). In this case, my I consists of just this one fact, and the active domain is simply {1}. So, if I evaluate this expression, if I look at all the x's such that for all y, R(x, y), then of course I get only the value 1, right? But now suppose I change my domain. Suppose I take my domain to be {1, 2}. Let's say this formula is phi. So before, I was evaluating phi on the active domain. Now, if I evaluate phi on this bigger domain, what do I get? I get the empty set, right? I get the empty set. So I've changed the domain and I get a different value, because now it is insisted that also for y equal to 2 I must have R(1, 2). But I haven't changed my database; I've only changed my domain. So this is not domain independent. Now, with this notion, we can state Codd's theorem precisely. Codd's theorem says that, if you have a query, the following are equivalent. One, there is a relational algebra expression that gives you the value of the query on every database. Two, you can find a domain independent relational calculus formula, a nice domain independent relational calculus formula, such that q(I) is the evaluation of phi on the active domain of I. Remember, the active domain is the set of all values occurring in your database. Three, there is a relational calculus formula (you don't know, it may or may not be domain independent), but you evaluate the query on the active domain only. The difference between 2 and 3 is that in 2 a bigger domain still gives the same value, because the formula is domain independent. In 3 you have an arbitrary formula, you don't know if it's domain independent, but you play it safe by restricting your evaluation to the active domain. So let me sketch the proof of this theorem.
We have to show 1 implies 2, 2 implies 3, and 3 implies 1. It is obvious that 2 implies 3: you use the same formula. That leaves 1 implies 2 and 3 implies 1. 1 implies 2 we already proved, in some sense; we have to go back to the previous translation of algebra to calculus and argue at every step that what you get is domain independent. So you have to do it by induction. 2 implies 3, I argued, is trivial, so 3 implies 1 is what remains, and it has a very important step. The key to it is to realize that the active domain of I is expressible in relational algebra. That's the first bullet. So, for every relational database schema there is a relational algebra expression such that, for every database, the active domain is the result of evaluating the expression on the database. What was the active domain of the database? The set of all values. So let's say that we had a relation R with three attributes, R(A, B, C). What would be the active domain? What's the expression for the active domain? Well, it's nothing but a union of projections. We can take it as pi_A(R) union pi_B(R) union pi_C(R), right? This gives us all the values that are in the database. Very simple, but very important. Now we use the above fact and induction on the construction of the formula to obtain a translation of the calculus, under the active domain interpretation, into algebra. And that is now straightforward; the only interesting part is the universal quantification, because, remember, the algebra has an existential flavor: projection is existential quantification, and there is no universal construct, right? So, naturally, we use the equivalence that for all y, psi is the same as: not, there exists y, not psi. So in the translation we will encounter negation, and, as we saw, a formula of the form not psi is not domain independent. What you do instead is take the complement relative to the active domain. For a binary relation R, for example, the expression for the active domain is the union pi_1(R) union pi_2(R), and this expression is the cornerstone.
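The key facts in this step can be sketched concretely (identifiers our own): the active domain of a database is a union of single-column projections over its relations, and negation, under the active-domain semantics, becomes a difference against a Cartesian power of the active domain.

```python
from itertools import product

def active_domain(db):
    """All values occurring anywhere in the database's relations."""
    return {v for rel in db.values() for t in rel for v in t}

def complement(rel, arity, adom):
    """not R under the active-domain semantics: adom^arity minus R."""
    return set(product(adom, repeat=arity)) - rel

db = {"R": {(1, 2), (2, 3)}}
adom = active_domain(db)              # {1, 2, 3}
neg_r = complement(db["R"], 2, adom)  # the other 7 pairs over {1, 2, 3}
```

This is exactly why the translation of universal quantifiers goes through: the complement is taken not against an unavailable universe, but against the algebra-expressible active domain.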
With this, not R becomes the difference of the appropriate Cartesian power of the active domain with R. That is the reason why not R is expressible once we have the expression for the active domain. The point is that, under the active domain semantics, negation can be carried out, and that is how the translation from calculus to algebra goes through. So the conclusion has to be stated with care: it is not that algebra and calculus have the same expressive power outright. They don't. It is that, under the active domain semantics, they have the same expressive power. Okay, are there any questions about this? I'm going fast because this is really basic material, and it comes out very clean at the end. However, there are some interesting questions around it. First of all, an observation: the equivalence is effective, so we can go from algebra to calculus and from calculus to algebra, and therefore, later on, whatever results we establish for the calculus translate to the algebra, and vice versa. Let me also say a word about an issue this raises, namely decidability. Ideally, we would like a compiler to check whether a formula of the calculus is domain independent, and admit it or reject it accordingly. Not to be told the semantics, but to be able to decide, from the syntax, whether the formula is domain independent or not. This is a question that was answered shortly after Ted Codd's work.
Roberto Di Paola was a logician who proved the relevant theorem: using Trakhtenbrot's theorem, which we saw in the morning, he showed that domain independence is an undecidable problem. There is no algorithm, that is, to tell whether a given relational calculus formula is domain independent, that is, safe. This is not surprising after what we have seen in the morning: all non-trivial semantic properties concerning first order logic, in some sense, turn out to be undecidable. However, there is a next best thing you can do in a situation like this, and the next best thing is what is known as an effective syntax. An effective syntax means the following piece of good news: you can define a decidable syntax, say by a context-free grammar, for a fragment of first-order logic with the following property: every formula in this syntax is domain independent and, vice versa, every domain-independent first-order formula is logically equivalent to one in your syntax. This is the best you can hope for in view of Di Paola's undecidability result: you cannot have a decidable syntax that contains exactly all the domain-independent formulas, so what you can have is a syntax that captures every domain-independent formula up to logical equivalence. There has been a lot of work on this, with successive efforts to make the fragment larger and larger; the idea is that you want an effective syntax covering as large a class of domain-independent formulas as possible. And something like the top of the line here is this paper by Rodney Topor and Allen Van Gelder. The original paper was in PODS 87; this is the 1992 journal version, where they describe such a detailed syntax. Anyway. So that's an aside. Now, this was the first part.
I wanted, as I said, to talk about what Ted Codd did. Now I want to look at three basic problems about database query languages. We have seen what a query is. A query is basically a function that takes as input a database and gives you back a k-ary relation, and it is invariant under isomorphisms. And of course all the queries defined in logical languages are queries in this sense. And for us a Boolean query is going to be a function defined on database instances that takes values 0 and 1 and is also invariant under isomorphisms. So examples of Boolean queries are: given a graph, is its diameter at most 3? Given a graph, is it connected? We want to look at three basic problems about queries: the query evaluation problem, the query equivalence problem, and the query containment problem. Query evaluation is the most basic problem in databases: you are given a query in some language and a database, and you want to find the value of the query. We saw this problem in the morning under the name of the model checking problem; it is the model checking problem for whatever language you have in mind. The query equivalence problem in this case is simply a version of logical equivalence: you are given two queries and you want to know whether on every database they give you the same answer. And of course this is very important in an actual database management system, because the user writes a query and then the optimizer takes the query and transforms it into a query that is presumably easier to evaluate; in the process you want to make sure that you work with a sequence of queries that are logically equivalent. The query containment problem is the question: given two queries, is it the case that on every database the relation you get by evaluating the first query on the database is contained in the relation you get from the second? If the queries happen to be Boolean queries, this is logical implication.
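The two example Boolean queries can be written out concretely. Here is a small sketch of my own (not code from the lecture), with graphs given as a set of nodes and a set of undirected edges; both queries use only the abstract graph structure, which is why they are invariant under isomorphisms:

```python
from collections import deque

def bfs_distances(nodes, edges, src):
    """Distances from src in the undirected graph (nodes, edges)."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def has_diameter_at_most_3(nodes, edges):
    """Boolean query: is every node within distance 3 of every other?"""
    for v in nodes:
        dist = bfs_distances(nodes, edges, v)
        if len(dist) < len(nodes) or max(dist.values()) > 3:
            return False
    return True

def is_connected(nodes, edges):
    """Boolean query: is the graph connected?"""
    if not nodes:
        return True
    start = next(iter(nodes))
    return len(bfs_distances(nodes, edges, start)) == len(nodes)
```

For example, a path on 4 nodes has diameter exactly 3 and is connected, while a path on 5 nodes already fails the first query.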
So we want to understand the algorithmic status of these problems for relational algebra and relational calculus. I already argued that the query evaluation problem is the main problem in query processing. Equivalence and containment are closely related, in the sense that two queries are equivalent if and only if each is contained in the other; that's obvious. And also, if our language is closed under conjunction, we have that containment is reducible to equivalence. Now, we have essentially seen the proof of the following in the morning, and I am grateful to the people who gave the nice introductory lectures: the query equivalence problem for relational calculus is undecidable. That is a very easy translation from finite validity. In the morning we saw it as finite satisfiability, but of course the fact that there is no algorithm to tell whether a first order sentence is finitely satisfiable means there is also no algorithm to tell whether a first order sentence is true on all finite structures; that is finite validity. You can very easily reduce finite validity to query equivalence by taking a sentence that is finitely valid and then asking whether or not your formula is logically equivalent to that finitely valid sentence. So finite validity is reducible to query equivalence, and therefore we have undecidability. We get for free out of this that query containment is also undecidable, because query equivalence is reducible to query containment. And notice that here we have a chain of reductions: the halting problem reduces to finite validity (that is Trakhtenbrot's theorem), finite validity to query equivalence, and query equivalence to query containment. So, bad news, right? For relational calculus and algebra we have that, one, query equivalence and, two, query containment are undecidable.
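The interreduction of containment and equivalence is easy to make concrete. Below is a toy sketch of my own (names and helpers are hypothetical, not from the talk): queries are Python functions from a database to a set of answer tuples, and, purely as an illustration, containment and equivalence are checked over an explicitly given finite family of databases, whereas the real problems quantify over all databases:

```python
# Queries are modeled as functions: database -> set of answer tuples.

def contained_on(q1, q2, databases):
    """Does q1(D) ⊆ q2(D) hold for every database in the family?"""
    return all(q1(db) <= q2(db) for db in databases)

def equivalent_on(q1, q2, databases):
    """Does q1(D) == q2(D) hold for every database in the family?"""
    return all(q1(db) == q2(db) for db in databases)

def conjoin(q1, q2):
    """Conjunction of two queries: intersect their answers.  This is the
    operation used in the reduction q1 ⊆ q2  iff  (q1 ∧ q2) ≡ q1."""
    return lambda db: q1(db) & q2(db)

# q1: the pairs in E; q2: all pairs (a, b) where a is a source and b a
# target of some edge.  q1 is contained in q2 on every database.
q1 = lambda db: db["E"]
q2 = lambda db: {(a, b) for a, _ in db["E"] for _, b in db["E"]}
```

On any family of databases, `contained_on(q1, q2, dbs)` holds exactly when `equivalent_on(conjoin(q1, q2), q1, dbs)` holds, which is the reduction of containment to equivalence mentioned above.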
Now you can ask: what about query evaluation for calculus and algebra? The two problems are the same for algebra and calculus, because of the polynomial-time translation between the two, and we also saw in the morning that both problems are PSPACE-complete, because calculus is first-order logic. I had a different sketch of the proof here; in the morning, in Ram's presentation, we saw a very nice proof of membership in PSPACE by showing the problem is in alternating polynomial time. So let me skip the hardness part, which is via quantified Boolean formulas. But you can also see directly that the problem is in polynomial space: you bring the calculus expression into prenex normal form, and then you have these quantifiers; say you have m quantifiers. You create m blocks in memory, and in every block you keep the representation of an element of the active domain in binary, so you need a logarithmic number of bits per block to maintain this; then you cycle through all possible values, and you also keep a counter in binary to make sure that you have exhausted all the tuples and do not keep cycling. So this gives you a different way to show that query evaluation for calculus is in polynomial space. But the same argument also tells you what happens when you fix the formula: the number of quantifiers becomes a constant, so you have a constant number of blocks of memory, you again keep the values in binary, and the whole thing runs in logarithmic space. So this is a direct way to see that for a fixed formula the query evaluation problem is in logarithmic space, and therefore in polynomial time. And in some sense this explains the apparent paradox that, despite this high complexity, database systems do give answers to queries; we are not afraid of a PSPACE-completeness result because, typically, the query is fixed and only the database changes, so in that sense we are at least in logarithmic space.
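The cycling argument can be sketched directly. This is my own minimal illustration (not code from the talk): a prenex sentence is given as a tuple of quantifiers plus a matrix function, and each quantified variable is cycled through the active domain, so beyond the input the evaluator only ever holds one domain value per quantifier, mirroring the m memory blocks above:

```python
def eval_prenex(quantifiers, matrix, adom, assignment=()):
    """Evaluate a prenex sentence over the active domain by cycling each
    quantified variable through adom.  quantifiers is a tuple of
    "forall" / "exists"; matrix maps a full assignment (a tuple of values,
    one per quantifier, in order) to True or False."""
    if not quantifiers:
        return matrix(assignment)
    head, rest = quantifiers[0], quantifiers[1:]
    outcomes = (eval_prenex(rest, matrix, adom, assignment + (v,))
                for v in adom)
    return any(outcomes) if head == "exists" else all(outcomes)

# Sentence: forall x exists y E(x, y), over E = {(1,2), (2,3), (3,1)}.
E = {(1, 2), (2, 3), (3, 1)}
adom = {1, 2, 3}
holds = eval_prenex(("forall", "exists"),
                    lambda a: (a[0], a[1]) in E,
                    adom)
```

On this database every node has an outgoing edge, so the sentence holds; fixing the formula fixes the recursion depth, which is the direct-evaluation view of the logarithmic-space bound for data complexity.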
But in turn this consideration led Vardi to write the influential paper "The Complexity of Relational Query Languages", where he introduced three notions that will be very important to us as we go forward: combined complexity, data complexity, and expression complexity. Suppose you have a query language; call it L. The combined complexity is the model checking problem where both the formula and the database are part of the input. The data complexity is not one problem but a family of problems, one for every sentence in the language; the question is: given a database instance, does it satisfy the formula? The expression complexity, also called query complexity, is where you play the game the other way around: you fix the database, so you have a fixed database against which you ask different questions; now only the formula is part of the input, and you have one such problem for every database. So data complexity is a family of problems parametrized by the formulas, and expression complexity is a family of problems parametrized by the databases. And you can say what it means for the data complexity to be in a complexity class, meaning that for every sentence the associated decision problem is in the class, and likewise the expression complexity is in some complexity class if for every database instance the associated decision problem is in the class. Vardi made an empirical discovery. It is an empirical discovery, not something you can prove, because you cannot go over all possible logics: for most query languages, the data complexity is of lower complexity than both the combined complexity and the expression complexity. That is empirical evidence; you have to go query language by query language. And quite often the expression complexity can be as hard as the combined complexity; relational calculus is a case in point. So here is the picture we have for calculus: for the combined complexity, we saw it is PSPACE-complete; actually, we saw this in the morning.
The data complexity drops to logarithmic space, so we see the exponential gap between PSPACE and LOGSPACE. What about expression complexity? Well, we know it cannot be worse than combined complexity, so it is in PSPACE; but in fact it can be PSPACE-complete, and we saw this in the morning in Ram's presentation, where he used a very simple database, a unary relation with a single element, to encode quantified Boolean formulas. So this is the situation with calculus and algebra. In some sense this looks like very bad news for databases, because equivalence and containment are undecidable, and query evaluation, at least in combined complexity, is PSPACE-complete. This motivates the following questions, which I will go through at a lower pace tomorrow: are there interesting sublanguages of calculus for which equivalence and containment are at least decidable, and how low can we go? And, by the same token, are there languages for which query evaluation, at least in combined complexity, has lower complexity than PSPACE, and how low can we go? As it will turn out, and this will be the topic of our discussion tomorrow, together with some of the things that I think Ben Rosman will be talking about, there is the language of conjunctive queries, which are simply existential positive sentences built from atomic formulas using conjunction only, without disjunction. They have this lower complexity, but the important thing about them from a database point of view is that they encapsulate the most frequently asked queries in databases. So what we are going to do tomorrow is explore these three problems, equivalence, containment, and evaluation, for conjunctive queries; we will get some good news and some bad news, and then we will try to go a little beyond them. So in some sense, whereas before we went outside first order logic, now we are going inside first order logic, trying to see which parts have more tame behavior than the full
algebra and calculus. So I will stop here.
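As a concrete companion to the conjunctive queries just previewed, here is a minimal sketch of my own (not part of the lecture; the encoding and helper name are hypothetical) of evaluating a Boolean conjunctive query by brute-force search for a satisfying assignment, which is exactly a homomorphism from the query into the database:

```python
from itertools import product

def eval_boolean_cq(atoms, variables, db):
    """Evaluate a Boolean conjunctive query given as a list of atoms
    (relation_name, tuple_of_variables), with all variables implicitly
    existentially quantified: search for an assignment of active-domain
    values to the variables that makes every atom true.  Brute force,
    exponential in the number of variables."""
    adom = {v for rel in db.values() for t in rel for v in t}
    for values in product(adom, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(tuple(assignment[x] for x in args) in db[rel]
               for rel, args in atoms):
            return True
    return False

# Query: exists x, y, z with E(x, y) and E(y, z)  (a path of length 2).
path2 = [("E", ("x", "y")), ("E", ("y", "z"))]
```

On the database {"E": {(1, 2), (2, 3)}} the query holds (map x, y, z to 1, 2, 3), while on {"E": {(1, 2)}} there is no homomorphism, so it fails.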