So why should you take Advanced Database Systems? I say this every year, and every year it continues to be true: database system developers are in high demand in industry, and also in academia. I can't get enough of them either. The feedback that I continually get from people in industry is that they want people who know how to work on complex systems, particularly database systems. And so if you take this class, we will teach you how to write high-performance and correct code in the context of a modern database management system. A lot of the lessons you learn from this class can then be extrapolated and extended to other areas of computer science and other areas of systems, right? People who can work on database systems are in high demand because they have to really think about hard problems in complex software. And if you're good at doing database systems, you can then apply that knowledge to other areas. If you're just a JavaScript programmer, they're not going to hire you to work on their database system. They're not going to hire you to work on the OS kernel. But if you can do embedded systems, database systems, operating systems, you can do pretty much anything. So that's the one thing I hope you get out of this: the things we're talking about here are not only for database systems. We're doing it in the context of database systems, but it will be applicable to a bunch of other things. And just to show you why this is true, I want to give a roster of previous students that have taken this class or have worked with me; I've hidden their identities at their request. These are just a sampling of the students that have been working with me on the system that you guys will be working on in this class. And they have all been gobbled up by various database vendors and major software companies to work on database things, right? Some of these students are in the room right now, but we can ignore that for now. And I'll continue to say this throughout the semester: the one thing that every single database company asks about when they email me to see whether I have any students, the one particular area, is the query optimizer. This will be a really difficult thing we discuss, but if you get really good at query optimization, you will have no problem finding a job for the rest of your life. At least for the next five years, I suppose, okay? All right, so what are we gonna discuss today? We're gonna talk about the wait list, because I'm sure that's what a lot of you care about, then we'll talk about the course outline, what the course will be about. And then I'll finish off the latter half with sort of my crash course, an introduction to the history of database management systems. We'll go all the way back to the 1960s, okay? So I say this every semester and I try to get better, but it never happens: I get excited when I talk about databases and I start talking really fast. So if you have a question and you don't understand what I'm saying, just raise your hand, tell me to stop, shut up, and slow down, and I'll repeat myself. I'll try to keep talking slowly, but again, as I said, I get excited. And the other thing is, I won't answer questions about the material I'm discussing in the lecture at the end of the class.
So if you have a question and you wait until the very end, and you come up and talk to me and say, what about slide whatever, I'm not gonna answer it, because I'd rather you stop me during the lecture and say, this doesn't make sense, or, what about this? Because if you have a question, somebody else probably has the same question, and the things we're gonna talk about are actually really difficult, so it's important to make sure you guys understand, because otherwise I'm just gonna keep on going. I didn't do this before and I found out the hard way that a lot of you are shy or worried that I'm gonna be upset if you ask a question; that's not true. So don't wait until the very end. If you have a question as we go along, just tell me to stop and I'll repeat myself, okay? All right, so the wait list. Currently we have 73 people on the wait list as of this morning. The max capacity of this course is 40. I'm capping it at 40 because that's the largest I can go and still maintain the quality and the kind of hands-on collaboration that I can have with you during the semester; going beyond this, it simply doesn't scale. There are currently 28 people enrolled in the course, which means there are 12 free slots. So, in order to be fair to everyone and make sure the people that really wanna take this course can get in, we'll be pulling people off the wait list in the order that you complete the first assignment and get 100%. The first assignment was posted on the website an hour or a few hours ago. I won't talk about it in this class, but we'll talk about it next class. If you wanna get started and work on it now, Prashant and I will be helping you out on Piazza; you can start this today. AutoLab is set up; everyone on the wait list and everyone enrolled has been registered into the class on AutoLab. We just haven't turned on the submissions for the first assignment, and we hope to do that either today or tomorrow. All the code is available on GitHub now, so you can get started on the first assignment immediately. Okay, any questions about this? All right, so what is this course about? Obviously the title is Advanced Databases, but that can go a bunch of different ways; in particular, I'm focused on database internals. So this is a course about how to build a modern database management system. We're gonna understand the main concepts and what people are doing in state-of-the-art systems today, but as a larger theme, we're also gonna learn how to write high-quality code in a complex software system, right? And in the context of databases, we don't just care that the code is fast, we also care that it's actually correct. So we're gonna learn how to handle that. You guys will learn how to write proper documentation and write good test cases. For the third project, you'll do code reviews, and we'll learn how to do that. And then, just in general, since the system you'll be working on is a really large code base, and a lot of the people that wrote the pieces you'll be modifying aren't here anymore, which is totally how things are in the real world, hopefully you'll get some insight and some experience doing this, right?
I'll say this throughout the semester, but when I first started teaching this course two years ago, I asked all my friends at database companies: what's the one thing you want me to teach students? Like, do you want me to really cover locking, do you want me to cover indexes? What's the one thing you want a student that takes my class to come out with a certain skill in? And without me prompting any of them or telling them what one company said versus another, they all basically said the same thing. They all wanted students that knew how to work on a large code base independently, right, and get up and running on their own. That's not something you can just teach somebody in a lecture; it's something you learn over time and get better at. All right, so the scope of the course will be that we're gonna be covering single-node, in-memory database systems. This is not a course on distributed databases. The reason is because, one, our database is only single-node, not distributed, so there's that. But I really do think there are a lot of important problems you have to solve first on a single-node system before you want to go distributed. If you have a crappy single-node system and you make it distributed, then you have a crappy distributed system, right? So we want to make sure the single-node thing is high-performance and correct first, before we scale it out. I think there's a lot of interesting topics in this area, and that's what the course will be focused on. I'll also say that everything we're talking about here, for the most part, will not be found in a textbook. This is a graduate-level, PhD-level course, so we're talking about state-of-the-art topics, right? If you took the intro class last semester, I would consider that sort of a classical database system course: you learn about buffer pools, you learn about the architecture from the 1970s and 1980s. The things we're talking about here build upon those topics, but they're the things you would do if you were building a brand-new database system today. So the topics we're gonna cover are: concurrency control; indexing, both latch-free and with locking and latching; storage models; compression; system catalogs; parallel join algorithms. A new topic for this semester will be networking protocols: how does the database system actually talk to other database nodes, and how does it talk to the client? Then logging and recovery, query optimization, compilation and execution, and then running database systems on new hardware. I will say that the focus of this semester versus previous semesters will be heavily skewed toward this topic here, in particular query compilation and code generation, right? Because the new database system, our database system that we've been building here at CMU that you'll do all your projects on, is now based on LLVM. That means that instead of interpreting the query plan like we did in the classic database course, we're actually gonna generate code on the fly for the query plan, compile that into machine code, and execute that. And we do this for performance reasons. Part of the reason we decided to do this is because the feedback that I've been getting about the previous version of the course was that it was way too easy, right?
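Just to make the compiled-versus-interpreted distinction concrete before I go on, here's a rough C++ sketch. These are made-up names, not Peloton's actual API, and the real system emits LLVM IR at runtime rather than returning a lambda, but the idea is the same:

```cpp
// Hypothetical sketch -- not Peloton's actual API.
// Query: SELECT a FROM t WHERE a > 5;
#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Interpreted execution: the engine walks a generic plan, paying for
// virtual dispatch on every single tuple.
struct Predicate {
  virtual ~Predicate() = default;
  virtual bool Eval(int32_t a) const = 0;
};
struct GreaterThanConst : Predicate {
  int32_t c;
  explicit GreaterThanConst(int32_t c) : c(c) {}
  bool Eval(int32_t a) const override { return a > c; }
};
std::vector<int32_t> InterpretedScan(const std::vector<int32_t>& col,
                                     const Predicate& p) {
  std::vector<int32_t> out;
  for (int32_t a : col)
    if (p.Eval(a)) out.push_back(a);  // virtual call per tuple
  return out;
}

// Compiled execution: emit a tight loop specialized for this exact query.
// Peloton does this by generating LLVM IR and JIT-compiling it to machine
// code; here a lambda stands in for the generated code.
std::function<std::vector<int32_t>(const std::vector<int32_t>&)>
CompileScan(int32_t c) {
  return [c](const std::vector<int32_t>& col) {
    std::vector<int32_t> out;
    for (int32_t a : col)
      if (a > c) out.push_back(a);  // constant baked in, no dispatch
    return out;
  };
}

int main() {
  std::vector<int32_t> col = {1, 7, 3, 9};
  GreaterThanConst pred(5);
  std::printf("interpreted: %zu rows\n", InterpretedScan(col, pred).size());
  std::printf("compiled:    %zu rows\n", CompileScan(5)(col).size());
}
```

The two loops return the same answer; the point is that over millions of tuples, the compiled version pays no per-tuple dispatch overhead, and that's where the speedup comes from. Okay, so back to the feedback about the course being too easy.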
So on the 4chan message board, somebody posted a question: hey, do you know any source where I can learn relational database theory really easily, thanks in advance? And I don't know who this person is, but he decides to chime in: look, CMU has a course. The beginner course only has a few slides, but the advanced course, meaning this class, but it's not that advanced, is the place to go; the full lectures are on YouTube. So that hurt. And then a few days ago, somebody posted last year's course on Hacker News, we made the front page, and someone's like, yeah, thanks for doing this, this would be really useful to train junior programmers. And this other guy's like, you really wanna start juniors with this? And he said, yeah, sure, this doesn't require any degree, doesn't require any work experience, so this should be easy, right? So we decided to have you guys work more on the LLVM part of the system, because that's more complex, and this is definitely where the industry's going with this query compilation technique. So don't worry if you haven't taken a compilers class, I haven't either, right? It's really about how we can use LLVM and query compilation, code generation, in the context of a database system, right? And if I can understand it, then you guys can understand it. Okay, so what kind of background are you gonna need? As I already said, you don't need to have taken a compilers course. We will cover all the things you need to know about LLVM in order to do the projects. But I'm gonna assume that everyone here has taken some kind of basic intro database class: either 445 from last semester, or, if you're a graduate student, a database course at your undergrad institution, right? The reason why you need this background is because we're gonna discuss the modern implementations, the modern variations, of classical database algorithms, right? So I'm not gonna teach you what a hash join is or what a Grace hash join is; I'm gonna teach you the radix hash join, the more modern parallel version of it, right? You sort of need to understand the fundamentals to understand the more complex topics. With that said, the things I'm not gonna cover are: SQL; serializability theory, so everyone should know what conflict serializability is versus view serializability; relational algebra; the basic algorithms, as I said, the join algorithms, sorting algorithms, things like that; and then the basic data structures, right? Everyone should know what a B+ tree is and should be able to implement one, okay? All right, so the background from a technical side, from a programming standpoint: all the projects you'll do in this class will be done in C++11. I'm not gonna teach you C++11, right? You already should know C or C++, and everyone should know how to debug a multi-threaded C++ program, right? You should know how to pop it open in GDB, or whatever the equivalent is on the Mac, and step through the frames and try to see what's going on. The first two projects in this class can be completed entirely using GDB as your debugger. For the third project, since you guys choose what you wanna do, if you start mucking around in the LLVM engine code, then GDB is not gonna be your friend, because what'll happen is, if you have a segfault, you don't land with a nice stack trace in GDB, you land in assembly. And you have to read assembly to figure out what's going on.
Now, there's enough components in the system that aren't in the LLVM stuff where you don't have to worry about this. Marcel's here, and he's been working on a tool that allows us to step through the LLVM IR to make it easier to understand what's going on. But I just wanna emphasize that this is a programming-heavy course, a systems-oriented course. So you should be prepared to deal with bugs in your C++ code, okay? All right, so let's talk about logistics now. All the course policies and the schedule are posted on the course website. And I hate to have to say this every single time, but please read the CMU academic integrity policy. If you're doing something where you're maybe not sure what the right thing to do is, please contact me. I'd rather have you ask me, say, hey, we found this code on the internet, can we use it? I'd rather have you ask me whether that's the right thing to do, than me find out that you cribbed last year's assignments or stole code from someplace we shouldn't have taken code from, and then I have to go to Warner Hall and report you, okay? So I'm very serious: please don't plagiarize, because it makes my life harder, and I will destroy you, okay? All right, now, office hours. I'll have office hours on Mondays and Wednesdays in my office right before class. I'm up on the ninth floor. Here's what you can talk to me about when you come to office hours. I bring this up because every year kids come and talk to me about whatever they want, because they think I'm young or they think I'm street or whatever. So: we can talk about problems in your projects. I have a pretty good understanding of how our system works. There's some corners that even I don't fully understand, but I can at least sit down with you in GDB and try to figure out what's going on. If you have questions about any of the papers you guys are assigned to read in the class, we can discuss those. And, because I've had students come to me when their girlfriends or boyfriends break up with them, I can give you tips on how to set up your Tinder or Bumble or Grindr account, whatever you want, and help you be successful in that aspect of your life. All right, so we have one TA for this course. His name is Prashant. He's not here right now; he'll be back on campus later in the week. He's a third-year PhD student in the CS department. I co-advise him with Todd Mowry. He did his bachelor's and master's at the University of Toronto and then spent some time at IBM before he showed up here at CMU. He is the lead architect and sort of the main developer now on the Peloton database management system, and he was the main designer of the LLVM engine that you guys will be working on in the first project. So he can answer pretty much any question that you may have. And he's ridiculous, he's amazing. I stand by my last statement there. Again, he and I will be answering questions on Piazza, and he'll have office hours that I will post as well. All right, so what's expected of you for the course? There'll be reading assignments, programming projects, a midterm, a final exam, and then extra credit. So we'll go through each of these. For every class, if you look in the schedule, we'll have a bunch of papers. The paper at the top will have a little star next to it, and that's the required reading. So before class, you're required to read that paper and then submit a short synopsis or summary to me on a Google form, right?
And the summary can basically be one paragraph where you just say: here's what the paper was about, here's the system that they used or modified to evaluate their claims, here are the main contributions, and then, what was the workload that they ended up using, right? This last one is important because we have a benchmark framework that you can use in Peloton for your third project, and you need to understand what different workloads you'd wanna use to evaluate your project at the end, right? If you're doing something for analytical queries, you don't wanna run TPC-C, you wanna run TPC-H. So you have to get a feel for what people are actually doing in these research papers to evaluate their ideas in their database systems; that's what that last item is for. The link here is posted on the website; it'll take you to a Google form, and you just select the date, put in your Andrew ID, and copy and paste your summary in. Again, as I said before, please don't plagiarize. We do check, right? There's a nice tool that makes this very easy. So just don't copy the text from the paper and plop it in. Don't find a summary that somebody may have written in another class at another university and copy their summary in, right? Again, if you have questions about whether you're doing the right thing, please ask me, okay? I'd rather you and I just hash it out than have to report you. Now, as I alluded, all the programming projects in this course are based on a new open-source database management system we've been building here at CMU for the last couple of years called Peloton. Peloton is an in-memory hybrid database management system. In-memory means that the primary storage location of the database is entirely in memory, right? Contrast this with last semester, if you took 445 or 645: that was a disk-based database system, where the primary storage location of the database was on disk and you copied things into memory as needed. With this system, the database starts off in memory, right? And if you needed to, although we don't support this yet, you could flush things out to disk that you don't need anymore, right? So as I said, this is a modern code base. It's based on C++11, it's multi-threaded, we use LLVM for query compilation and code generation, and we speak the Postgres wire protocol. So you can open up the Postgres terminal that's normally on your laptop and connect to the database system. Again, this is a research project that we're doing here at CMU, but we actually try to spend the time to do high-quality engineering to the best that we can, because we want this thing to end up being usable outside of CMU. We actually just hired a full-time staff engineer last month, who's now gonna be helping us make the code cleaner and more solid and actually usable outside of us. So for your project, you can do all your development on your local machine. Peloton builds on Linux, and as of last year it now builds on OS X. There are instructions on the website for how to do this; you gotta make sure you have the right version of Xcode. One student last year sort of got it to compile on Windows 10 with the new Linux component you can install. I don't recommend that; we haven't actually tried to do it ourselves. But if you don't have one of these platforms, we'll provide you with a Vagrant VM. Vagrant is basically a wrapper around VirtualBox.
So we give you a little Vagrant file, and there's like one command that'll automatically download, install, and set up a virtual machine for you. It'll have all the dependencies you need to actually develop Peloton on your local machine. When it comes time to do benchmarking in project two and project three, we have a cluster of machines that were donated to us by MemSQL, which is an in-memory database company, and we'll provide instructions on how to log into these and run them. These will be, I think, dual-socket machines with like 120 gigs of RAM, so they'll have much more memory than your laptop. And if you wanna run performance profiling and other more complex experiments, you can use these. We'll also have some additional resources provided to us later in the semester, and that's a contribution from our friends at Snowflake Computing. All right, so for the first two projects, we're gonna provide you with the test cases and the scripts and everything you need to evaluate your programming assignments. You'll submit these on AutoLab; it'll do the auto-grading and then spit back whether you succeeded or not. Project one will be done individually. The reason is because we want everyone to get their hands dirty working on the database system and understand how to build it, how to run it, and how to test it. But the second project will be in groups of three. So you should start thinking now about who's in the course and who you wanna buddy up with, because the second project will be handed out, I think, in two weeks. The third project will also be a group project, and usually what happens is the people you do project two with end up being the people you do project three with. Last year we had somebody who had a fight with somebody else because they didn't shower enough, so I had to break them up. If that happens, we can figure it out. But in general, you wanna pick the right people for project two, because those are the people you'll want to work with on project three. Okay, so project one, as I said, is out now. Again, if you're on the wait list, this is how you get enrolled: by finishing it with 100%. The basic idea is that you need to implement three string functions, SQL functions, in the database system, right? Upper, lower, and concat. They do pretty much exactly what they sound like, right? And again, this is not meant to be really hard. Like, it's not hard to write the upper function. It's really about how you connect the pieces, from the catalog and the binder to the code generation portion to the actual C++ implementation of the string functions. The website provides details on how to do this. I'll talk more about it on Monday next week in class, but there'll also be a special one-time recitation on Tuesday at 5 p.m. on the ninth floor, where we'll go into more detail about how to actually do project one. All right, we recognize that this is a big piece of software we're throwing at you guys for the first programming assignment, so Prashant and I will show you exactly how it all fits together and how it all works. I'll send a reminder about this on Piazza. And again, please don't plagiarize, because it makes my life harder and it makes your life terrible, okay? All right, project three, as I said, will be an open-ended group assignment where you're gonna form a group of three people and propose to build some new component in the database system based on the things that we discuss in the class.
So this will be a major programming assignment, I think it's 40% of your grade, and you will be working on it for at least half the semester. And instead of just writing a bunch of code and throwing it at us when you're done, or just leaving it up on GitHub and never actually doing anything with it, the goal is that for your third project you wanna get your code merged into the core database management system. We have about a 50% success rate of getting student projects merged into our main system. But don't worry about this just yet. In the class right before spring break, I'll spend time discussing different project topics you could pursue. I'm also available to talk about the kinds of things we're interested in, and I think I sent an email out about doing research in this area, right? So you can pick a project that might turn into a paper later on, if you're interested, okay? All right, so as part of project three, you'll do a proposal presentation in class. You'll do another presentation where you report the current status of your code. You'll do two code reviews during the semester, where you'll be paired up with another group, review their code on GitHub, and provide feedback to them, and we'll also monitor this to make sure you're doing things correctly. And then you'll have a final presentation. I won't give you a final grade unless your code can cleanly merge into the master branch on GitHub. And whether we actually merge it or not depends on how good your code is. For the project proposal, I'll cover this when we get closer to it, but it'll be a five-minute presentation where you just describe what you're actually gonna have to modify in the Peloton system to do your implementation. The status update is basically the same thing: what's going on, how are you making out, and has anything changed in your plans? For the code reviews, again, you'll be paired up with people in the class; you'll review their code, they'll review your code, and you'll provide feedback to each other. This is important because when you go out into the real world, it's not like you just write code and it gets merged all willy-nilly. You're gonna have to make sure that your code is up to a certain standard and quality, and other people are gonna check this for you. So you wanna learn how to do this, okay? The final presentation will be 10 minutes, during our scheduled final exam slot. I don't know the date yet; they'll give it to us later in the semester. On that day during finals week, we'll have pizza and a presentation and demo session to talk about everything. Again, I won't give you a final grade unless everything can cleanly merge into the master branch and everything's commented and cleaned up nicely. All right, any questions about project three? We'll cover this more as we go further along in the semester. All right, so over the last two years I previously only did a final exam, one exam during the entire semester. The feedback I got every year was that it'd be really nice to have a midterm exam; that way you have a sort of checkpoint to understand what these exams look like, and you're not waiting until the very end.
So on the last day of class before spring break, Wednesday, March 5th, we'll have an in-class midterm exam that covers all the topics we've discussed up to that point. This will be long-form questions and multiple choice. I'll ask you high-level questions about the core ideas we talked about during the semester, not, did this paper say this, right? We're not going to do stupid things like that. It's the conceptual questions I care about, right? Then for the final exam, instead of having that in class, it'll be a take-home written exam. That will be all long-form questions, no multiple choice. And you'll have to turn it in when we come to the final presentation day during finals week, okay? Another thing that I'm offering this year is extra credit. We have an online encyclopedia; I've been working on this for three years now. We outsourced the implementation to some guy in Brazil, so it's actually written and it works; we just have to put it up. It's a website that's essentially trying to be the Wikipedia of database management systems, so we're calling it the Database of Databases. And you can get extra credit, 10% in the class, if you write an article about a particular database system of your choice. I'll provide more details about this later in the semester, but you just pick whatever database system you're interested in learning about, and then you write sort of a quasi-academic-style article about it. What I'm doing differently than Wikipedia: Wikipedia has a free-form text field where you can write any text you want; my system is actually semi-structured. So, like, for concurrency control, you select whether it has two-phase locking or OCC and things like that, and then you write a paragraph about it. So it's not free-form text, it's more structured. And again, please don't plagiarize, right? Yeah, yeah, yeah. Okay. So, the breakdown for the grade: 10% for the reading reviews. The first project is only 5%; again, it's not meant to really stress you out, it's just meant to get you to understand what the code looks like and how it works. Project two will be 25%, project three will be 40%, and then the midterm and the final will be 10% each. Okay? All the discussion for the class, the projects, the assignments, everything will be done on Piazza. If you have a technical question about the project, even if it's, my code doesn't compile and I don't know why, please post it on Piazza so that everyone can see it. Don't email Prashant and me directly, because we don't want to answer all these individual emails. And if you have the same problem as somebody else, you may be able to find the solution on Piazza rather than waiting for us to respond, okay? Now, if you have any non-project questions, like you're sick and you can't make the midterm or something like that, then just email me directly and I'll take care of it. Okay? So, any questions about the expectations of the course? No? All of you are in love with it? Okay, great, awesome. So let's talk about the history of databases. This talk is a combination of two papers. They're both on the website; obviously, since this is the first class, they're not assigned. The first paper is from Mike Stonebraker. He's one of the godfathers of databases, and he was one of my advisors when I was in grad school. He wrote a paper called What Goes Around Comes Around.
It's a little old now, it's from 2006, and it was the introduction to this thing called the Red Book, which is a compilation of famous database papers. His paper basically covers the history of databases up until 2006, and he covers all the major trends that we'll talk about here. The second paper is one that I wrote with an industry analyst in London named Matt Aslett, called What's Really New with NewSQL, and this covers databases up until 2016. So it sort of picks up where Mike left off and carries things up to 2016, okay? And there's actually some stuff that has changed in the last two years since then, some major trends that I'll discuss at the end, okay? So the main thing Mike says in What Goes Around Comes Around is that a lot of the database issues people were dealing with way back in the day, in the 1960s and 1970s, when people were building the first database systems, are still relevant today. What has mostly changed is the scale of the problems: they're much larger. There are more computers connected to the internet, there are online applications, there are way more people you have to deal with, way more data you have to deal with. And from a systems standpoint, we still have this major trade-off between disk and RAM, and CPU scheduling and CPU speeds, and things like that. So a lot of the things people dealt with in the 1970s and 1980s, when they were building the first database systems, we still have to deal with today; it's just the landscape that has changed, while the core concepts and the fundamental problems are still the same. In particular, there was a trend for a while, this debate of SQL versus NoSQL. Google came out with these NoSQL systems in the 2000s, like Bigtable, and they said: you don't need transactions, you don't need joins, you don't need SQL at all. And then everyone sort of jumped on that bandwagon and started building all of these NoSQL systems. This compare-and-contrast between these two types of systems is actually exactly the same debate they had in the 1970s, between the relational model and the network model, or CODASYL. Who here knows what CODASYL is, who's heard of CODASYL before? Nobody. That's expected, right? Who here knows what COBOL is? Okay, so we'll cover this, right? So back then, people were saying the relational model was a bad idea, in the same way the NoSQL guys were saying it, right? They said, no, no, you don't want the relational model, you want to use the CODASYL model, the network model. And the relational guys were saying, no, no, SQL is what you want to use: you want a declarative language, you want your database to have this nice relational abstraction. And the fact that none of you guys have ever heard of CODASYL tells you who won that debate. Now, when you look at SQL versus NoSQL: some of the NoSQL guys had some good ideas, but with the exception of only a few of them, those systems either went under, or they've added SQL support, or they've made themselves look like a relational database system. So again, we're seeing history repeat itself, right? The same debate we had back then, we had again maybe 10 years ago, and SQL and the relational model won again. So let's go back to the very beginning.
So the first database management system, at least reported to be one of the first, was a system called IDS out of General Electric. This was developed internally at GE to help with their manufacturing processes, and then they ended up selling it as software to other companies outside of GE. One of the first customers was a major lumber company somewhere near Seattle: they bought a very expensive machine from GE, they bought the software, and they deployed one of the first major database systems of that time. Now, GE had this weird philosophy where if they couldn't be number one or number two in a particular segment of the market, they didn't want to be in that market at all. And with their computing division, they realized that they weren't even in the top 10 compared to IBM and all the others that were out there. So they decided to sell off their computing division to a company called Honeywell, and Honeywell went forth and promoted this IDS system. The two key things about IDS are that it follows a network data model, which I'll explain in a second, and that it supports what are called tuple-at-a-time queries. If you know SQL: SQL is based on a bag algebra, where you can write a single query that processes multiple tuples at the same time. In IDS, you essentially write these for loops that iterate over one tuple at a time and do whatever operation you need to do on it. And once you're done with that operation, you loop back around and operate on the next tuple. The main guy designing IDS was Charles Bachman. He won the Turing Award for this early work in databases in 1973. And again, he was the main guy pushing that we should use CODASYL, that we should use the network data model, that the relational model's a bad idea. So he helped build IDS, and then he left Honeywell in the early 1970s, went to a company called Cullinane Database Systems, and helped build another system based on the network model called IDMS. IDS, I don't know if it's still around, but IDMS you can still buy. If you're a new startup, you wouldn't actually want to use this system; you'd want to use something more modern. But there are enough legacy systems out there that are probably still using it. So what does the network data model look like? Let's say I have a database where I want to keep track of what parts different vendors are supplying. The way you would model this is that you have these record sets, and these sets are connected together with membership sets. So you have a supplier, the company that's gonna sell a particular product: it would have a company name and what city or state it's located in, and it would be connected to the supply set through this "supplies" cross-reference set. And so you'd have to write these for loops that iterate over every entry in one set, then do a lookup in the next set, and then, for every entry in that one, do a lookup in the next. To ask, for every particular supplier, what parts is it actually supplying, your for loop has to keep going down with another nested for loop to follow those links, right?
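To give you a feel for it, here's a rough sketch of that tuple-at-a-time navigation, with hypothetical record types. Real IDS programs were written in COBOL, not C++, so this is just the shape of the program:

```cpp
// Hypothetical record types -- a sketch of the shape of a CODASYL-style
// program, not real IDS code (which would have been COBOL).
#include <cstdio>
#include <string>
#include <vector>

struct Part     { std::string name; };
struct Supply   { const Part* part; int qty; };
struct Supplier {
  std::string company, city;
  std::vector<Supply> supplies;  // the membership set linking the records
};

int main() {
  Part batteries{"Batteries"};
  std::vector<Supplier> suppliers = {
      {"ACME", "Pittsburgh", {{&batteries, 100}}}};

  // "For every supplier, what parts is it supplying?" -- the application
  // walks the sets itself, one tuple at a time, nested loop by nested loop.
  for (const Supplier& s : suppliers) {
    for (const Supply& sup : s.supplies) {  // follow the membership links
      std::printf("%s supplies %s (qty %d)\n", s.company.c_str(),
                  sup.part->name.c_str(), sup.qty);
    }
  }
  // In the relational world, this whole traversal becomes one declarative
  // query, and the DBMS picks the access path for you.
}
```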
And so you'd have to write these programs that manually navigate these multi-dimensional networks, right? There was no SQL, no sort of declarative language to do this. This was done in COBOL, an early programming language: you literally had to write the for loops to iterate over these data structures and do it yourself, right? And when you started having really complex queries, this became really difficult to do. So complex queries were a big problem with this. The other problem, at least in the early days, was that this database was easily corrupted. If you lost, say, one of these membership sets, then the supplier and supply records were disconnected, and you had no way to combine them back together, because the reference information saying that this supplier supplies this supply entry was stored in that set. So if you lost that record, that collection, the whole database was corrupted, right? So another major system in the 1960s was this thing called IMS, IBM's Information Management System. This was a system that IBM helped develop on behalf of NASA as part of the Apollo moon mission, because they were building this really expensive, really large rocket to take people to the moon, and they needed a way to keep track of all the different parts they had to buy from the different vendors to put the thing together. So IBM helped build this database. And IMS is still around today; you can still buy it. I've heard rumors that it's still the number one money maker for IBM, because there are a lot of companies that set the system up in the 1970s for their legacy applications, and they don't want to switch off it, right? So IMS has a hierarchical data model, which I'll explain on the next slide, and it has a programmer-defined physical storage format. What that means is, say you have a table: when you load that table, you have to tell the database system what data structure to use. Should it use a hash table, or should it use a B+ tree, a tree-based index? And then, based on what data structure you choose, the database system exposes a different programming interface to you. So if you say, I'm gonna load my table as a tree, then you can do range queries on it. But if you load your table as a hash table and you realize, oh, I need to do range queries, you can't tell the database system to go ahead and modify it; you have to dump the table out and then load it back in as a tree. And the same thing we had with the network data model: the hierarchical data model is a tuple-at-a-time interface. You write these nested for loops to iterate from one collection to the next and find all the entries you're looking for. So let's look at an example. Say my schema is a supplier and a part. When I actually store this inside of IMS, I would have one table instance for the supplier entries, but then I would have to have separate tables for the parts being supplied by each supplier, and each of these would be stored as a separate data structure. And the issue here is that you're gonna end up duplicating data, right? I have two suppliers that supply batteries, and now I'm duplicating the part name "batteries" in each of them, right? So there's a lot of duplicate data in this model.
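In the same sketch style as before (made-up types, not real IMS), the hierarchical layout looks something like this: each supplier physically owns its own subtree of part records, so a shared part gets stored once per parent:

```cpp
// Hypothetical sketch of an IMS-style hierarchy -- not real IMS.
#include <string>
#include <vector>

struct PartRecord { std::string name; int qty; };  // child segment
struct SupplierRecord {                            // root segment
  std::string company;
  std::vector<PartRecord> parts;  // physically nested under the parent
};

int main() {
  std::vector<SupplierRecord> db = {
      {"ACME",   {{"Batteries", 100}}},
      {"Globex", {{"Batteries", 250}}},  // a second physical copy of
  };                                     // "Batteries" -- nothing is shared
}
```

If a part ever gets renamed, you have to hunt down and update every physical copy of it.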
And then there's no data independence. Remember, you have to define what the data structure is ahead of time, and if you realize you need to use the data a different way, you have to dump it out and load it back in, right? The database system isn't hiding from you how the data is actually being stored. Think of it like when you program in Python: you can make a dict, you can make a list, and you know what the data structure actually is, because you program to it. It's essentially the same idea here. So now, in the 1970s, there was this guy named Ted Codd at IBM Research in San Jose. He wasn't a programmer, he was a mathematician. He was working at IBM, and he saw all these IBM programmers spending their time rewriting their IMS and CODASYL programs over and over again. Anytime the schema of your database changed, you had to go back and change your application, right? Like, if I add a new column, I have to modify the code so it can deal with the fact that there's now a new column in any tuple that I access, right? So he basically realized, wow, this is a huge waste of man-hours, a huge waste of time, making these changes over and over again. A better idea is to have an abstraction where we just tell the database what our data looks like at a logical level, and we don't care about how it's actually being stored. Then we can write programs against that logical view of our database, and if the physical layer changes, we don't have to change our programs. That is the high-level idea of the relational model. In his seminal paper, he has three key ideas, and they're all related to this abstraction idea. First, your database is gonna store all the collections of data in simple data structures; he called them relations, and in vernacular parlance we now say tables, same thing. Second, the physical storage of these tables or relations will be left up to the database system; it can decide what the right layout is and what physical device to store things on. And the other major difference from the other two models is that you're gonna access these tables or relations through a high-level language, which we now know as SQL. At the time there wasn't SQL, and again, he was a mathematician, so he proposed a relational algebra and a relational calculus to do this. It was only later that people actually tried to build programming languages like SQL or Quel on top of this. So again, this should all be familiar to anybody that took the intro class, right? I have a supplier and I have a supply, and they're joined together through foreign keys, and that allows me to get at the part if I need to. And instead of writing these for loops and traversing the data structures manually, I'm gonna write SQL and let the database system figure out how to do that. So, Ted Codd writes this seminal paper in 1970 that lays out exactly what the relational model is for a database system. But as I said, he was a mathematician, not a programmer, so it's not like he then sat down and tried to actually implement it. When this paper came out, a bunch of people saw it and said, we should try to do this, right? So in the early 1970s, there were three main groups trying to build the first relational database system.
The first two were System R at IBM Research in San Jose, and then at Berkeley, Mike Stonebraker led a team to build a system called Ingres. A little bit later on, this guy Larry Ellison came along, saw what was happening at IBM and Berkeley, and thought something kind of cool was going on here. So he went off and built his own relational database system as well, essentially copying what the IBM guys were doing, right? If you read the memoirs or the oral histories of the early System R developers at IBM, they talk about how Larry Ellison would call them on the phone and be like, hey, look, if you give your database system this query, what's the error code, right? And then he would go program the exact same thing. And of course, he used to be the fifth richest man; I think he went down last month, he might be seventh or eighth now, whatever. And Jim Gray, somebody we'll talk about a lot through the semester, was an early pioneer in databases; he won the Turing Award in 1998 for this. Stonebraker won the Turing Award a few years ago for databases, and Larry Ellison, again, is very rich. Okay, so these early systems proved that you could build a relational database system and have it scale. They showed that the CODASYL model was insufficient, they showed that the hierarchical model was not what you wanna do, and that this was the right way to go. And the relational model really took off in the 1980s, because now, beyond just the three systems I talked about before, a bunch of other systems, what I'll call enterprise relational database systems, came on the market. IBM never actually commercialized System R; they took bits and pieces of it and put them into other products. But they came out in 1983 with a brand-new database system called DB2, and said, this is our relational database system, this is what we're gonna go with going forward. Now, in the 1970s it wasn't clear that SQL was gonna be the de facto language for relational database systems. The IBM guys invented SQL, but Stonebraker had this language called Quel that he invented at Berkeley. He claims it's better than SQL; I disagree. And of course Oracle copied IBM, so they had SQL too. So when IBM came out with DB2 and said, here's our relational database system, it supports SQL, Oracle was like, hey, look at this, we're right here, we have a database system that also supports SQL, right? The Ingres guys added SQL a little bit later. Depending on who you talk to, you could argue that Oracle won because they had SQL and Ingres lost because they had Quel. I don't know whether that's true or not. But in addition to DB2, Ingres, and Oracle, a bunch of other enterprise relational database systems came out on the market around the same time. This clearly was the point where it became obvious to everyone that the relational model and SQL were the way to go. Most of these systems, actually all of these systems, are still available today. Teradata is still a major company; they had one of the first data warehouse appliances, which came out in the 1980s. InterBase was developed by folks from DEC; it's one of the early MVCC systems. It has since been open sourced as Firebird. You can still get a commercial version of InterBase; I don't know how much of the code is still the same, but some company bought whatever InterBase had become from Borland in the 1990s, and now they market it as an embedded database.
But my understanding is that its legacy is the original system from the 1980s. Sybase got bought by SAP, and Informix got bought by IBM. So of all of these up here, the only ones that are still considered state of the art and are still being actively developed would be DB2, Oracle, and Teradata. NonStop SQL came from Tandem; Tandem got bought by Compaq, and Compaq got bought by HP. And even though this is on video, I'll say it: HP is the place you go if you want your database company to die. NonStop SQL was a really cool system; HP basically let it flounder. Vertica was another cool system; HP basically let it die there. Anyway, another big thing is that after Stonebraker left Ingres in the early 1980s, like 1984, he went back to Berkeley and started a new database system called Postgres. You ever wonder what Postgres stands for? It's Post-Ingres, right? So he started that project, and the core of the system we use today is still based on the code developed at Berkeley in the 1980s. Actually, the first version of Postgres was written in Lisp. They got rid of that and rewrote it in C. And of course, Stonebraker really loved Quel, so the first version of Postgres in the 1980s supported Quel, not SQL; it was only in the 1990s that they added SQL. Now, there was a movement in the 1980s toward what are called object-oriented databases. The basic argument here was that the relational model was a bad idea because of this impedance mismatch with how people wrote code, right? You write object-oriented code in C++ or Java or Python, and when you try to store those objects in the database, you have to break them up and store them across different tuples in different relations. So this movement came out in the 1980s and said, basically: you don't actually wanna store things in relations. If you're writing your code with objects, you should just store things as objects, right? A couple of these systems came out, and the basic idea was that they added hooks or libraries for your programming language that allowed you to write objects living in memory in the programming language directly out to the database system, right? Nowadays we use ORMs to do essentially the same thing, but back then you would have a sort of special proprietary language. A bunch of these companies are still around; they've been bought by holding companies, so they're still there, and they might still make money from legacy applications, but again, no one's actually developing new applications based on them. Versant and ObjectStore are two of the major ones. MarkLogic is something that came out in the 2000s; it's essentially an XML database. I'm only including it here because a lot of the ideas that came out of this movement, the object-oriented databases, ended up making it into relational databases. Pretty much every major relational database system now supports JSON or XML, but you can do it in the context of SQL. So what does this look like? Say in your application code you have a class for a student. The student has an ID, a name, an email, and then a list of phone numbers.
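In C++, that class looks roughly like this; this is a hypothetical sketch, the actual code on the slide may differ:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The application-side object: one student with a variable-length list
// of phone numbers. (Hypothetical; just the shape of the slide's class.)
struct Student {
  int64_t id;
  std::string name;
  std::string email;
  std::vector<std::string> phone_numbers;  // the variable-length part --
};                                         // this is what won't fit in one row
```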
So if you had a relational database, you'd have to write the schema like this, where you have a student table and then a student phone table with a foreign key to the student table, because you need to be able to support multiple phone numbers for a particular student, right? So now, if you've stored this data in the database and you want to instantiate this object in your programming language, in your application code, you essentially have to do a join between these two tables to suck in all the data you need and then fill these values in, right? And the object-oriented database guys say: well, you're doing a join, you're doing multiple queries, to go populate this thing. What you really want to do is just store the data directly as it exists in your programming language. The example I'm showing here is JSON, right? And again, to bring up Stonebraker's point, what goes around comes around: JSON databases are essentially the same thing, right? They're making the same argument, that you don't want to store things as normalized relations, you want to store them as denormalized collections or denormalized objects, right? The problem with this is that when you want to start doing really complex things, it becomes really tricky, especially because these early object-oriented databases did not have a standard programming language, right? They all had these one-off languages that were specific to their database system, and if you wanted to switch from one system to the next, you'd have to rewrite your entire application. So these systems never really took off. Again, they're still around, but as far as I know, nobody is building new applications on them; they're just maintaining existing code. All right, so then we get to the 1990s. For better or worse, I'll call these the boring days. It's a bit of a snarky comment, but basically, if you told somebody in the 1990s that you were gonna get a PhD in databases, they'd be like, why, isn't that a solved problem, right? There weren't really any major trends that fundamentally changed how we view databases or how database systems were being used. The only major events that sort of happened: Microsoft had previously been reselling Sybase for Windows NT, and they bought a license that allowed them to actually modify the code, and they remarketed it as SQL Server. So the SQL Server that exists today is based on the original Sybase code that they got in the early 1990s, although SQL Server has now diverged so much that it's no longer remotely the same. And SQL Server is actually very, very good and very state of the art, whereas Sybase has sort of languished. Sybase had a new version come out last year, but for the longest time I think they were sort of stagnant. MySQL got written as a replacement for mSQL, and it came out as open source. As for Postgres: Stonebraker went off and commercialized it in a new company called Illustra. They got bought by Informix, and then Informix ran into trouble, almost went under, and got bought by IBM. Meanwhile, two grad students at Berkeley took the original Postgres code that supported Quel, added support for SQL, and put it out as PostgreSQL, and that's the version everyone uses today; it's based on that derivative code that the Berkeley guys developed.
And although this is not the 1990s: in the early 2000s, Richard Hipp down in North Carolina started SQLite, which is now the most widely deployed database in the world. It's amazing. As I said, the internet was sort of taking off, but it wasn't until the 2000s that things got really interesting. So the 2000s come along, more and more people are online, more and more people are building these web applications, and all of a sudden they have a large number of users, a large number of transactions, a large amount of data. And all the big players, the enterprise systems I showed in the beginning, Sybase, Oracle, Informix, Ingres, and all these guys, were really heavyweight and really expensive, and they were not able to support the large number of concurrent operations that these web applications needed. And the open-source database systems of the early 2000s, MySQL and Postgres, although they're very, very good now, back then were lacking a lot of the core features that you would need or expect in a database system. Like, I remember using MySQL 3 back in 1999, before they had InnoDB, and you had to be careful because they didn't have transactions. You could lose data, right? And that's not good if you want to store data. So what ended up happening was a lot of these major web companies either wrote their own middleware to scale and shard the existing systems, eBay did this with Oracle, Facebook famously did this with MySQL and still does, or they wrote their own database system, which is what Google did with Bigtable, right? So basically, people were saying that the dinosaur or elephant database systems, the enterprise guys, were insufficient for web and internet applications, and they had to look to other alternatives. Also in the 2000s, we saw the movement toward what are called data warehouses. Prior to this, the enterprise systems were considered jacks-of-all-trades, right? You could do transactions on them, you could do analytics, although analytics back then wasn't as complicated as what we know now. And what came out of this was people saying: rather than having a system that tries to be everything for everyone and does a sort of half-assed job at all these different things, what you actually want to build are special-purpose database systems that are really, really good at one particular class of applications, one particular type of workload, and have separate systems to handle other things. So you have one system that handles OLTP workloads, operational workloads, and then one system that handles OLAP, the analytical queries. This really came into prominence in the 2000s with the rise of the data warehouses. These are systems like Netezza, ParAccel, MonetDB, Greenplum, DATAllegro, and Vertica. DATAllegro got bought by Microsoft, and I think that got killed. ParAccel ended up turning into Redshift, which Amazon runs, though they're rewriting that now. Netezza got bought by IBM; that's still available. Greenplum got bought by EMC, who then spun off Pivotal, and it's actually now open source and available. And then Vertica, as I said, got bought by HP and sort of languished, but supposedly there's now a separate group running it and it's getting better. And MonetDB is the open-source academic system out of Europe.
MonetDB is actually pretty good, really good, and it's still available today; it was one of the early column stores. So the key characteristic of all the systems that came out in this period is that they were distributed, right? All of the enterprise systems I showed before were single-node systems. For analytics, you don't worry about transactions, and you don't worry about maintaining consistency across network connections; you worry about scaling your queries out across multiple machines and multiple disks and running them as fast as possible. All of these systems were relational and supported SQL, and unfortunately, back then, all of them except MonetDB were closed source. Greenplum has since been open-sourced, but the other commercial ones are still closed. Most of these systems were closed source back then because there was a lot of money to be made in having a good data warehouse system.

One other key thing about these data warehouses, which we'll discuss during the semester, is that they use column stores, the decomposition storage model. That contrasts with the enterprise systems of the 1980s and even the 1990s, where everybody used row stores. For the analytical workloads these systems were targeting, a column store is much, much better.

Then in the late 2000s we saw the rise of the NoSQL movement. Again, these were operational databases, focused on ingesting data very quickly from your applications and your users and servicing them with low latency. The major trend across these systems is that they touted themselves as being schema-less, or schema-last. Unlike the relational model, where you have to define your table with CREATE TABLE before you're able to store any data in it, in a lot of these early NoSQL systems you just take your JSON object and shove it right in. The system didn't check anything, right? You didn't have to worry about defining what your table was going to look like ahead of time; you just started storing data and let it go. A lot of these were also non-relational. So you have the document data model, which, as I showed before with the JSON example, looks a lot like the object-oriented databases from the 1980s, as well as key-value stores, graph stores, and other things. They didn't support transactions, they didn't support joins, and they had custom APIs instead of supporting SQL, hence the NoSQL moniker. And the nice thing about these systems is that they were usually open source. With the exceptions of, I think, Oracle NoSQL (which I think is based off Berkeley DB), Google Bigtable, and DynamoDB, all of these are open source, which is kind of nice. Now the big thing is that they claim NoSQL does not mean "not SQL," it means "not only SQL." A lot of these systems, your Cassandras and HBases and Mongos, said they were never going to support SQL, but they've since changed that, and many of them actually do support some dialect of SQL on top; Cassandra in particular makes it look a lot like the relational model. (There's a small sketch of that below.)

Then in the early 2010s, and this is the movement that I was involved in when I was in graduate school, we saw the rise of NewSQL systems. NewSQL systems are operational data stores that, like the NoSQL guys, were targeting transactional OLTP workloads.
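As a quick aside, to illustrate that point about NoSQL systems adopting SQL-like dialects: below is a small sketch of Cassandra's CQL, with made-up table and column names. It looks relational on the surface, but there are no joins, and access paths are designed around the primary key.

    -- CQL's DDL and DML look a lot like SQL...
    CREATE TABLE users (
        user_id uuid PRIMARY KEY,
        name    text,
        email   text
    );

    INSERT INTO users (user_id, name, email)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'Alice', 'alice@example.com');

    -- ...but queries are restricted to what the partitioning supports:
    -- point lookups by key are fine, while joins simply do not exist,
    -- so related data gets denormalized into the same table instead.
    SELECT name, email
    FROM users
    WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;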
But rather than giving up ACID and giving up SQL, the NewSQL systems adopted modern architectures that allowed them to scale out and get high throughput and low latency. All of them were relational and supported SQL. All of them were, for the most part, distributed. And unfortunately, these things were usually closed source, with the exceptions of H-Store and VoltDB (and GemFire, I think that's open source too), but otherwise they're closed source. And so again, the idea here was that these systems basically said: you can have the scalability you want from a NoSQL system without giving up transactions and without giving up SQL. That was their main selling point. Most of the systems I'm showing here are still available; yes, that one is Spanner. GemFire, I think, got divested by Pivotal. And I don't know whether JustOneDB is still on the market.

All right, so then now, where we're at now, and this covers roughly up through 2015 or 2016: we're seeing three major trends. The first is the rise of what are called hybrid systems, or HTAP systems: hybrid transactional-analytical processing. Unlike the 2000s, when, as I said, people recognized that you want specialized systems (my OLTP system here, my OLAP system there), what people actually want now is a single database instance that supports both fast transactions and complex analytics together, without having to deploy multiple vendors or different services. So the idea is that you support fast OLTP, just like a NewSQL system, but you can also run the complex queries you would otherwise send to a specialized data warehouse. Most of these systems are distributed and shared-nothing, all of them support the relational model and SQL, and it's a mix of open source and closed source: Splice Machine, Peloton (which is our system), and SnappyData are open source, and the rest are closed source.

The second trend is the rise of cloud-based databases. We've always had cloud vendors offering databases as a service; on Amazon, you can get RDS, a single-node database instance. But basically what those vendors did was take existing database systems, like Postgres and MySQL, plop them on a VM, and set them up and configure them for you. Internally, the architecture was essentially the same thing you would have if you ran it on premise, on your local machine. The trend now is that people are developing database systems that are explicitly designed to run in a shared-disk cloud environment, meaning the database system is aware that it's running in a virtualized environment in the cloud and doesn't assume that it has exclusive access to all the data or all the resources it's using. Snowflake, again a sponsor of this class, is probably one of the most prominent cloud-based OLAP databases. Fauna is a commercialized version of the Calvin system. Amazon has a bunch of systems like Redshift and Aurora. Xeround, I believe, was an Israeli cloud-based database that went under in 2013, but they were pretty early in this area. Microsoft put out Cosmos DB a year or two ago, and Spanner is available from Google. And again, the big thing here is that it's not just taking existing code and running it in a VM; you actually make the system work for the cloud environment.
So in the case of Amazon Aurora, for example, the system knows that it's writing data to EBS, and it can do things to avoid extra redundancy and get better performance. So where we're at now, at the end of this decade, is that we're seeing way, way more database systems, right? And again, I keep track of them in my database of databases. We're seeing all sorts of database systems built to solve all sorts of different problems: shared-disk databases, embedded databases, time-series databases, multi-model databases, and, the big trend as of last year, blockchain databases, right? This is just a sample of the different companies that I'm aware of in this space; it's a lot of different things. And not all of them support SQL, and not all of them support transactions and the other things we care about, because they're not trying to solve those particular problems. The point, again, is to show you that even though databases are an old problem, there's still a lot of interest in this area. Oracle has not solved it, right? If Oracle had solved the problem, you wouldn't have all these companies starting up and trying out different things, okay?

Question, yes. "Does the last database have something to do with Bitcoin?" What's the question, sorry? "Does the last database have something to do with Bitcoin?" The blockchain DBMSs? Yeah, there's one: that's BigchainDB. But no, blockchain isn't specific to Bitcoin, right? It's a distributed ledger for when you don't trust the members of the network. We're not going to cover blockchains in this class, okay? Yes. "Is that a decentralized DBMS?" So your question is whether BigchainDB is a decentralized database? I actually don't know; I think they only announced it last month or so. Any other questions?

Okay, so what are my parting thoughts? The main thing I want you to take away from this history is that the innovations in databases come from both academia and industry. One of the hard things about being a database researcher is that not only am I competing with other academics at other universities, both in the US and abroad, and not only do I compete against the big companies, the Oracles, Amazons, IBMs, and Microsofts, I also have to compete against all these startups pushing their own systems and their own ideas. So there's a lot you have to be aware of if you want to work in this space. It's often the case that a lot of good ideas first start in academia, but because of limited resources and limited time, the researchers write the paper about what they did and don't actually see it all the way through in the context of a real system. It's not until a little bit later that a commercial company picks up the idea, and that's where you see whether it actually makes sense. Fauna is a good example of this: the Calvin paper came out in 2012 or 2013, and then FaunaDB started, I think, one or two years ago to actually try to commercialize the ideas in it. In the 1970s and 1980s, IBM was definitely at the vanguard, definitely the leader in databases. I don't think that's true anymore. I definitely think that Google, Amazon, and Microsoft are doing some really interesting things, and their systems are very state of the art. And some of the other systems out there are doing interesting things that are cool as well: MemSQL, the HyPer guys in Germany, and Snowflake.
And my last comment is that, as we'll see throughout the semester, Oracle borrows ideas from everybody. Usually what happens is that a startup or another company comes out with an idea, and then you see it show up in Oracle five years later. And I'm not saying that Oracle is slow. In their case, they're the most successful database company; they have a lot of customers, so you just can't have a rogue employee in the basement write some feature and put it out there. They go through a long testing process, right? Is that correct? He says yes, okay. Our friend here actually worked at Oracle, okay.

The other major thing is that I think the relational model has definitely won for operational databases, right? There's so much inertia around SQL and relational databases that it would be hard for someone to come along and say, hey, I have a new data model, I have a new API, because it would be hard for people to port their existing applications over to it. For analytical workloads and analytical databases, SQL is certainly very prevalent as well. Where the relational model falls apart is when you want to start doing machine learning or anything that's best expressed as operations on arrays. And although there aren't very many array databases today, there will be more in the future, and I suspect that the relational model will have to adapt or it will have problems dealing with those workloads later on, okay? Any questions?

Okay, so next class we will start getting into the material of the course. We're going to do a comparison between disk-based databases and in-memory databases; again, this course is entirely focused on in-memory databases. I'll finish off the discussion of Project One at the very end. And then, as a reminder, the first reading assignment for the course will be due on Monday at 12 p.m. Again, there's a Google form online; you just go put your paragraph in there and submit it. I won't accept anything after 12 p.m.; anything after that is late. And then I'll send a reminder out on Piazza, but there will be a recitation on how to do Project One on Tuesday, January 23rd at 5 p.m., the night before. Okay? All right, guys. See you on Monday.