All right, let's get started. All right, DJ Droptables is back. Thank you. Did you solve all your problems? Yeah, most of them. I mean, yeah, girlfriend number two. We had a little pregnancy scare, but it turned out to be a false alarm. OK, good. We're all good. You don't want to be in that boat. It's awful. Yeah. Everything else OK? Yeah, pretty good. Are you still going to have three girlfriends, or are you going to try to cut back? Yeah, I'm going to cut back, probably to like one and a half. One and a half? What's the half? OK, all right. All right, so we have a lot of things to talk about today. So real quickly, before we get into the topic for the lecture today: this is the final docket for everyone this semester. This is everything that you have to finish up. Project three, obviously, was due last night. Homework five should go out today-ish, and that'll be due in two weeks on December 3rd. Project four will go out this weekend, and that'll be due on December 10th. The extra credit is due on December 10th as well. And then we'll have the checkpoint, which I'll talk about in the next slide. And then the final exam is on Monday, December 9th, at 5:30 PM. Not in this room; I don't know what room they'll put us in. But it's sort of a shitty time, so maybe instead of doing candy we can do pizza or something like that, something better. So any questions about any of these things? And then some other things that are floating around. We have three more, I'm going to call them real lectures, course topic lectures on material: this week, and then one class next week. And then when we come back from the Thanksgiving break, on Monday, December 2nd, our friends at Oracle will be coming and giving a talk about the stuff that they're working on. And again, this is not like a lecture where, instead of me talking about the material, they're going to talk about the same material.
They're actually going to talk about what they're building in their group. And you'll see how it ties together all the various things we talked about through the entire semester. The other thing: in the second class in the last week, I'll do two things. One, we'll do a final review. And the second one will be what I'll call systems potpourri, where if there's any system you want me to talk about for like 10 or 15 minutes, to teach you about what it is, how it works, and why it's interesting, using the vernacular that we talked about through the entire semester, we'll have a vote online at this URL, just a Google form. And you go select whatever system you want me to cover for 10 or 15 minutes. We usually have time to do 3 or 4 of them. So the list that I'm showing here, when you go to the Google form, is the systems on the dbdb.io website that have had the most views for the last two months. They're in that order, but you don't necessarily have to follow that. And if there's another one on dbdb.io that's not on that list, you can type it in. And you can go back to last year and see what I covered, but I encourage you not to do that before you vote, because you don't want to bias yourself to just do whatever people did last year. Because I'm always curious to see what you guys are interested in. We've covered Postgres a little bit. We've covered MySQL, Oracle, and SQLite a little bit. The idea is to see what you guys are seeing on the internet, or what you want to use at your job or on a hobby project, what system you've been thinking about maybe using. And I can come teach you about what it is and how it works. OK? And then the extra credit feedback. Again, you can submit your extra credit article this Sunday, November 24th. And then myself and the TAs will give you feedback and say what you're doing correctly and what you're not doing correctly.
And that way, you can fix it up in time for the final submission so that you get full credit. OK? Any questions about any of these things? So I'll put up a deadline for when you should go vote. You obviously can't vote the day before the lecture, because then I don't have time to actually prepare it. The week of Thanksgiving will be the deadline for this. OK? And the last thing is that in addition to giving the in-class lecture, on Tuesday, December 3rd, Oracle will also be giving a graduate-level research talk over in the CIC building. I think they're also giving an undergrad talk Monday afternoon as well. So there are three Oracle talks in two days. One of them you're not required to come to, but you'll get extra credit for the final if you come to it. And then these two additional ones are optional. OK? Any questions? We're almost done. All right. So today's class is the beginning of our discussion on distributed databases. And as I said last class, before we could jump immediately into distributed databases, we had to spend however many weeks we've gone so far this semester understanding how a single-node database system works. Because when we start going distributed, just because we have more machines or more hardware doesn't magically make our system easier to build or better. All the things that we had to talk about for a single-node system, we still have to solve in a distributed system, and actually they're even harder, because now you have to account for the network. So we've already talked about this before, when we talked about query execution: this contrast between a parallel database system and a distributed database system. When I talked about a parallel database system, we were assuming that the database system was running on a single box that could have multiple cores and multiple CPUs.
And then we assumed that the workers that were executing the queries could communicate very quickly with each other, and that the communication was reliable. Because if you're running on the same physical machine, you're just sending things over the interconnect between CPU sockets. That's super fast. But now, in a distributed database system, we still care about how to do parallel execution, but we're doing it potentially across multiple machines. And so now we actually need to be mindful of what the cost is, and the reliability, of one worker communicating with another worker. Because if it's going over the network, that other worker might not be in the same data center. It might not even be on the same continent. And so now we can't assume that if we send a message, they're guaranteed to get it. And that's going to become problematic when we start to talk about transactions and other things. So as I said, the distributed database stuff we're going to talk about starting today builds on all the things we've already talked about. We still have to do logging. We still have to do concurrency control. We still have to do query optimization. We still have to do query execution. We still have to do joins, potentially. All those things we still have to do in a distributed database. And now everything's more expensive. Everything is harder. So for today's lecture, as I said, today is sort of an introduction to distributed databases, just to understand what they actually look like, the different designs of them, and what the implications of those designs are. Then we'll talk about how to do partitioning, which is the key way we're going to divide up our database across multiple resources to get the parallelism we want in a distributed environment. And then we'll finish up briefly touching on how hard distributed concurrency control is.
And then that'll segue into Wednesday's class, where we'll spend the entire day talking about how we actually do this. And again, we're still going to do two-phase locking, potentially. We're still going to do timestamp ordering. All those things we did on a single system still apply here; it's just distributed now, so it's even harder, okay? And again, stop and ask questions as we go along. So the first thing we need to discuss is: what is the system architecture of the database system? As I said before when we talked about parallel systems, there are these workers that are typically tied to either a process or a thread running on a CPU, and they're going to access shared resources like disk and memory. And the design of our database system in a distributed environment, depending on what our architecture is, the variations of these architectures, will differ in how the CPUs can coordinate and communicate with each other as you're running queries or transactions in parallel, and in where the memory and the disk are located relative to the CPUs. So what we've talked about so far the entire semester is what is known as a shared-everything system. Assume this is a single box, a single rack unit, that has a CPU, the CPU has local memory, and there's a local disk that you can read and write to. And in the von Neumann architecture that we're based on, anytime I want to get something from disk, I've got to bring it into my buffer pool in memory, and then my worker up above, running on my CPU, can read and write to pages, and then I eventually write them out to disk. Again, most database management systems (well, every database system that's not distributed) use this approach; it's a shared-everything system. So one alternative in a distributed environment is called shared memory.
And the idea here is that you're going to have multiple CPU resources that are potentially running on different machines, but there's a communication layer that allows them to have a unified memory view across all those machines, right? Assume this is some kind of high-speed interconnect like InfiniBand, or TCP/IP; it doesn't matter, the high-level architecture is still the same. And then there's still going to be some shared disk that everybody's reading and writing to. So, spoiler: I actually don't know of any database system, commercial or open source, that actually uses this. This kind of architecture is mostly seen in the HPC, or high-performance computing, world. The people running on supercomputers at the big national labs build software assuming this model. For databases, there isn't much. Another approach is shared disk. And the idea here is that the workers running on the CPUs have local memory, but the disk, where we maintain the persistent state of the database, is some kind of shared device that all these CPUs can read and write to. So think of this, if you're running on Amazon, as something like S3 or EBS, or HDFS, any kind of distributed file system. So all the CPUs are still seeing the same disk, but in order for them to communicate with each other, they have to send messages back and forth, because this CPU can't read the memory of that CPU. The last architecture is what most people think of when they think of a distributed database, which is what is known as shared nothing. Meaning every single worker is running on an island by itself: it has its own local memory, it has its own local disk, and the only way to coordinate between different workers is to go up above and communicate using some kind of messaging fabric on top.
So again, this CPU worker here can't read the memory or disk of any of its neighbors or friends in the cluster. So we'll go through each of these one by one. So again, under shared memory, as I said, this is not that common in databases. I'm not aware of any system people are using that's based on this. The basic idea is that the database system is running on these different CPUs, it's running in the same operating system instance, and it assumes it has a single global address space that may just be aggregated across different machines. And then there's some networking layer that allows them to pass messages back and forth to make this work. So again, this could be InfiniBand, this could be TCP/IP, this could be Intel's Omni-Path, some fast interconnect between them. So in this world, the database instance running on one CPU, the worker, is aware of the other workers. And so if they want to communicate with each other, they can just do what you would normally do in a shared-everything system: you can write something into a global data structure or send a message over IPC, and the other worker running on another machine would see that. Right, again, think of the context of a shared-everything system. When we were doing two-phase locking, if we wanted to tell another worker, hey, I hold the lock for this tuple, I add an entry to my lock table that's sitting in memory. Same thing here. If one worker wants to acquire a lock on a tuple, it just updates the global lock table, and then the messaging fabric is guaranteed to make sure everything's coherent across all those workers. Again, as I said, this is not that common. I don't know that anybody actually does this. The more common one is shared disk. And again, the idea here is that we have these compute nodes that have their own local memory. They can have a disk up there as well, but that's not the final storage location of any data in the database.
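To make the shared-memory coordination concrete, here is a minimal sketch (my own illustration, not code from any real system) of that global lock table idea: telling every other worker you hold a lock is just an update to one shared structure. All the names here (`GlobalLockTable`, `try_acquire`) are invented for the example.

```python
import threading

class GlobalLockTable:
    """Shared-memory lock table: every worker sees the same structure,
    so 'telling' other workers about a lock is just a map update."""
    def __init__(self):
        self._latch = threading.Lock()   # protects the table itself
        self._owners = {}                # tuple_id -> worker_id

    def try_acquire(self, worker_id, tuple_id):
        with self._latch:
            if tuple_id in self._owners:
                # Already held; only succeeds if we are the owner.
                return self._owners[tuple_id] == worker_id
            self._owners[tuple_id] = worker_id
            return True

    def release(self, worker_id, tuple_id):
        with self._latch:
            if self._owners.get(tuple_id) == worker_id:
                del self._owners[tuple_id]

table = GlobalLockTable()
assert table.try_acquire("w1", "tuple-42")       # w1 gets the lock
assert not table.try_acquire("w2", "tuple-42")   # w2 sees w1's entry
table.release("w1", "tuple-42")
assert table.try_acquire("w2", "tuple-42")       # now it's free again
```

In a real shared-memory cluster the interconnect keeps this structure coherent across machines; here a single process stands in for that.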
You can just use it for caching, in case you need to spill to disk on your local machine. But the final resting location of the database is down here. So if I bring a page into my buffer pool in local memory, say I modify it, and then I'm going to write it out because it's dirty, I would write it down here to the shared disk. And now, potentially, any other worker can see my change. How you coordinate that, to make sure they're told about that change, we'll get to later. So as I said, this is the predominant architecture in today's cloud environment, because the disk is going to be something Amazon provides, like S3 or EBS. So pretty much every cloud-native database system that you've heard about is going to be running in this environment. One big advantage you get is that you're able to scale the compute resources and the disk resources separately, because the compute resources are stateless. The state of the database is down here. So if all these compute resources crash and go away, assuming I've logged things out correctly, everything is still here. And then I can just bring up another instance and pick up where the other guy left off. We'll see in a second that that's not so easy to do in a shared-nothing environment, because every node holds state. So let's look at a high-level example like this. Again, we have our application server. It's going to send requests to these front-end compute nodes. This is where we have the workers running on CPUs with their local memory. And then we have some back-end storage device that everyone can read and write to. So let's say that the application says, I want to get record 101. It goes to this node. How the application knows to go to this node, we'll cover in a second, but assume it does. So then this node does some kind of lookup that says, well, record 101: if I look at my index, I see that it's in page ABC. So I go to my shared-disk storage and I say, get me page ABC.
And I bring that into my buffer pool. Same thing: this guy wants 200. He doesn't have it in his buffer pool, so he goes out to the disk and fetches it in. So now, if I want to scale up on compute resources, because, again, the state of the database is always here on shared disk, I can just bring up a new guy here. I don't have to copy anything immediately, because if I now request, say, 101, same thing: I just go to the disk, bring it back into my buffer pool, and I can serve the request. Tricky, sorry, yes. How does the lock manager work in a setting like this? The question is, how does the lock manager work in a setting like this? We'll cover this, yes, not yet. For simplicity, we're just talking about how we get things in for now, yeah. For shared memory, what's the difference between that and a multi-processor shared-everything machine? Yeah, so the question is, in a shared-memory architecture, how is that different than a multi-socket, multi-processor shared-everything system? They're the same. So there are distributed systems that have a unified memory across multiple physical machines. Each machine has its own motherboard, has its own physical memory that it can read and write to, but there's a layer such that all the processors think they have this one giant block of memory. Nobody does that for databases; you see that in the HPC world. All those people doing nuclear bomb or particle physics simulations, they're writing those Fortran programs assuming they have terabytes of memory across multiple machines that they can do computation on. Yes. For a system like this, does it mean that companies like Oracle have just one data center because of the shared disk? So the question is, in a shared-disk architecture like this, if the database system vendor is using a shared-disk architecture, does that mean they have to have all the data for a database in one location? No, we'll get to that.
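Here is a rough sketch of the shared-disk read path just described, with a plain dict standing in for the shared storage service (S3, EBS, and so on) and hypothetical page IDs like `ABC`. The class and method names are invented for illustration; the point is that the buffer pool is only a cache and the node itself holds no durable state.

```python
# The shared disk: the authoritative home of every page.
shared_disk = {"ABC": {101: "alice"}, "XYZ": {200: "bob"}}

class ComputeNode:
    """Stateless shared-disk compute node: its buffer pool is just a
    cache over pages whose real home is the shared storage layer."""
    def __init__(self, storage):
        self.storage = storage
        self.buffer_pool = {}                   # page_id -> page contents
        self.index = {101: "ABC", 200: "XYZ"}   # record_id -> page_id

    def get_record(self, record_id):
        page_id = self.index[record_id]
        if page_id not in self.buffer_pool:     # miss: fetch from shared disk
            self.buffer_pool[page_id] = dict(self.storage[page_id])
        return self.buffer_pool[page_id][record_id]

node = ComputeNode(shared_disk)
assert node.get_record(101) == "alice"   # fetched from shared disk, now cached
# Scaling out is trivial: a brand-new node has no state to copy over.
new_node = ComputeNode(shared_disk)
assert new_node.get_record(101) == "alice"
```

If `node` crashed, nothing durable would be lost; `new_node` can serve the same requests immediately, which is exactly the scale-out story above.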
But again, I mean, think about this: there's an abstraction between the physical and the logical location of the disk. These guys don't know anything about where these things actually are, right? It just says, hey, here's this file I can read and write to, just as if it was a local disk. In the case of Amazon or Azure, you get a block-based or object-based API. Give me this bucket, give me that bucket. That's sending a REST request to some back-end service. You don't know where that data actually is. Doesn't that mean there's indirection between the application server and the node, too? So the application server doesn't know which node it is? Yeah, so we'll get to that as well. The question is, in my example here, when I'm sending this request, the application says, I'm going to this node to get this record. You could have something in front of this that hides that, or this thing could maintain state that says where to go get the thing I need; we'll come to that. The thing I'm going to focus on here is: this guy has no state of the database other than what's in its buffer pool, but that's not considered to be durable or persistent. It's ephemeral. So if this guy crashes, anything we had in here goes away. All right, so now the tricky thing is going to be, what if I do an update, right? So I update record 101. I have to update page ABC. These guys all read that same record. They have page ABC in their buffer pool, but they're not going to know about the change, because these shared-disk architectures don't provide a notification to say, hey, by the way, somebody updated this. So I have to have additional messages to these nodes to say, hey, I think you have page ABC; by the way, I just modified it, and here's the latest version. Or: if you ever want to find out what the latest version is, come ask me about it. So that's all the stuff that we have to build in our database system, right?
This layer is just reading and writing to some disk. And so, related to his question, it's all transparent. So right now I'm showing the database in this diagram on two disks, but I can easily add a bunch more to split the data up across more disks. I'm getting better parallelism, better replication, better reliability, but none of these guys in the compute layer know anything about that, because that's all hidden from them. So you have this nice separation where you can scale things out independently, but you're going to pay a penalty in terms of locality of access, because, for the most part, I can't run queries over here. S3 allows you to do some basic filtering, but anything like a join, I have to do over here. So it means I have to pull the data to my compute nodes. Yes? This is different from sharding, right? Because you just said that any of them can have page ABC, whereas in sharding only one node would. So the question is, this is different than sharding or partitioning, where you have explicit divisions of who has what data. We'll get to that. I'm just showing you what shared disk gets you. At a high level, you can still partition at this level. There's nothing about shared disk that precludes you from doing partitioning at the compute level. Because you said that page ABC can be on other nodes. Correct, if you're not doing partitioning. The question is, in this example here, I said that when I update page ABC on this node, it sends a message to the other guys to say, hey, I have made a change. Is it always like this, or is there an alternative? Maybe it's sharding. Yeah, so what I said in this case here: when I update page ABC, the compute node at the top has to update these compute nodes at the bottom. An alternative would be for the storage layer to do a push notification and say, hey, I just got an update to ABC; by the way, everybody needs to refresh themselves.
I'm not aware of any shared-disk architecture, like EBS or S3 or whatever Azure has, that does that, because it would be super expensive. Think about it: it's like a pub/sub system. I'd need to know who needs to know about my change, because otherwise I'm sending messages that are wasteful. So as far as I know, nobody actually does this. You have to coordinate at this layer here. And it's the database system that does this; the distributed file system or the object store doesn't do it. Yes? Are you assuming that the communication between the nodes is reliable and fast enough? The question is, are we assuming here that the notification is reliable and fast enough? No. I didn't say what this is. I didn't say how we're doing it. I'm just saying that you have to do this. I guess you run into the issue of stale reads. The question is, can't you run into an issue of stale reads? Absolutely, yes. That's concurrency control. We'll get there. Yes. Okay. So again, the final one, the one most people think about when they think about distributed databases, is the shared-nothing architecture, where each node has its own local disk and local memory, and the only way for me to coordinate as I run queries is to communicate directly between my nodes. So if a query shows up that needs to access data on another machine, I can't go to a shared disk and get it, because that doesn't exist. I can't read the memory of the other guy, because I can't do that. I have to send a message and say, hey, I think you have this data: either run this query for me and give me back the result, or send me that data. And now you get the issue of, who should have which copy of what data, right? We'll get there. So this is going to be the hardest architecture for increasing capacity and ensuring consistency.
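A minimal sketch of the invalidation scheme just described, assuming the database layer itself notifies its peers after flushing a dirty page, since the storage layer will not do it for you. All names here are hypothetical, and the "messages" are just direct method calls standing in for the network.

```python
class CachingNode:
    """Shared-disk compute node whose buffer pool can go stale: peers
    must tell it when they write back a page it may have cached."""
    def __init__(self, storage, peers):
        self.storage = storage        # shared disk (a dict stand-in)
        self.peers = peers            # other compute nodes in the cluster
        self.buffer_pool = {}

    def read_page(self, page_id):
        if page_id not in self.buffer_pool:
            self.buffer_pool[page_id] = dict(self.storage[page_id])
        return self.buffer_pool[page_id]

    def write_page(self, page_id, record_id, value):
        page = self.read_page(page_id)
        page[record_id] = value
        self.storage[page_id] = dict(page)   # flush dirty page to shared disk
        for peer in self.peers:              # storage won't notify; we must
            peer.invalidate(page_id)

    def invalidate(self, page_id):
        self.buffer_pool.pop(page_id, None)  # drop stale copy; refetch later

storage = {"ABC": {101: "old"}}
a = CachingNode(storage, peers=[])
b = CachingNode(storage, peers=[])
a.peers.append(b); b.peers.append(a)
b.read_page("ABC")                  # b caches page ABC
a.write_page("ABC", 101, "new")     # a updates it and invalidates b's copy
assert b.read_page("ABC")[101] == "new"   # b refetches: no stale read
```

Without the `invalidate` broadcast, `b` would keep serving the old value out of its buffer pool, which is exactly the stale-read problem raised in the question.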
That's the stale-read issue that he talked about, because I need to be able to run the system and move data around in a way where I'm not losing things, where I'm not having false negatives or false positives as I execute queries, right? Otherwise I'd have to shut the whole system down, move the data around, and then add the new capacity. But I don't want to do that, because I want my system to always be online. So now you say, well, this sounds hard. Why would I want to do this? Well, the advantage you're going to get over a shared-disk system is better performance and better efficiency, if the system is written correctly. Because I can now be mindful of the locality of data and try to move the least amount of data over the network as possible. Yes? When you say it's hard to ensure consistency: if you have shared nothing and you just partition everything onto different machines, then you don't really have a problem of consistency, do you? Yes, you do. I mean, it's not on a single machine anymore; it's across the network. So, the statement is: if you assume that your distributed database is partitioned, which we'll get to in a second, and now I need to add a new partition and potentially reshuffle data, then depending on how I'm doing partitioning, I may not have to move the whole database; I'm going to move a segment of it. But again, I don't want to have to stop the world while I move that. And so it depends on how much data I have on a single node, and I'm going over the network to some other machine: where's that machine, how long does that take? Right, if I don't care about consistency, which we haven't talked about yet, then who cares? Just move data around, and if you miss a read, whatever. If I do care, and I am running transactions, then I need to be very careful about how I do this. And people get burned by this a lot.
So, as I said, this is just a brief smattering, a very limited subset, of some of the shared-nothing distributed database systems that are out there. Most of the NoSQL systems that came around maybe 10 years ago are all considered shared nothing. So, let's look at how this works. Again, we no longer have a shared disk; on every single node, we have the CPU workers, we have our local memory, we have our local disk. And what I'm showing now is that we've partitioned, or sharded, the database into subsets, such that each node has some portion of the database. And so now I have explicit information about what data I have at each node. So now, if the application wants to get ID equal to 200, it has to know that this node has the data it needs, and go get it from there. And now this operates the same as the single-node, shared-everything case we had before: if it's not in my buffer pool, I go to disk, bring it in, and then do whatever I need to do to answer the query and return results. So if all your queries are accessing a single node, this is super fast, because, again, this is just a single-node database system. The tricky thing is when you start touching data that's across multiple machines. So let's say I have a transaction that says, I want to get ID 10 and ID equal to 200, like a single query wants to do this. So now I need to somehow get the data that this other guy has up here. But what am I sending? Am I sending the request to run the query? Or am I just going to ask this guy, hey, I know you have this piece of data, send it up to me and I'll run the query up here? Right, that can vary. Now, in terms of the scale-out issue: under the shared-disk architecture, I just bring up a new compute node. Every compute node is stateless, so it comes along and starts executing queries, bringing things from the back-end shared disk into its buffer pool as needed.
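The routing decision described above, figuring out which node owns a given ID in a range-partitioned shared-nothing cluster, can be sketched like this (hypothetical names and ranges; real systems keep this metadata in a catalog or a middleware router):

```python
import bisect

class RangeRouter:
    """Routing table for a range-partitioned shared-nothing cluster:
    maps a record ID to the single node that owns it."""
    def __init__(self, boundaries, nodes):
        # boundaries[i] is the inclusive upper bound of partition i,
        # e.g. [100, 200, 300] means 1-100, 101-200, 201-300.
        self.boundaries = boundaries
        self.nodes = nodes

    def node_for(self, record_id):
        i = bisect.bisect_left(self.boundaries, record_id)
        return self.nodes[i]

router = RangeRouter([100, 200, 300], ["node-A", "node-B", "node-C"])
assert router.node_for(10) == "node-A"    # range 1-100
assert router.node_for(200) == "node-B"   # range 101-200 (inclusive bound)
# A single query touching IDs 10 and 200 must contact two different
# nodes, which is exactly where the cross-machine coordination starts:
assert {router.node_for(10), router.node_for(200)} == {"node-A", "node-B"}
```

Whether the router then ships the query to each node or pulls the tuples back to one node is the push-versus-pull choice mentioned above.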
But now, in a shared-nothing architecture, if I have to bring up a new node, it needs to get some portion of the database from these other nodes here so that I balance things out, right? So let's say that this guy is going to send it some number of tuples from this bottom partition. The other guy here is going to send some number of tuples from this other partition. And then, once I know I've copied the data, I update some global state to say, all right, well, this node is now responsible for the range 101 to 200, this guy's got 201 to 300, and that guy up above has 1 to 100. And as I was saying to his point, this would be hard to do if I care about transactions and I don't want to lose any data. Because I don't want a query to show up that wants to access ID equal to 150 and land here before the data has been transferred, where maybe I can answer it; or maybe it has been transferred, and I go here, and this node says, I don't have that data anymore, so it returns nothing, even though the data exists on this node down here. So how to actually do this in a transactionally safe manner is tricky and not easy to do. Yes, in the back. How often does this happen? Can we shut down the database, like, once a month and then reshuffle all the data? The question is, how often do you have to scale capacity? Couldn't I shut the database down once a month and add new nodes? But what if it goes the other way? So let's say it's Singles' Day or Black Friday or Cyber Monday. It's the one day of the year where I have a huge spike. That one I can plan for; I know it's coming, so I can prepare ahead of time. But let's say I have a flash mob, right? Everybody wants DJ Droptables' new album all of a sudden. So now we have a huge spike in traffic that's unexpected. I want to be able to scale up without having to shut everything down, and scale back down gradually. The older systems would do exactly what you're saying.
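The rebalancing step described above, copying tuples to the new node before flipping the global routing state, might look like this in miniature (illustrative names; this deliberately ignores the transactional safety issues just discussed, which are exactly what makes it hard in practice):

```python
class Partition:
    """A node's local slice of the database: record_id -> value."""
    def __init__(self, data):
        self.data = data

def split_partition(source, cutoff):
    """Move every record with id > cutoff out of `source` into a new
    partition for a freshly added node. Only after the copy completes
    should the caller update the global routing state; flipping it too
    early is how a query for ID 150 lands on a node with no data."""
    moved = {rid: v for rid, v in source.data.items() if rid > cutoff}
    for rid in moved:
        del source.data[rid]
    return Partition(moved)

# Node holding 101-300 hands the 201-300 range to a new node.
old = Partition({150: "x", 180: "y", 250: "z"})
new = split_partition(old, cutoff=200)
assert sorted(old.data) == [150, 180]   # old node keeps 101-200
assert sorted(new.data) == [250]        # new node now owns 201-300
```

A real system would do this copy incrementally while queries keep running, and would need a protocol to decide which node answers for a key that is mid-flight.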
Anytime you see a financial website that says, we're down Sunday at 3 a.m., they're probably not running a distributed system; they're moving data around and doing maintenance things. But if you're an online website, you can't do that. Yes. The question is, what's the advantage of doing this versus having a single node? Instead of having two nodes hold these two partitions, what if a single machine held both, where this CPU socket has this disk and this memory for one partition, and another socket has the other? Is that what you're asking? Yeah, basically having the partitions in the same machine instead of on two separate nodes. All right, so the question is: instead of having two separate machines that each have a disk, memory, and a CPU, what if I had one machine with the same amount of resources that I had split across two machines, but now in a single unit? So really, what are the advantages of a distributed database? One is that you get diminishing returns as you scale up hardware vertically. Horizontal scaling is adding new machines. Vertical scaling is taking my one machine and adding more resources to make it more powerful. Going vertically is usually way more expensive, you get diminishing returns, and there's obviously an upper bound on how big you can make the machine, right? Let me give one example. In the early days, when I was in grad school, we visited PayPal, because PayPal was running Oracle on a single machine, and they were freaking out because every Christmas they would hit the limit. They bought the most expensive machine you could buy from IBM, right? And you have to buy two of them, because you need a hot standby, right?
So every holiday season, they were freaking out because that Oracle machine was hitting the limit of what the hardware could do, and they couldn't buy a more expensive machine, right? They couldn't scale any more vertically. So they were manually moving portions of the database off, like humans moving portions of the database off in November to separate machines on the side, just to get through the holidays, and then they moved it all back. So in that environment, if they had had a distributed database system with cheaper machines, they could say, oh, the holidays are coming up, let me just buy or turn on a bunch of new machines and have the system scale out. That way, I handle my high demand. Then, when the demand goes down, I can start turning them off and coalesce onto fewer machines. So then, sorry, is the advantage that instead of provisioning for the peak up front, you can add capacity over the year as demand grows, rather than scaling vertically? So your question is: is the advantage of distributed, of scaling horizontally versus scaling vertically, that you can scale out much more cheaply horizontally? Correct, but there are trade-offs, right? As we'll see when we talk about how we actually manage a distributed database system, communication is now more expensive. I can definitely run faster on a single node, because I don't need to coordinate between different nodes and send messages over the network. But as I said, you can start to hit scale-related bottlenecks, right? The trend in database systems up into the 90s was to always scale vertically. The trend now is to scale horizontally, because it's considered that you get better performance, and for that better performance you pay less. Is that always true? Yeah, I think that's the conventional wisdom. Yes?
The question is: isn't a distributed system ultimately better for disasters — if the electricity goes off, wouldn't you be in trouble with a single machine? Well, if you're running a five-million-dollar machine from IBM, you're not plugging it into the wall outlet, right? You have generators, you have backup power. I would say the bigger issue is the network getting severed, right? Even then you'll still have redundant NICs going into the machine. But if you can't communicate with the database at all, then — depending on how you design the database system — you could have the database spread across different data centers and still be available. We'll discuss more of this on Wednesday, but this is one of the trade-offs between the NoSQL guys and the traditional, NewSQL, or relational database systems. The NoSQL guys cared about availability: no matter what, they wanted the website to be online and available. And in exchange, they would give up transactions to make that happen. Because if you have to have transactions, then the communication is more expensive — you need to make sure that everybody is up in order to make changes — and they argued that was less than ideal. For some applications I think that makes sense; for anything financial it doesn't. We'll cover that next class. Okay, so distributed databases are old. Some of the first ones were built in the late 1970s. MUFFIN was created by one of my advisors, Mike Stonebraker — the guy who built Postgres and Ingres and Vertica and VoltDB. He had a system called MUFFIN that was a distributed version of Ingres. SDD-1 — I actually thought it was a real system; it turns out it was just a prototype.
They never actually had anything running, but there are a lot of seminal papers from the late 70s written by the great Phil Bernstein on how to build a distributed database and do transactions across it. A lot of the transactional theory we talk about in this class — all that early work was done by Phil. System R* was a research project out of IBM; that was the distributed version of System R. It never became a product, although there is a distributed version of DB2 today. Gamma was an influential system out of Wisconsin by Dave DeWitt; that was one of the first high-performance distributed database systems. And then NonStop SQL, of all of these, was the only commercial distributed database system. Jim Gray helped build it — Jim Gray was the guy at IBM who invented two-phase locking and a lot of the early stuff we talked about with System R. Tandem, the NonStop company, was interesting: they originally sold these super fault-tolerant machines. Think redundant hardware — space-shuttle-level redundancy. You have four CPUs running, and if one goes down, the other three keep running. So they sold a database system built on top of this architecture. It's still around today; a lot of financial systems actually still use it. And it's amazing how long it keeps running — I guess it's NonStop, right? All right, so now that we understand what the architecture looks like, a lot of you have questions like: how is this thing actually gonna work? How do I actually find data? How do we make sure that everything's consistent? These are all things we need to be mindful of now when we build our distributed database system. And there are trade-offs, because we're not gonna be able to do everything: we can't guarantee the system is online all the time and always support transactions and never lose data or return inconsistent results.
As we go along, we'll see what these trade-offs are and why you're not gonna be able to achieve everything. The other big question is how we actually execute queries on this distributed data. I've shown two examples so far. In the shared disk example, the compute nodes pull the data from the shared disk system into their local memory and compute the result. In the shared nothing example, we push the query to where the data is located, run it locally, and then get back the result. So there's a trade-off between doing a push or a pull. The last thing to talk about is what the architecture looks like in terms of what the nodes in the cluster are doing. There are just two approaches: a homogeneous cluster or a heterogeneous cluster. In a homogeneous cluster, every single node in the database cluster can perform every kind of task. That means you can send a query to any node, and that node will figure out how to get the result you're looking for. And they're all potentially doing background tasks and other things too. The advantage of this approach is that it makes provisioning and failover easier to handle and support: I just add new nodes, and as long as I can move data around safely, the system gets stronger and better — up until a point, which we'll see next class. The other approach is a heterogeneous cluster, where specific nodes or members of the database system are responsible for separate tasks. Now I have to make a decision: if my system is running slow and I wanna add new nodes, I have to know whether I should add this type of node or that other class of node. So let me give an example of one of these architectures.
I always like to use MongoDB because it's the most basic one to understand. MongoDB uses what is known as a heterogeneous cluster architecture: you have special-purpose nodes that are responsible for doing specific tasks in the system. When the application wants to send a request or execute a query, it always goes to a router. The router looks at the request — say, "I wanna get the record with ID equal to 101." The routers are stateless; they don't know anything about what data is on the actual shards. So the router goes to a config server node, which is responsible for sending back the information about where to find data across the different partitions, or shards. That's all this thing does: it's the global state for what the configuration of the system is. So the router gets the routing table from the config server, and then it can send the request to the shard server, and that's where the query executes and the result comes back. Under this architecture, if I notice that my router infrastructure is my bottleneck, I can scale that out and add more router nodes without touching the config servers or the shard servers. Yes? The question is: what are examples of those background tasks? So, things like garbage collection — we talked about MVCC — or building indexes, or moving data around because I'm scaling up or scaling down. Again, you can't send a query to the config server; it can only tell you what the configuration of the system looks like. And the router can't hold any data; it can only tell you where to send your query. The other thing we sort of briefly touched on is this notion of data transparency in a distributed database system.
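As a rough sketch of that routing flow — all names below are made up for illustration; this is not MongoDB's actual API — the router holds no data and just forwards requests using a table it fetched from the config server:

```python
import hashlib

class ConfigServer:
    """Global state: knows which shards exist (the routing table)."""
    def __init__(self, shards):
        self.shards = list(shards)

    def routing_table(self):
        return list(self.shards)

class Router:
    """Stateless: holds no data, only forwards requests using the table."""
    def __init__(self, config):
        self.table = config.routing_table()

    def route(self, record_id):
        # "get record with ID = 101" -> pick the shard that owns that key
        h = int(hashlib.md5(str(record_id).encode()).hexdigest(), 16)
        return self.table[h % len(self.table)]

config = ConfigServer(["shard0", "shard1", "shard2"])
router = Router(config)
target = router.route(101)   # the shard server that will execute the query
```

If the router tier becomes the bottleneck, you just stand up more `Router` instances against the same config server — which mirrors the point above about scaling the routers out independently of the config servers and shard servers.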
Ideally, we don't want the application to know anything about how the data is split up, divided, or replicated across the different nodes in our cluster. The same SQL query — or whatever query language my application uses — that works on one node should still work and still produce the same correct result if I'm now scaled out on a thousand nodes. Otherwise, if I have a SELECT statement with some special clause like WHERE node = 123, and node 123 gets split across multiple machines or goes away, I have to go back and rewrite all my application code. So we're gonna hide all the details of where the data is actually stored from the application, although we can push some information down to the client driver to let it figure out which node to talk to. But in the application code, the Joe Schmo programmer should ideally not know anything about how the data is split up. It's not always the case, but this is what we want. So now, to talk about how we're gonna split the data up, we're gonna use partitioning. We've already touched on this a little bit earlier in the semester, and we talked a little about it with parallel execution. The idea is that we take our database and split it into disjoint subsets that we then assign to our different resources. If you're coming from the NoSQL world, they call this sharding, but partitions and shards are essentially the same thing.
So what's gonna happen is the database system gets a query, looks at what data the different parts of the query plan need to access, and then potentially sends fragments of the plan to different nodes to have them execute their part of the query and send back the result they generated. And we can use the same exchange operator we talked about under the iterator model when we did parallel queries — that same exchange operator is how we parallelize things in a distributed environment. So let's talk about how we actually wanna split our tables up. The simplest way to do table partitioning is to assign each entire table to a single node. I have three tables — A, B, and C. Node one gets A, node two gets B, and node three gets C. That's the easiest way to do partitioning. Obviously you have to assume each table can fit on a single node, but fine. So say I have two tables, one and two: all the tuples in table one go to one partition, and all the tuples in table two go to another partition. The ideal query in this environment is any query that touches only one table, because then I don't need any communication between these nodes — I just send my query to that one node, it runs, and it sends back the result. I get parallelism, assuming my workload divides easily across the two tables, but we know that's not always the case. It's not realistic. So very few systems will let you do this. I know MongoDB can — in their world they call it a collection instead of a table, and you can tell MongoDB to store a collection on a single node by itself. But this isn't that common in other systems. Yes? The question is: what are these partitions — shared nothing or shared disk? Doesn't matter.
For simplicity, assume it's shared nothing. Actually, yeah — assume shared nothing, because in a shared disk architecture you don't necessarily have fine-grained control like this. You could, in the sense that in S3 you could just have different buckets for different tables, but you don't have any control over where the data is physically stored. So assume this is shared nothing. What's more common — what most people think about in a distributed database — is horizontal partitioning. Again, assume we're doing a row-store system. Here we split the table up row by row, using one or more columns as the partitioning key: we examine the value of the partitioning key for each tuple and decide what partition to assign it to. In a shared nothing system, you do physical partitioning, because every node actually stores its partition locally on its own disk. In a shared disk system, you do logical partitioning, where you assign a compute node to be allowed to access a particular partition, so you don't have copies of the same page on multiple nodes and can reduce the amount of coordination you have to do. So let's look at a simple example. Say we select this column as the partitioning key and we're gonna do hash partitioning: we scan through, look at the value of that column for every single tuple, hash it, mod by the number of partitions we have, and that tells us where to send the data. Now if a query shows up like SELECT * FROM table WHERE partition_key = some value, we just take that value, run it through the same hash function, and we know exactly where our partition is. So this is hash partitioning.
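A minimal sketch of that hash-partitioning rule — the choice of hash function and the partition count here are arbitrary, just for illustration:

```python
import hashlib

NUM_PARTITIONS = 4

def hash_partition(key):
    """Partition = hash(partitioning key) mod number of partitions."""
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

# Routing a point query: SELECT * FROM t WHERE pkey = 'a101'
# only needs to visit partition hash_partition('a101').
p = hash_partition('a101')
```

The key property is that the same value always hashes to the same partition, so the router can answer "where does this tuple live?" without any per-tuple metadata.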
You can also do range partitioning, which I've shown before: you say this contiguous segment of the value space for a column goes to this partition, the next 100 keys go to the next partition, and so on. Same thing when a query shows up: you look at the value it's doing a lookup on, and you know where to route the query to find the data you want. Yes? So — rephrasing the question — isn't selecting which partitioning key to use actually an NP-complete problem, because there are so many different combinations you could choose? How do you know what to do? This is something I've actually done research on. There's a 40-, 50-year history of people developing methods and algorithms to pick the partitioning key. My advisor's advisor wrote one in the 70s; I wrote one, right? It's basically a search optimization problem: I look at my workload, I see how my queries are accessing the table, and if I see "partitioning key equals something" over and over again, then that's obviously the one I want to choose. For OLTP applications — we'll talk about this next class — you can often derive a tree schema and walk down the tree to split everything up. For example, say Amazon divides up its database based on the state where the customers are located. So here are all the customers in Pennsylvania, here are all the orders for the customers in Pennsylvania, here are all the items that they bought in Pennsylvania.
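Range partitioning can be sketched the same way — the boundary values below are made up; a real system would derive them from the data distribution:

```python
import bisect

# Partition 0 owns keys < 100, partition 1 owns [100, 200),
# partition 2 owns [200, 300), partition 3 owns >= 300.
BOUNDARIES = [100, 200, 300]

def range_partition(key):
    """Find the partition whose contiguous range contains the key."""
    return bisect.bisect_right(BOUNDARIES, key)

def partitions_for_range(lo, hi):
    """A range predicate only touches partitions whose ranges overlap it."""
    return list(range(range_partition(lo), range_partition(hi) + 1))
```

Notice the contrast with hash partitioning: here a predicate like `WHERE key BETWEEN 150 AND 250` can be pruned to just the overlapping partitions, whereas under hashing it would have to hit every partition.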
So I can take all of the Pennsylvania customers and put them in one partition, and all the Maryland ones go in another partition. A lot of times it's sort of obvious what that key should be for OLTP. For OLAP it's a bit more tricky: you definitely have to look at what the queries are, because again, you want to minimize the amount of coordination or data you're sending between the different partitions. Yes? The question is: if we have an index on the partitioning key, will that have an impact on this design — and when a query comes in, how does the system know which partition has the data, i.e., to use this hash function to route the query? We'll get there, but: if it's a heterogeneous system, you could have a front-end query router, like MongoDB has. It says, "I know the sharding key is this thing here, so let me pick that value out of the query, hash it, and that's where I want to go." If it's a shared nothing system with a homogeneous architecture, you could land on P1, and P1 says, "oh, you want to execute this query, but I don't have this data — P3 has it," so it just routes your query for you. Or it sends your query down there, runs it, and then sends the result back through P1. There are different ways to do this. All right, so I'm showing hash partitioning here, right? We just take the hash value, mod by the number of partitions I have, and that tells me where I need to go. What's the problem with this? Collision.
He says collision — ignoring collisions, assume we have a good hash function. Something else? Yeah: range queries. If this is a range predicate instead of an equality predicate, hash partitioning is a bad idea, because I can't hash a range — same problem as with a hash table — so I'd have to go do a sequential scan across all the partitions. Something else? When you change the partitioning key, you have to move everything. Right: if instead of partitioning on this column I partition on that other column, I gotta move everything around. Yes, but that doesn't happen that often — think about your Amazon account ID; they're not gonna say, "all right, we're not partitioning on that anymore, we're partitioning on this other thing, your email address." That rarely ever happens. What if you add a new partition — do you have to repartition? Bingo. If I add a fifth partition here, I have the same problem we had when we talked about hash tables. Now you see why we had to cover the single-node stuff first. If I add a fifth partition and rehash all the values mod five, they're not guaranteed to land on the same partitions. I may end up moving almost the entire database — everyone might be swapping and moving to another location. So that's bad. We need a way to handle that. Who here has ever heard of consistent hashing? Very few — good, okay, perfect. Consistent hashing is a technique developed in the late 1990s, and it basically allows you to do incremental additions and removals of partitions in your cluster without having to move everything around. The way to think about it is that the hashing space is a ring, from zero to one. And I'm gonna have, say, three partitions: A, B, and C.
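You can see why naive hash-mod partitioning is so bad for elasticity with a quick experiment — synthetic keys, numbers for illustration only: growing from four to five partitions reassigns roughly four fifths of all keys.

```python
import hashlib

def bucket(key, n):
    """Naive static hash partitioning: hash mod the partition count."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = ["key%d" % i for i in range(10_000)]
# Count keys whose partition changes when we go from 4 to 5 partitions.
moved = sum(bucket(k, 4) != bucket(k, 5) for k in keys)
fraction_moved = moved / len(keys)   # roughly 0.8 for a uniform hash
```

That near-total reshuffle is exactly the problem consistent hashing is designed to avoid.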
The way to think about this is: I hash a key, and I don't mod by the number of partitions — I just hash it onto the ring between zero and one. Say I land at this point on the ring. Then I travel forward in a clockwise motion until I find the first node that shows up, and that's where I know my data is located. So I hash it, get a value between zero and one, and I know that everything between here and here belongs to A, so the data I want is on A. Same thing over here: I hash key two, land somewhere in the ring space, and jump forward to C. So the key space for each of these nodes runs from where the previous partition ends up to the node itself. Fine — that alone isn't so great. What matters is what happens when I add new nodes. Say my distributed database can't keep up with the traffic I'm trying to support, so I wanna add new machines and scale out. Let's say I add a new partition here: D. If I was doing the static hashing technique I showed on the last slide, adding a fourth partition means rehashing everybody mod four and potentially moving all the data around. But the way consistent hashing works, I add my new node into the ring here, and the only thing I need to transfer is whatever C used to own where D is now located. Just this part of the ring — all the values covered by this arc — gets sent over, and everybody else in my cluster stays where they are. So I can add new partitions, and they just update the ring and claim their new slice. And likewise, if I take a partition away, anything it owned just goes forward to where C is. What's really interesting about this technique as well is how you do replication. Again, we'll cover more of this next class, but let's say I wanna do a replication factor of three.
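Here is a toy consistent-hash ring along the lines of that description — one position per node for simplicity; real implementations place many "virtual nodes" per physical node on the ring to balance load:

```python
import bisect
import hashlib

def ring_pos(name):
    """Map a string onto the [0, 1) ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) / 2**128

class Ring:
    def __init__(self, nodes=()):
        self.points = []            # sorted positions on the ring
        self.owner = {}             # position -> node name
        for n in nodes:
            self.add(n)

    def add(self, node):
        pos = ring_pos(node)
        bisect.insort(self.points, pos)
        self.owner[pos] = node

    def lookup(self, key):
        # Walk clockwise to the first node at or past the key's position,
        # wrapping around to the start of the ring.
        i = bisect.bisect_right(self.points, ring_pos(key)) % len(self.points)
        return self.owner[self.points[i]]
```

The payoff is the incremental-move property described above: after adding a node D, every key either stays where it was or moves to D — nothing else in the cluster shuffles.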
So for every single tuple I insert into my database, I want it replicated on three different nodes, or three different partitions. That way, if one of them goes down, I have two others available as backups, and my database doesn't go down. So say I'm replicating A across three nodes: A counts as one, then the next two. So any write to A — any key that falls in A's range — is also gonna be on F and B. Now when my query shows up, same thing: I hash to this point in the ring, and I can get the data from either A, F, or B, and it's guaranteed to be there — assuming you're doing the transactional stuff we'll talk about next class. This actually gets into the consistency issue that we sort of glossed over when we talked about transactions and ACID. If I do a write on A, how do I know it's been propagated to F and B? Well, you have to wait until they all acknowledge that they got the write — which could be bad, because one of them could go down while I'm waiting for the acknowledgement, and I'm stalling. Or I say I don't wait, but now I may do a write on A, immediately try to read that thing on B, and not see what I expect to see. Again, we'll cover this more next class, but this is the consistency issue — this is the C in ACID that I said we'd gloss over for single-node databases, but it matters in distributed databases. So consistent hashing is a really cool technique, and it's actually used in some distributed systems. The three most famous are Memcached, which is a caching service, Cassandra, and Dynamo. Amazon's Dynamo paper was, I think, the first to describe an architecture using this. And then one of the co-authors of the Dynamo paper went to Facebook, thought it was a good idea, and started building Cassandra there. Facebook eventually said, "we actually don't need this anymore," and decided not to use Cassandra.
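Replication on the ring is then just "keep walking clockwise until you've collected enough distinct nodes." A self-contained sketch — node names and the replication factor are hypothetical:

```python
import bisect
import hashlib

def pos(name):
    return int(hashlib.md5(name.encode()).hexdigest(), 16) / 2**128

def replicas(key, nodes, rf=3):
    """The rf distinct nodes found walking clockwise from the key's position."""
    points = sorted((pos(n), n) for n in nodes)
    i = bisect.bisect_right([p for p, _ in points], pos(key))
    out = []
    while len(out) < min(rf, len(nodes)):
        node = points[i % len(points)][1]
        if node not in out:          # skip duplicate hits while wrapping
            out.append(node)
        i += 1
    return out
```

A read for a key can then be served by any node in `replicas(key, nodes)`; whether a write must wait for all of them to acknowledge is exactly the consistency question raised above.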
So then they just open-sourced it and put it out there, people picked it up, and over time Cassandra actually became a quality system. Those are probably the three most famous systems that use the consistent hashing technique. All right, so let's talk briefly about the distinction between logical partitioning and physical partitioning. The idea is the same: you have a hash function or range function that divides the database into disjoint subsets. But under a shared disk system, you have to do logical partitioning, because you don't have control over how the data is physically written to the shared disk — Amazon controls that, not you. The way it works is that you assign some portion of the database to each of the compute nodes, so that the application server knows which machine to send a query to. Likewise down here: this node is responsible for partition three. Shared nothing systems are where you do physical partitioning. Here each node is assigned, and physically stores, the portion of the data managed by its partition. Same thing: I know how to get the data I'm looking for from these different nodes. All right, we have about 10 minutes left, so let's finish up — that'll set us up for Wednesday's class. When we start executing transactions, this is when things get hard, and this is when things get expensive. This goes back to her earlier question: doesn't it always make sense to maybe try to scale vertically — why would you ever wanna scale horizontally? Well, just as there are diminishing returns when you scale vertically — at some point the hardware can't get any better, because you can't buy a machine that's infinitely faster —
it also assumes your software can actually scale and isn't plagued by concurrency bottlenecks and other things. If you scale horizontally, you're also gonna hit diminishing returns in performance gains, because now you're gonna end up with what are called distributed transactions. If I have a transaction that only has to update data on a single node, we know how to do that — we've spent an entire semester on it — and that's the best-case scenario: all the data my transaction needs to touch is on a single node, so I can run it without coordinating with anybody else. If I need to touch data across multiple nodes, then I need a way to make sure that if I make a write here and a write here, when my transaction says commit, it actually does commit — because I need all my changes to be atomic and durable, just as in a single-node system. And that's gonna get expensive, because how do I make sure that if I say I commit, everyone actually truly commits? The way we're gonna do this is through a transaction coordinator. Think of it as a traffic cop for the entire system: a way to determine who's allowed to do what, and, when it's time to commit, to get everyone to agree that we're actually gonna go ahead and commit. There are two different approaches: centralized or decentralized. In a centralized approach, everyone goes to some central location that has a complete view of everything going on inside the system, and it makes decisions about whether you're allowed to commit. In a decentralized approach, the nodes organize themselves and make a decision — yes, this transaction made these changes and is allowed to commit — and they can notify whoever else is involved in the transaction that it committed successfully.
The very first version of one of these transaction coordinators was the TP monitor, from the 1970s and 1980s. Nowadays, if you look at the Wikipedia article, TP stands for transaction processing monitor. Back in the 70s they called these things teleprocessing monitors, because they were built for the phone companies — they were the ones with most of the traffic and most of the data back then. The way to think about a TP monitor is that it's a standalone piece of software that everybody has to talk to in order to figure out whether they're allowed to do certain operations on our distributed database. The database itself could be spread across different nodes that don't really know they're involved in a distributed transaction or a distributed database. You could just take MySQL, or whatever single-node system you want, run instances separately, and then up above you have this TP monitor that figures out whether you're allowed to do certain things. So it looks like this: we have an application server and four partitions, and say we have a transaction that touches three of those partitions. We begin our transaction by going to the coordinator and saying, "hey, we wanna modify some data at these partitions; we need to acquire the locks for them — are we allowed to do that?" And the coordinator says, "well, I know what else is running in the system, because everyone has to go through me. Yes, I see these locks are available, so I'm gonna assign them to you and tell you that you've acquired them." Now the application server can go to the different partitions and do whatever it wants to do to make the changes it wants to make. And when it wants to go ahead and commit, it goes back to the coordinator and says, "hey, I wanna commit — I made these changes at these partitions; am I allowed to do this?"
And the coordinator is responsible for going and communicating with the partitions down here: "hey, I think you know about this transaction — it told me it was gonna touch you; did it actually do anything?" And they come back and say, "yes, these changes happened and they're okay, safe to commit." Once the coordinator recognizes that everybody agrees we can commit, it sends back the acknowledgement. Question? The question is: in what scenario would it not be safe to commit? So let's say I violate an integrity constraint at one partition — I try to insert a duplicate key — and my transaction aborts there. The coordinator doesn't know what you did; you said, "I wanna acquire the locks on these things and commit in a distributed fashion," so it has to go ask the partitions whether that's allowed. Another question: are we locking whole partitions? For simplicity, yes. There's a protocol — I think it's XA — that allows you to do more fine-grained locking, but sticking with whole partitions keeps this simple. Okay, so a lot of enterprise software vendors will sell you something that is a TP monitor. Oracle has one called Tuxedo. IBM sells Transarc, which actually was a CMU startup — the guys who did AFS back in the 80s did a startup called Transarc that got bought by IBM, and IBM still sells it. There's also a project — you can't really read the logo — called Apache Omid. It was built by Yahoo, and it's basically a TP monitor for HBase, a NoSQL system; it's actually used by a couple of other systems today. So you can build a distributed database without worrying about transactions yourself, because you just rely on these coordinators to figure things out for you.
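The commit round the coordinator runs can be sketched as a simple vote — this is the idea behind two-phase commit, which the next lecture covers properly; the `Participant` API below is invented purely for illustration:

```python
class Participant:
    """One partition involved in the transaction."""
    def __init__(self, name, ok_to_commit=True):
        self.name = name
        self.ok_to_commit = ok_to_commit  # False if e.g. a constraint was violated
        self.state = "active"

    def prepare(self):
        # Coordinator asks: "did the transaction's changes happen here,
        # and are they safe to commit?"
        return self.ok_to_commit

    def finish(self, decision):
        self.state = decision             # "commit" or "abort"

def coordinate_commit(participants):
    # Phase 1: collect votes from everyone.
    # Phase 2: everyone receives the same final decision.
    decision = "commit" if all(p.prepare() for p in participants) else "abort"
    for p in participants:
        p.finish(decision)
    return decision
```

Notice the property this enforces: a single "no" vote — say, that duplicate-key violation — forces every participant to abort, so the transaction is atomic across partitions.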
And you just do all the single-node stuff you normally would. But probably more common is to use a centralized coordinator as middleware: a piece of software that sits between the application server and the database partitions. All queries go through this middleware, and the middleware is responsible for figuring out, "oh, this query wants to touch this data on this partition." It looks at its global lock table and its information about what partitions exist, and it routes queries as needed for you. So you look like you're talking to a single-node database system through the middleware, but on the back end it's distributed and broken up across these different partitions. When the commit request shows up from the application server, the middleware does the same thing the TP monitor does: it communicates with the partitions and asks, "hey, are we allowed to commit?" Only when everyone agrees does it send back the acknowledgement. This approach is actually very, very common. Facebook is probably the most famous example: Facebook runs the world's largest MySQL cluster, and they have a middleware system that does all this routing for you. Google used to do this with MySQL for the ads business. There's Vitess, the system behind PlanetScale, which came out of YouTube. You take Postgres, MySQL, whatever your favorite single-node database system is, and you build this little wrapper in front of it — eBay did this with Oracle; it's very common. The last approach is decentralized coordination, where you don't have a coordinator with a centralized view of what's going on in the system. The application server communicates with some home partition or base partition — some master node that's going to be responsible for this given transaction. Other nodes could be master nodes for other transactions, assuming a homogeneous architecture.
So you send all the query requests either directly to the master node or to individual partitions, it doesn't matter. But when you want to commit, you go to the master node and say, hey, I made these changes, I want to go ahead and commit my transaction. And then it's responsible for communicating with the other partitions and deciding whether you're allowed to commit. And if yes, then it sends back the acknowledgement. All right, so the thing that I glossed over is: how do we figure out whether it's safe to commit? Question? In the previous example, how do you take the locks? So the question is, how do you take the locks? Again, assume I'm locking the whole partition. So when the query shows up, you try to acquire the lock at that point. But what about the master node? So the master node would only potentially know information about what partitions you touched. It doesn't know what you did at them, right? The application is responsible for saying, hey, I couldn't get the lock at this partition, I have to abort my transaction, so you go back to the master node and say, I aborted. Alternatively, you just send all the requests to the master node and it's responsible for farming them out to the different machines; you essentially end up taking the locks at that point. At the master node? If you touched data at the master node, sure, yes. Okay, so we'll cover this in more detail in the next class. I want to impress upon you how hard this actually is; think about it, and you'll see on Wednesday. Say we're doing two-phase locking, like my last example, and say that my nodes are spread over a wide-area network: one node is in Pittsburgh, one node is in San Francisco. At the same time, I have two applications trying to update the database. At the very beginning, I get a lock on my node here for A, the other guy gets the lock on B, but now I want to update B and the other guy wants to update A.
So now I have to go over the network and send a lock request to get the other lock on the other node. The other guy's doing the same thing, so I obviously end up with a deadlock here. So how do I actually figure out who should be allowed to commit? Because again, if I'm doing a decentralized architecture, if I don't have that TP monitor, but even if I do, I may not have fine-grained information about what exactly each transaction is doing on each node, because you can't always know what a query is going to do before you actually run it. Someone needs to figure out, I have a cycle in this waits-for graph, I need to kill somebody. And so let's say this guy says, oh, I have a deadlock, I'm going to back off; if I'm doing deadlock prevention, I kill myself, but the other guy could be doing the same thing. So this is what we're going to talk about on Wednesday. How do you actually do distributed concurrency control? How do you take two-phase locking or timestamp ordering and run it in a distributed environment where you don't have a complete global view of everything that's going on inside the system at any given time? We're also going to spend time on: when my transaction says go ahead and commit, how do I guarantee that I commit everywhere? Because what happens if a node goes down while I'm trying to commit? What should I do? That's actually super hard to get right. So if you're interested in these kinds of things, there's this great website called the Jepsen project by a guy named Kyle Kingsbury. He basically built this torture chamber for distributed databases, written in Clojure, which is a bit gnarly, but he basically has this test suite where you can take your distributed database, run it through these weird edge cases, and identify that it's not always correct and has problems guaranteeing reliability, availability, or correctness of transactions. So right now he has a consulting company, and people pay him money to actually run this.
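The deadlock-detection step described above, finding a cycle in the waits-for graph, can be sketched like this. It's a plain depth-first search on a graph of hypothetical transaction names; in a distributed setting, some node would first have to assemble these edges from all the partitions:

```python
# Sketch of deadlock detection on a global waits-for graph.
# An edge t1 -> t2 means transaction t1 is waiting for a lock held by t2.

def has_cycle(waits_for):
    # waits_for: dict mapping a txn to the set of txns it waits on
    visited, on_stack = set(), set()

    def dfs(txn):
        visited.add(txn)
        on_stack.add(txn)
        for nxt in waits_for.get(txn, ()):
            if nxt in on_stack:
                return True  # back edge: a cycle, i.e., a deadlock
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(txn)
        return False

    return any(dfs(t) for t in list(waits_for) if t not in visited)


# The Pittsburgh / San Francisco example: T1 holds A and waits for B,
# T2 holds B and waits for A -> deadlock, so someone must be killed.
assert has_cycle({"T1": {"T2"}, "T2": {"T1"}})
assert not has_cycle({"T1": {"T2"}})  # simple waiting, no cycle
```

Once a cycle is found, the system picks a victim transaction in the cycle and aborts it, which is exactly the "I need to kill somebody" decision from the lecture.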
So if you go to his website, he has these write-ups which are super detailed and take a long time to read and understand. He talks about the different database systems he's tried this against. They claim that they run transactions correctly, or that they can always support high availability or good performance, and his tests show that they don't. So they pay him money to run his thing against their database system, and if they pass, they can announce that they're certified. There was one database company, Aerospike, a distributed key-value store, that used to claim on their website that they had strong consistency guarantees. He ran his thing against their system, crushed it, showed how it wasn't consistent, and they had to go back and change all the marketing material to remove the claim, because he humiliated them. So his website's awesome; his Twitter feed, not so much. You'll see why if you go look at it. It's not my thing, but he's a really sharp dude and it's a really good website. Right, so next class: distributed OLTP systems, replication, the CAP theorem, and then real-world examples. Again, we'll start worrying about how we're actually going to run transactions in a distributed environment. We'll talk about NoSQL systems and see why they don't want to do transactions, because it's going to affect performance and availability. Okay? All right guys, awesome. See you on Wednesday.