So, let's get started. This is the second lecture we're going to have on vectorized execution, following up on what we covered last class when we talked about the Columbia paper and how they came up with different algorithms for executing the standard operators, the things that are internal to a database system, using SIMD to vectorize those components. Today we're going to briefly look at other ways of using vectorization inside a database system that can overcome some of the problems the Columbia guys had, or the assumptions they make that aren't realistic when you build a real database system.

Real quickly though, I want to make a correction on something I said last class. Last class, and in the Columbia paper you read, when we discussed the Xeon Phi, we said that it used in-order execution CPUs with a ring-based bus. That was the Xeon Phi prior to June 2016, codenamed Knights Corner. The Columbia paper talked about how if you used a branchless sequential scan versus a branching sequential scan, you didn't really see any difference, and the branchless one actually performed worse, because the CPU was in-order; they also talked about how it had this ring bus you had to go around to talk to the different cores. In June 2016, Intel put out a new Xeon Phi called Knights Landing. If you ever wonder why Intel always has these names like Knights Landing, Broadwell, and Nehalem, they're geographical features in the Pacific Northwest. Intel's codenames are always based on rivers and lakes around their headquarters in Oregon, and they do this because that way nobody can sue them and claim they stole the name: if you name something after a river, nobody can claim the name was theirs. That's why you always get these obscure names that don't really mean anything about what the product is actually doing.

So the newer version of the Xeon Phi is called Knights Landing, and this one uses a different architecture that supports out-of-order execution of instructions. It also uses a tile-based architecture like the one we talked about at the very beginning of the class when we read the 1000-core paper. The other interesting thing about the newer Xeon Phi is that you can now get it in three different form factors. We already talked about how you could only get the older version as a PCIe co-processor device that sits in the same slot where your graphics card would go. But now they also sell what are called self-booting CPUs, where you don't need a regular Xeon processor to boot the system: the Xeon Phi itself can run the operating system and run all your applications and things like that. There are actually two versions of the self-booting CPU. You have the general-purpose socket one that we normally think of when we think of a CPU, but you can also get one with a fabric interconnect that lets the CPU address memory in other machines in your cluster; think of it like InfiniBand going directly from CPU to CPU.
And of course you can still get the same co-processor that sits on PCIe. So the only thing that changes from last class is some of the examples where they showed that, because the CPU was in-order, certain optimizations that made a difference on regular Xeons didn't make a difference there; on the newer version I think they actually would. I think these CPUs run roughly in the $4,000 to $5,000 range, so when you're trying to buy a dual-socket or quad-socket machine, it's not that much more expensive to go with one of the Xeon Phis. But again, they're still not going to be as powerful as a regular Xeon core; the cores are still much simpler. The newer ones at least support out-of-order execution.

Okay, so for today's class we're going to start off talking about the QuickStep BitWeaving paper that you read. Then I'll talk a little bit about what HyPer does in their newer version that can do vectorized execution on compressed data. And then we'll finish up talking about the code review that is due tonight. You have to submit your code review to your partner group, and then you'll have a week to go through the review and provide feedback to the other group. At the end of class I'll go briefly over what you should be doing in this code review; don't just look at it and say, it's pretty, it looks fine to me. I'll talk through a checklist of the things you should be looking at in your review. This is the first code review; there's still going to be a second one near the end of the semester when we get to the final code drop. But I think it's important to do this as a checkpoint now, to make sure everyone is making progress and that you're not making any big mistakes in your code that will prevent you from finishing it off at the end.

Okay, so last class, as I said in the beginning, we talked about these vectorized algorithms from the Columbia guys, and we talked about how they wouldn't actually work without 512-bit SIMD registers, because you have to be able to pack in a 64-bit pointer to the tuple and possibly a 64-bit value. But when you compress the data, you may not be storing everything as a full 64-bit value. You're still going to have 64-bit pointers, you can't get around that, but maybe you don't need a full 64-bit value. The problem now is that your compressed data doesn't fit cleanly, doesn't align cleanly, with your SIMD lanes. If you have a 128-bit SIMD register, you have four 32-bit lanes. But if you're compressing your data down to some smaller bit length, say 12 bits, you now have a 64-bit pointer to the tuple and a 12-bit value that you're trying to do the SIMD operation on. You can still do the computation you want in the SIMD registers, but you're not getting full utilization of all the bits. You're still using all the lanes, but some of the bits where we could be putting data into the SIMD register go unused.
And you also have this other problem where, as I showed before, there are all these scatter and gather operations: we need to maneuver the data into the correct form to put it in the register, and when it comes out we may need to manipulate it again to put it back into a form that can be used for processing in the other parts of the query plan. So the SIMD stuff we talked about last class is very cool and very useful, but there's a bunch of extra work you have to do to make sure your data fits into the register form it wants.

This is the problem that the BitWeaving stuff is trying to overcome. BitWeaving is an alternative layout scheme for columnar databases that was designed from the onset to get full usage of the SIMD registers, or of vectorized operations as much as possible, and to let you operate directly on compressed data, which as I said may not be a full 32 or 64 bits. The way to think about this is that the SIMD stuff from last class was about getting full utilization of the lanes; now we want to talk about how to get full utilization of the bit positions within the lanes. So we're going to have what's called bit-level parallelization. In this paper they assume order-preserving dictionary encoding, and that's a fair assumption because we're dealing with cold data: we can compress it once with a dictionary and not worry about inserting new values until later on. The other key thing is that they only require what they call ordinary full-word instructions to get this bit-level parallelization. That means we don't have to rely on the scatter and gather stuff we talked about last class, which is good, because as we discussed, the regular Xeon processors didn't have hardware support for it; they emulated it. And when this paper was written in 2013, I don't think the Xeon Phi was even out yet, so that wasn't even an option for them back then.

This BitWeaving stuff was implemented in a new storage engine developed at the University of Wisconsin called QuickStep. QuickStep was originally developed by a professor there, Jignesh Patel, pictured here. He's actually probably one of the most badass people I've ever met. He's very mild-mannered, very calm, but he's like that guy in the kung fu movies who stands back and then at the end comes in and wipes everybody out. He was telling me how, growing up in India, he had to get into fist fights on the bus every day to get to school. He wouldn't say whether he's ever killed a man, but he definitely said he's done some damage. So he is a true badass. He's awesome. Anyway, QuickStep is really cool; it came out of his database group, and as of last year it's an incubator project. Think of it as not a full-fledged database system: they do support SQL, and I don't know if they do logging in the latest version, but think of it as something like WiredTiger or BerkeleyDB or LevelDB. You can use it as an embedded database system and then build more complex things on top of it.
All right, so the main two things I want you to understand about BitWeaving from the paper are the two different layout schemes they have: the horizontal layout and the vertical layout. The key thing to understand is that we're talking about a DSM database, so we're talking about columnar storage of values, but within a single column you can have two different organizations. In the horizontal layout, it's row-oriented storage at the bit level, and I'll show what that means on the next slide. In the vertical layout, it's column-oriented storage at the bit level, and the vertical one is going to look a lot like the bitmap indexes we talked about before. The key thing to understand is that we're laying out the individual bits of the values in a clever way that lets us get better vectorization when we want to do processing. So we'll go through each of these one by one.

Let's say for our horizontal layout scheme we have a table with 10 tuples, t0 to t9, and we're going to encode these (assume they're compressed) as 3-bit values; assume it's a column where you only need eight distinct values, so you only need three bits to encode them. Here are the actual scalar values these codes correspond to. For everything we'll talk about in the paper, the code length can be any arbitrary size, and the processor word size can be any arbitrary size; for simplicity they use 3-bit codes and 8-bit processor words, but you can expand it as large as you want and the idea still works.

So what we're going to do is assume this is our single column, break it up into segments, and store the values horizontally at the bit level, per tuple. That means for the first tuple, t0, we store all of its bits contiguously in memory, and then the next field of that same word stores t4. You can think of it like this: in the first four bits of these processor words I have t0, t1, t2, t3 going down through the words, and then it loops back around to the top for t4, t5, t6, t7. The length of each of these vectors always has to be a processor word; in our case we're assuming 8 bits, but it could be 32 or 64. The key thing to point out is that although we're going in a row-wise manner, successive tuples go down through the first position of each word and then loop back to the top, so within a single word it's not t0 sitting next to t1, it's t0 sitting next to t4. The other key thing is that every tuple in these words is prefixed with a single delimiter bit of padding. This is going to be important later on: when we start doing predicate evaluation, this extra bit is what we're going to use to store the result of evaluating the predicate. They do talk about a variation of the techniques I'll show on the next slide where you don't need this extra bit and you can do all the computation in place, but that turns out to use more instructions. So it's the classic trade-off between memory and CPU: if you don't want to pay the extra bit per tuple, you have to do more computation to make it work.
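Just to make the packing concrete, here is a minimal sketch of the horizontal layout under those same toy parameters (3-bit codes, 8-bit words, segments of 8 tuples). This is my own illustration, not code from the paper or from QuickStep, and the names are made up:

```cpp
#include <cstdint>
#include <vector>

// Sketch of BitWeaving/H packing: 3-bit codes, each prefixed by a delimiter
// bit, so every field is 4 bits and an 8-bit word holds two codes. Tuple i
// goes into word (i % 4), so word 0 holds t0 and t4, word 1 holds t1 and t5,
// and so on, matching the slide's layout.
constexpr int kCodeBits        = 3;
constexpr int kFieldBits       = kCodeBits + 1;   // +1 delimiter bit
constexpr int kWordBits        = 8;               // toy processor word
constexpr int kWordsPerSegment = 4;               // 8 tuples per segment

std::vector<uint8_t> PackSegment(const std::vector<uint8_t> &codes) {
  std::vector<uint8_t> words(kWordsPerSegment, 0);
  for (size_t i = 0; i < codes.size(); i++) {
    size_t word = i % kWordsPerSegment;   // which word this tuple lands in
    size_t slot = i / kWordsPerSegment;   // first or second field of the word
    int shift = kWordBits - kFieldBits * static_cast<int>(slot + 1);
    // The delimiter bit (MSB of each field) stays 0; the code fills the rest.
    words[word] |= static_cast<uint8_t>((codes[i] & 0x7) << shift);
  }
  return words;
}
```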
All right, so now let's go through an example of how you take this horizontal storage layout and evaluate a predicate to see whether any tuples actually match. For this, assume we're doing large sequential scans on the database; it doesn't matter whether this storage scheme is being used for an index or for the actual underlying table, the idea is the same. Let's say we want to do SELECT * FROM table WHERE value < 5. Since we want a vectorized operation, we want to compare a whole vector of tuples in parallel and see which tuples in that vector match. In our case a single word only holds two tuples, so we're essentially doing two comparisons per word. So we want to check whether the vector x is less than a vector of fives: this x is the word holding t0 and t4 that I showed on the last slide, and this y is just the encoding of the constant five, because that's the constant in our predicate, repeated for as many tuples as we have packed in the word.

What they come up with is a formula that lets you do a vectorized comparison of the tuples and produce a selection vector. In the paper they give different formulas for less than, greater than, not equals, and equals. Essentially what happens is you precompute a mask and then apply this formula: you take the exclusive OR of x with the mask, add y, and then AND it with the negation of that mask. What you produce is a result that has zeros or ones in the padding positions, the delimiters, telling you whether the tuple associated with that delimiter matched or not. In this case, for the first position, 1 < 5 is true, so we produce a one there; for the second one, 6 is not less than 5, so it gets set to zero.

What's really cool about this is that we're not even talking about SIMD here. What I just showed you can be done with ordinary instructions on the CPU, taking two registers and combining them; nothing here is using SIMD at all. And to do this comparison, to compare two tuples against some value, we only had to do three instructions: the XOR, the addition, and the AND with the negated mask. In this case I'm using 8-bit words, so I can compare two tuples at a time, but imagine you had 64-bit words and 3-bit codes: each field would be four bits, so 64 divided by 4 is 16 tuples per word. So you can vectorize the evaluation of this predicate without using any SIMD at all. This works for any word size and any encoding length, and in the paper they give the other algorithms to do the same kinds of things. They didn't actually invent these formulas, by the way; they come from a paper by Leslie Lamport from the 1970s, even before there were SIMD instructions, which I think is kind of cool. So that's how to do this evaluation for a single vector; now we can do the same thing across all the different vectors at the same time.
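Here is what that formula looks like in code for the single-word example above. This is just a sketch of the less-than case under the same toy parameters; the mask constants and names are mine, and the real implementation generalizes this to arbitrary code and word sizes:

```cpp
#include <cstdint>
#include <cstdio>

// Sketch of the BitWeaving/H "less than" check on one 8-bit word holding two
// 3-bit codes behind their delimiter bits. The mask has 1s in every code-bit
// position; its complement has 1s only in the delimiter positions.
constexpr uint8_t kCodeMask  = 0b01110111;                        // code bits
constexpr uint8_t kDelimMask = static_cast<uint8_t>(~kCodeMask);  // 0b10001000

// Returns a word whose delimiter bits are 1 exactly where code < constant.
uint8_t LessThanWord(uint8_t x, uint8_t y) {
  // (x XOR mask) + y: the addition carries into a field's delimiter bit
  // only if that field's code is smaller than the constant in y.
  return static_cast<uint8_t>((x ^ kCodeMask) + y) & kDelimMask;
}

int main() {
  uint8_t x = 0b00010110;  // t0 = 1 and t4 = 6, each behind a 0 delimiter bit
  uint8_t y = 0b01010101;  // the constant 5, repeated for both fields
  uint8_t result = LessThanWord(x, y);
  // Prints 0x80: only the first delimiter bit is set (1 < 5 holds, 6 < 5 does not).
  std::printf("%#04x\n", result);
  return 0;
}
```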
So we do that same computation on each vector to check whether it's less than five: XOR with the mask, add y, and AND with the negated mask. And again, what you see is that in all these delimiter positions you have a one or a zero that says whether the tuple at that offset evaluated to true or not. Now we need to figure out a way to combine all these result bits, these ones and zeros in the delimiter positions, into a single vector that we can pass along to the upper levels of our query plan tree. The naive thing to do is a for loop that spins over every bit position and checks whether there's a one or a zero; that's obviously going to be slow. Depending on what kind of query you're doing, you may not even care about converting these into a single vector: if this were a SELECT COUNT(*), you would just count the number of ones in each word, which you can do with a population count, a single instruction, so that's easy. But for our purposes, since we have a SELECT *, we need to be able to use these tuple positions to go materialize the full tuples and produce the output.

This is also really easy to do, because all we need now is a bit shift: we slide each of the result words that were produced here over to the right by how far down it is in our list, and then we just collapse them together to produce our result vector. Another way to think about this is, again, I'm not showing the SIMD instructions, but you could do all of the per-word evaluations in parallel with SIMD, and then the collapse is done with ordinary instructions to put it back into this form. You're building on top of the SIMD stuff we talked about last class to now also get bit-level parallelization.

This vector that I'm producing at the end, the paper might use a different term for it, but the standard parlance, which comes from the VectorWise and HyPer guys, is that this is called a selection vector. It's a bitmask that specifies which tuples at the different offsets in our input vector satisfy the predicate and therefore should be passed up the query plan. And as I said, you need to be able to convert these bit vectors into actual positions, and there are basically two ways to do it. You can do the naive for loop we talked about before, where you spin through every single position, check whether it's a one, and if it is, you know that position should be passed up the query plan. This would obviously be very slow; even if you unroll the loop, you're still doing the instructions for every bit comparison. A better approach, which is actually what we use in Peloton and also comes from the HyPer guys, is to use what's called a precomputed positions table. The idea here is that in this example I only have eight bits, which means this thing can only take on 2^8 different values. That's not very big. So that means there are only 2^8 different position lists for my tuples.
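As a sketch, here is what the collapse step and the precomputed positions table could look like under the same toy parameters (8-bit words, four words per segment). Again, this is my own illustration of the idea, not the paper's or Peloton's actual code, and the names are made up:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Step 1: collapse the per-word delimiter bits into one result byte.
// Word i is shifted right by i and ORed in, so bit (7 - j) of the result
// corresponds to tuple t_j of the segment (t0..t7).
uint8_t CombineResultWords(const std::array<uint8_t, 4> &result_words) {
  uint8_t combined = 0;
  for (int i = 0; i < 4; i++) combined |= result_words[i] >> i;
  return combined;
}

// Step 2: a precomputed positions table with 2^8 entries. Entry b lists the
// tuple offsets (0..7) whose bit is set in b, so converting a result byte
// into a position list is a single lookup instead of a per-bit loop.
std::array<std::vector<uint8_t>, 256> BuildPositionTable() {
  std::array<std::vector<uint8_t>, 256> table;
  for (int b = 0; b < 256; b++)
    for (int offset = 0; offset < 8; offset++)
      if (b & (0x80 >> offset)) table[b].push_back(offset);
  return table;
}

// For SELECT COUNT(*) there is no need for positions at all; a population
// count of the combined byte is enough:
//   int matches = __builtin_popcount(combined);   // GCC/Clang builtin
```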
So what I do is precompute all 2^8 of those position lists, and then whatever result word I have here, I interpret it as an integer and jump to that offset in my positions table, and that gives me my position list. This table is really small; you can fit it in your L1 or L2 cache, it's not some giant table. You can precompute it once and just load it in whenever you need to convert these result words into offsets. So that's a quick way to solve the problem we had before: how do we convert these selection vectors into actual positions that we can pass up the query plan, so the query plan can use them to go materialize the full tuple or do whatever else it needs to do to process the query. Hopefully that's clear.

All right, so the other storage scheme they propose in the BitWeaving paper is the vertical layout. Again, this is going to look a lot like the bitmap indexes we talked about earlier in the class. The basic idea is that we store all of the bits at a single bit position, across all the tuples of a segment, together contiguously in memory. So for this segment here, we color all of the first bit positions white, the next ones lighter gray, and the last ones darker gray. When we store it in memory, all of the first bit positions go into the first vector, the second vector contains the second bit positions, and so on. And each of these still has to be the width of our processor word, because that's the lowest granularity we want to operate on in our instructions when we execute queries.

All right, so now we want to do an evaluation. For this one I'll keep it simple and check whether some key equals some value. It's going to be just like the bitmap indexes, where we go bit by bit through our vectors and check which tuples actually match, but now, because we know about SIMD, we can use SIMD to make this go faster. So in the first step here, this is still using dictionary encoding, and the encoded version of the value we're looking for is 010. For the first bit position, the only thing we need to check is the first vector, so we do a SIMD comparison of the two to see whether one equals the other, and that produces another match vector. Now when we go to the next bit position, we want to reuse the vector we produced in the previous comparison, because we don't want something to evaluate to true at this position when it already evaluated to false at the previous one. So it's really the combination of all three bit positions that produces the final vector saying which tuples match, bit position by bit position. Now in this example, nothing evaluated to true at the second bit position, so we end up with nothing but zeros. That gives you a really easy optimization, the early pruning approach they talk about: if you notice you have all zeros after the comparison at one bit position, you can stop, because no tuple could possibly match in the remaining bit positions. You just stop your comparison right there and move on to the next segment. And again, we can use SIMD to speed all of this up.
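And here is a rough sketch of what that vertical-layout equality check with early pruning looks like, assuming 64-bit processor words so that one segment covers 64 tuples and each bit plane is a single uint64_t. This is my own simplification of the idea (the paper processes bit planes in groups), and the names are invented:

```cpp
#include <cstdint>

// Sketch of BitWeaving/V equality evaluation with early pruning for one
// segment of 64 tuples. planes[i] holds bit i (most significant first) of the
// 3-bit code for each of the 64 tuples, one tuple per bit position.
constexpr int kCodeBits = 3;

uint64_t EqualScanSegment(const uint64_t planes[kCodeBits], uint8_t constant_code) {
  uint64_t match = ~0ULL;  // start by assuming every tuple matches
  for (int i = 0; i < kCodeBits; i++) {
    // Broadcast bit i of the constant across the word, then keep only the
    // tuples whose bit agrees with it.
    uint64_t constant_plane =
        ((constant_code >> (kCodeBits - 1 - i)) & 1) ? ~0ULL : 0ULL;
    match &= ~(planes[i] ^ constant_plane);
    // Early pruning: once no tuple can match, skip the remaining bit planes.
    if (match == 0) break;
  }
  return match;  // bit j set <=> tuple j in this segment equals the constant
}
```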
And again, it's a simple population count to count the number of ones in our vector: if it's zero, we're done.

All right, so for the quick evaluation I want to show, they have a microbenchmark query derived from TPC-H that's doing something really simple: a SELECT COUNT(*) FROM R with a range predicate on a single column, and you assume a 10% selectivity on that predicate. Part of the reason they make it a COUNT(*) is that you don't actually have to materialize any tuples; it's just measuring the raw performance of evaluating the predicate on the different storage layouts. As I said before, if you just have the count, all you need to do is count the number of tuples that match and produce the final one-tuple result; you don't have to stitch anything together. For this they use a 10-gigabyte TPC-H database, but only one relation from it, so it's a billion tuples with a uniform distribution of values.

This is really the only graph to look at in the entire paper. They're comparing BitWeaving horizontal and BitWeaving vertical against a basic SIMD scan, a basic vectorized evaluation of the predicate, and the naive scheme with no vectorization at all. As expected, SIMD buys you about 2x over the naive scan, but down here you see you're getting another 2-3x over the SIMD scan with these BitWeaving approaches, essentially because there are fewer cache misses: you're only operating on a small portion of the data. You don't have to deal with whole 32-bit integers every single time; you can use the compressed form. Then over here, the reason BitWeaving/H ends up performing slightly worse than BitWeaving/V is early pruning: you can't do early pruning in the horizontal representation, only in the vertical one. So that explains what this gap here is from. Yes? Isn't it because the horizontal version uses one extra bit? So his statement is that the horizontal version uses one extra bit, and that's why it gets slower here. I think it's partly that, yes, but also that you can't do early pruning with it. And I forget what this jump here is from.

So again, I like this paper because it shows you an alternative way of storing your data that's not natural or obvious for us as humans, but is better for the CPU and better for doing vectorized instructions, and we don't have to worry about the alignment issue we talked about before, because we can always pack in as many tuples as we can, based on our encoding scheme, into a single word without worrying about aligning it to our registers. Okay.

All right, so the BitWeaving stuff I talked about is essentially building a bitmap index; it can be used for an index, or it can be the underlying column store, the actual storage for the database, it doesn't matter, but at a high level it's just a bitmap index, with clever ways of doing arithmetic on it for query evaluation. This is great for OLAP, because we're doing long sequential scans over as many tuples as possible when we do these selects, but it's bad for OLTP, because there we need to do point queries to go find single tuples.
Now, we'll talk about this more when we get to larger-than-memory databases in two weeks, but in general what happens is that as data gets colder you end up compressing it and maybe moving it to a slower storage device, and once it's cold you're mostly just doing sequential scans over it. But occasionally somebody might come along and say, I need to access that single tuple that's sitting in your cold storage, and if it's using one of these BitWeaving approaches, one of these more aggressive compression schemes, it would be very hard to jump to the single tuple you actually need without decompressing everything. So the HyPer guys are looking at this problem: how can we compress the data in such a way that it's still efficient for point queries?

I'm not going to go into everything they actually do, but I want to talk a little bit about how they can still get vectorized execution on compressed data even though they're also doing query compilation. A way to think about this next part of the lecture is that we've talked about compression, we've talked about query compilation, and we've talked about vectorization; now the HyPer guys are throwing all of those things together in a single mix, and they have to work out the trade-offs for each approach and find a way to get the best of all three worlds.

So what they do is store all the hot tuples as uncompressed data. That's not surprising: as you run transactions you update the database and make new changes, and you don't want to compress tuples right away in the middle of a transaction, because that would slow the transaction down. So you wait until the tuples aren't being updated by transactions anymore, and then they call those the cold tuples, and they compress them into these data blocks. Because they're doing this compression offline, they can look at each data block and figure out what the best compression scheme for that particular piece of data would be: should it be dictionary encoding, should it be RLE, should it be delta encoding. Then they compress it into a form that you never go back and modify. That's okay because they're doing MVCC: if you need to modify a tuple, you just mark the old version in the compressed block as deleted and insert a new version somewhere else in the database; they're using append-only storage for this.

The tricky thing now is that they still want vectorized execution on these compressed data blocks, and they still want query compilation, because that made a big difference in performance. But the problem is that if you have all these different storage layouts in your data blocks, and all these different compression schemes you could use, then when you do a sequential scan over, say, the entire table, you have to generate code that knows how to handle every format the different data blocks might be in. That means the code you generate for your query gets really, really big: there's a giant switch statement that says, if the layout looks like this, I know how to decompress it and do my scan; if the layout looks like that, I decompress it that way and do the scan. And so that's going to increase the size of the code you're generating, it makes it harder to do
vectorization, because all the different data blocks are in different formats, and it's going to increase the compilation time for the query: we can generate the LLVM IR, but then we have to compile it into machine code, and the more IR we have, the longer that takes. So what they do, which I think is pretty clever, is split the execution of a query plan between the standard HyPer LLVM code generation we talked about before and these VectorWise-style vectorized primitives that are precompiled when the system starts up. The basic idea is that you have precompiled primitives for every type of block format you can have, and I think they talk about 50 different formats, so they have to have primitives that can operate on all of them. Then for the uncompressed data they just use LLVM to do just-in-time compilation for the query, and they can process that data very quickly without generating all the different code alternatives, because the uncompressed data is always in the same format. They have a way of stitching these two execution models together to produce the final result: the VectorWise-style primitives run on some of the data and the LLVM-generated code runs on the other data, and together they produce the final result. I think this is pretty clever. And the way they get vectorization in the primitives is that, because they write the code the same way the VectorWise guys did, they can pass hints to the compiler and it can go ahead and vectorize those loops using SIMD. So again, I just wanted to go over that to show how the different trade-offs of vectorization, compilation, and compression interact: you can't get everything automatically all at once, you have to be a bit clever about it. The main issue the HyPer guys are trying to overcome is that, because you can have all these different storage formats, the code you generate would otherwise get really, really big.

So the main takeaway from the vectorization stuff is that, just as with query compilation we talked about how the way we would naturally write our query execution engine is not the best way for the processor to run it, the natural way for us to write our storage engine or storage manager may not be the best way for the CPU to crunch the data. In the BitWeaving case we saw that if you're clever about how you pack the bits of your data and your tuples, you can use vectorized instructions to get better performance, better parallelization.

All right, so any questions about vectorization or the BitWeaving stuff? Okay. So I'm going to finish up the class a little early and talk about what's coming up in the rest of the course. As I said, the first code review submission to your partner group is due tonight at midnight. On Thursday we'll have one-on-one status meetings in my office; if you haven't marked your time on the calendar in the spreadsheet online, go ahead and do that. This will be a 20-minute meeting where you talk about where you're at with your project, what problems you're having, and what you plan to present to the class on Tuesday next week when we have the in-class project
status updates. That will be just like before: you get five minutes to say what you're doing and how far you've gotten so far. Then, also due next Tuesday, you'll have to complete the code review of the other group and provide them feedback on the code you looked at for them. So what I want to do now is spend some time talking about what's expected of you here, because this will be included in your final grade for project three. I'll be looking at how everyone participates in the code review, and I'll give you feedback: yes, that's a good comment, that's a bad comment, did you think about this, can you do a more thorough review when you do the second code review at the end of the semester.

The way this is going to work is that I've already assigned each group a partner group, and you're reviewing each other's code. What you want to do is submit a pull request to the CMU-DB Peloton repo. The reason you want to do this is that we already have Travis and Coveralls set up, so when you submit your PR it will automatically try to compile your code, run the tests, and show you code coverage statistics. We're not going to merge anybody's code, at least not right away, but you want to make sure your code can actually be merged with the master branch, so make sure you rebase and get the latest version of all the fixes that have gone in. Then the reviewing group can come along, initiate a review on the PR, and provide feedback. To make sure everyone knows where to find the different PRs, the Google spreadsheet online has a slot where you put the PR URL, and when you submit your pull request you should also email the other group and say, hey, the PR is ready, go ahead and start looking at it.

I'll also say this: it's really important that everyone is congenial and courteous during these code reviews. Don't write scathing things like "this code is all garbage," because I know some of you are graduating, but a lot of you are coming back next fall, and if you're a dick to other people, you're going to see them next fall and it's going to be awkward. So just be helpful, try to do a good job, don't be a dick. That's all I can say.

All right, so for the code review, it's more than just submitting the PR. Each group should also write a very brief summary for the other group: here are the files you should be looking at, here are the functions we modified. The idea is that maybe you ran the code formatter on some files, so there are diffs that don't actually change any real code, and you don't want the other team spending time on those. You give them a roadmap: here are the files we changed, here are the functions you should be looking at, and that makes it easier for the other team to get started on your code.

The general guideline, from what I've read online, is that you want to do the code review in pieces: allocate 60 uninterrupted minutes to look at some portion of the PR, and only look at about 400 to 500 lines of code at a time, because that's roughly the upper limit of what a human can actually understand at once in a
large code base. So rather than just sitting down, looking at the code, and arbitrarily walking through it trying to make comments, I want to talk through a guideline, a checklist of the kinds of things you should be looking at, and this will give you direction on what to look for when you do the review.

The first set of questions you should be asking are high-level things that are fairly obvious. Does the code actually work? That means they have test cases that prove the thing they wrote actually does what it should be doing. Is the code easily understood: can you actually read it and does it make sense, nobody drops into assembly and does something weird? I know there are two groups working on LLVM stuff, so I paired those groups together; at this point both should be familiar with LLVM, so they should have no problem reading each other's LLVM code. You want to avoid things like duplicate code, where people take a single file and for whatever reason copy it and rename it to another location or another function; you want to avoid all of that. Is the code modular, meaning there are classes, there's a clean definition between the different APIs and the different pieces of functionality, and it's not just one giant function or one giant class file that does everything?

This shouldn't really be a problem so much anymore; it was a problem last year because we still had remnants of the Postgres code, and the Postgres code has global variables everywhere. We're using C++11, so you should not have any global variables; everything should be encapsulated in classes so we have correct scoping. I should also add: make sure everything is in the correct namespace; there's a simple one-line bash script I can send out to check that. If there are any large portions of commented-out code, they should just be removed; it's unnecessary and it dirties up the source code. And then, are they using the proper debug log functions? I think we already have checks to make sure you don't use printf or cout, so when you submit your PR, validate that you don't have these things; some of this can be automated. You're laughing as if you have a lot of printfs. Yes? Oh, she does, you're pointing at her?
No? Okay. You saw this in the first two projects: we have debug macros to make this easier, because the worst thing that can happen is you see output and can't find where it came from, and it takes a long time to figure out. It sucks.

Is there documentation? Beyond the summary, I don't expect you to write a manuscript that says what the code does, but there should be comments in the code that describe what the different pieces of functionality are doing and what the workflow or control flow of the different operations is. Make sure all the functions are commented. If there's any unexpected or complex computation or behavior you're trying to implement, make sure it's documented, and describe what the expectations are for the things coming in. If people are using a third-party library, make sure it's documented: one, make sure it's actually in the build scripts and the packages setup script, but also, in the code, if you rely on these third-party libraries, document what it is you're actually using from them. And obviously, if there's any incomplete code, like TODOs and things like that, make sure it says what's actually missing. This is really important for those of you who have talked to me about doing a capstone project or independent study in the fall: when you finish your project this semester and we merge it into the master branch, you're going to go away to Facebook, Google, LinkedIn, whatever, all summer, eat all the free food, and then come back in the fall and not remember what the hell you were doing when you left off. So if there are parts you know aren't finished, write down what's missing; that at least gives you a roadmap, after this first code review or when you come back in September, of where you left off and what's missing.

And then lastly, testing is really important. You want to make sure they actually have real tests that check the functionality of the system. A real test means you have ASSERT or EXPECT clauses in GTest that actually check you're getting the correct output. A bad test just prints something, and you eyeball the output when you run it to see whether it's what you want. Don't do that, because nobody is going to run that test and check it by hand; all our tests are automated. Are the tests actually testing the feature they're trying to add? Are they relying on hard-coded answers? Sometimes that's hard to avoid; to the extent you have a hard-coded answer, make sure it's reasonable. 1 + 1 = 2, you can hard-code that; but something that depends on the ordering of some output, you probably don't want to hard-code. And then when you submit the PR to our GitHub account and it runs in Travis, Travis will spit out the code coverage numbers, and you want to see the code coverage for the tests actually go up. If someone's change drops the coverage metric by 5%, that means they're not testing anything they've added. I think the master branch is currently at around 70%, and with every new feature we add, this should keep going up. Obviously 100% would be ideal, but that's not really realistic, because you can write stupid test cases that bump the coverage number without actually testing anything. So it's a trade-off between writing real tests and just increasing the coverage; it's a balance between those two.

All right, so any questions about what's expected for the
code review? As I said, part of the reason I want you to post the URL on the Google spreadsheet is that I'll go through the reviews along with Dana, just to make sure people are participating, finding the obvious things, and providing feedback to the other group in a positive way.

All right. As I said, this Thursday I'm going to meet with all the groups individually in my office. The idea is to give me an update privately on where you're at with your project and what problems you may be having, and hopefully, if you're stuck on something, we can figure out a way to help you. Then on Tuesday next week you'll give a five-minute presentation; we'll go in the reverse order from the proposals, so the people who got rushed at the end will have a chance to do their full presentation. The idea of the status update is just to tell everyone where you currently are in your implementation, whether anything has changed from the initial plan or initial proposal, and why that was the case: maybe something wasn't in the code that you thought was going to be there, or something turned out to just be too hard. Update everyone on what you expect to get done by the end of the semester. And obviously, anything that surprised you or any problems you encountered would be good to talk about so everyone is aware of them.

Okay, all right, we're done. I'll see you guys in class on Tuesday next week, and also Thursday for the one-on-one meetings. Okay, take care.