So, we have multiple such rows and multiple such columns; this is called a word line, this is called a bit line, and these are called access transistors. The way DRAM is accessed is that you first access a full row, and we have an array of sense amplifiers here. So, you access a full row, read it out into the sense amplifiers, and then you can access the required columns. After you send the row address, which is decoded and activates one of the word lines, the time until you can access the columns is called the row-to-column delay or tRCD, and then the time to access the required columns takes time tCAS. That is what we mentioned. We also said that the DRAM maintains a register here which records the currently accessed row; that is the open row. So, whenever a request comes, you compare its row with the currently registered row here; if they match, then you do not really need to activate any word line, you can directly do the CAS from the sense amplifiers. So, you can eliminate the row activation entirely; that is what we mentioned last time. That is how open-row accesses enjoy a lower delay.

The other thing that can happen is that the currently requested row does not match the open row, in which case you need to close this row and open a new one. What really happens is that you precharge the bit lines at that time, and that takes an extra time called tRP, the row precharge time. So, the question is why this precharge operation is really needed. Whenever you activate a word line, the charge on each bit line is supposed to change by whatever comes out of the cell, and the sense amplifier's job is to magnify that change to a full logic level. What the precharge operation does is precharge all the bit lines to the midpoint between 0 and 1. That is the stable state of the array, where the array is ready to be read. Then, when you activate a word line, the charge stored in this particular cell disturbs the charge on its bit line, because the cell is actually a capacitor storing charge. It is a very small capacitance, so it holds a very small charge. From the midpoint, the voltage on the bit line will swing either a little to the higher side or a little to the lower side depending on the charge stored in the cell. The sense amplifier's job is to magnify that swing to the full voltage, to a 1 or a 0. So, the precharge operation is actually readying the DRAM array for the next row access; otherwise you cannot sense the voltage from the DRAM. That was about the DRAM details: how you access it, what the latencies are, and so on. We spent a lot of time last time discussing these things. Any question on this?

Yes, writes happen in the same way: you force a certain charge onto the bit lines and that gets transferred to the capacitor here. What I am showing here is a single-transistor cell. There are many other variants of DRAM cells; there is a three-transistor cell also, where you normally have separate word lines for writing and reading. But this one is normally used because it is the most dense design.
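To make the timing just described concrete, here is a minimal sketch of the open-row policy; the cycle counts for tRCD, tCAS and tRP below are made-up illustrative values, not taken from any real DRAM part.

```python
# A minimal sketch of open-row DRAM timing. tRCD, tCAS, tRP are
# hypothetical cycle counts chosen only for illustration.
tRCD, tCAS, tRP = 15, 15, 15  # row-to-column delay, column access, row precharge

open_row = {}  # per-bank register recording the currently open row

def access_latency(bank, row):
    """Return the latency of one access under the open-row policy."""
    if open_row.get(bank) == row:
        return tCAS                # row buffer hit: only the column access
    elif open_row.get(bank) is None:
        open_row[bank] = row
        return tRCD + tCAS         # bank idle: activate, then column access
    else:
        open_row[bank] = row       # conflict: precharge, activate, access
        return tRP + tRCD + tCAS

print(access_latency(0, 7))   # 30: first access must activate the row
print(access_latency(0, 7))   # 15: same row, row buffer hit
print(access_latency(0, 9))   # 45: different row, precharge + activate + CAS
```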
Now, most of the engineering actually goes into how you design the cells so that they can retain the charge longer, because if a cell can retain the charge longer, you can delay the refresh cycle. Refreshing, as we discussed last time, essentially involves reading out each row into the sense amplifiers and closing that row again, and that takes time.

Yes? Why do we need a sense amplifier at all? Because the voltage change on the bit line is small. Yes, exactly. And why is it small? Because the capacitance in these cells is small. Exactly. So, why can't we have a larger capacitance? Because then we lose density. Last time we mentioned that for DRAM the primary goal is density: how many bits can be packed into a unit area. A larger capacitor takes more area.

We will see one more thing when we go to SRAM, which we will discuss right now actually. SRAM has, for each column, a pair of bit lines: one carries the actual value and the other carries the inverted value. That makes sense amplification much faster. Here, in DRAM, it actually takes a little time to amplify the voltage, but DRAM is not much concerned about latency; density is what matters. If you made a pair of bit lines, density would go down because the extra bit lines take away area; you cannot pack as many bits.

Any other question? So, what I will do is spend a little time explaining SRAM. They look very similar in architecture. So, the cell here: you know that cross-coupled inverters give you a one-bit memory cell; you will have studied this in some detail in your digital design course, because that is how a flip-flop is designed. This is a typical SRAM cell, and then you have the access transistors, and the bit lines and the word lines are exactly the same, and this is replicated on this side and this side. That is an SRAM, and of course you have the array of sense amplifiers. Here you can easily argue that of these two lines, if I call one x, the other one will be x-bar. What happens here is that the precharge operation precharges both lines to 1, so they are both at high voltage, and then when you activate a word line, one of the two lines will swing down, and that difference is amplified by the sense amplifier. It is much faster here because there is not much to amplify: the sense amplifier just reads the difference and registers the current value of the row.

Also, there is normally nothing like a CAS in SRAM, because what you do is read out the entire row. For example, when you design the register file of, say, a 32-bit processor, each register is 32 cells and that is one row. You always read out the entire row; there is no column access within the row, so there is no CAS operation. Essentially, only the precharge operation is needed between every pair of operations, and there is no concept of a row buffer or anything, because there is no concept of column access: what you get out of the sense amplifiers is the value, and immediately you precharge again for the next operation.

And just to remind you how to multi-port an SRAM array, which we also discussed last time: this is a single-ported SRAM in the sense that I can access only one row at a time.
If I activated both rows, the value would get corrupted here on the bit line, because this value and this value would clash and you would get garbage. So, if you want to make it multi-ported, to make it dual-ported you have to add one more access transistor to all the cells, you will have a new bit line on this side also, and this will be there for all the rows. Now you can not only access the same row twice simultaneously, you can access two different rows also. So, this gets replicated. That is SRAM. SRAM is used for designing your caches, register files, any on-chip memory: your predictor tables, your reorder buffer, everything.

One small thing that is left, slightly connected to this: we talked about set-associative caches and fully associative TLBs, where you essentially need to make multiple comparisons. So, the question that comes to mind: let us take the TLB example, which is clearly the more complicated one. If you have a fully associative TLB, just to remind you what the structure was: we have a valid bit, we have a tag, and we have a value. That is the organization of a TLB, and you have multiple such entries. What you do, if it is a fully associative array, is take the virtual address, chop it into the page offset and the virtual page number, and compare the virtual page number with all the tags here. The hope is that one of them will match, and the matching entry will give you the page table entry. The question is how you really design such an array. It is not an SRAM, because you are not accessing just one row, you are accessing all of them. So, how do we do this? These are called content addressable memories, or CAM for short.

Here, what you do is separate the tag and the data. The data is designed as a conventional SRAM, just like a data array. The tag is designed in a slightly different way: the tag array is designed as a CAM. So, this is the tag array, there is a bunch of rows here, and this is the data, which is an SRAM, laid out like this, and you connect these things. What happens here is that when you get the incoming tag, you compare it against all the entries here. Essentially there is a line through which you pass this particular tag, there is a comparator in each of the rows of the CAM, and a match line tells me whether there is a match or not. The value of this line will be either 0 or 1, and this line is connected to the word line of the corresponding row in the data array. So, whichever row matches will activate the corresponding row in the data array, and we will read out the page table entry. For example, if the 10th row holds the matching tag, the 10th match line will go high, everything else will stay low, and we will read out the page table entry in the 10th row, which is the corresponding row.

How many tags does it store? Well, that is decided by your TLB organization; it could be a handful of entries or it could be large. If it is large, it is going to be bulky, because each CAM cell is pretty large: the cell contains not only the stored bit, it also contains a line that carries the external bit to be compared, and there has to be a comparator, and based on the comparison it drives the match line. That is how you design these tag arrays: you connect the word lines of the data array to the comparison outputs.
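As a behavioral sketch of the CAM-plus-SRAM organization just described: all tag comparators fire in parallel and at most one match line goes high, driving the word line of the data array. The 4 KB page size and field widths below are assumptions for illustration.

```python
# A behavioral sketch of a fully associative TLB with a CAM tag array.
PAGE_BITS = 12  # assume 4 KB pages: the low 12 bits are the page offset

entries = []  # each entry: (valid, tag = virtual page number, physical frame)

def tlb_lookup(vaddr):
    vpn, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
    # In hardware all comparators operate in parallel; this loop models that.
    match_lines = [valid and tag == vpn for (valid, tag, _) in entries]
    assert match_lines.count(True) <= 1, "at most one match line may go high"
    for hit, (_, _, frame) in zip(match_lines, entries):
        if hit:  # the high match line acts as the data array's word line
            return (frame << PAGE_BITS) | offset
    return None  # TLB miss: walk the page table

entries.append((True, 0x2A, 0x700))
print(hex(tlb_lookup(0x2A123)))  # 0x700123
```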
So, what I will do is spend the remaining half hour or so discussing some of the research problems in caches and memory, which will hopefully be easier to appreciate given what we did on the memory side. Let us start with the caches. We talked about three main parameters of caches, right: the ABC parameters, associativity, block size, capacity. What else is there that determines cache performance? There is a replacement policy; there is an index function, which decides which set a block maps to; the cache can be inclusive or exclusive with respect to the next level; and there is something called a topology, which I will explain now. So, a cache can roughly be seen today as an associativity, a block size, a capacity, a replacement policy, an index function, an inclusion policy, and a topology. We know about these things, except probably the topology.

What has happened today is that caches have grown very large: you will routinely find on-chip caches that exceed tens of megabytes. Now, the problem is that if you design such a large SRAM array, it is going to be very slow, because, as you can guess, your bit lines are going to be long: they span a very large number of rows. In a cache, one set is essentially one row, so your bit line length will be roughly proportional to the number of sets in your cache. It is not exactly done that way, the array is reorganized a bit, but that is roughly how it goes, which is why as your cache gets bigger it becomes slower: your bit line length increases, and the time to read out a value is directly proportional to that length, because that determines how long the data takes to stabilize.

So, today what people do is divide the cache into smaller chunks called banks. For example, an 8 megabyte cache will probably be designed as 8 banks of 1 megabyte each, so that if you can decide which bank to access, each access touches just a small, 1 megabyte cache. The topology determines how exactly you lay out these 8 banks with respect to the processors on the chip. Let us take an example. Today we have multi-core processors, so let us say we have 8 processors and an 8 megabyte cache with 8 banks. Typically, the way we design it, a processor will be placed next to one 1 megabyte bank. Now, how do I connect them? You can think of it as 8 vertices of a graph, and you can connect them in whatever way you want as long as the graph remains connected. One possible connection is a ring; another possible connection is a mesh, which I draw like this: a 2 by 4 mesh, I hope it is clear, it looks like this.

So, what are the implications? Suppose this processor wants to access a piece of data that is here: that is a longer access than if it wanted to access the data here. That leads to something called a non-uniform cache access architecture. You should be able to appreciate that, compared to a ring, the access latency in a mesh is actually smaller: for example, this processor, instead of going all the way around, can now follow this link directly. The access latency is counted in hops; these are called hops, and each node will have a switch here, which decides which way an incoming packet goes: it can go to this bank, it can be forwarded along this direction, or it can go like this. So, the topology essentially decides your average access latency; that is the impact. Is it clear to everybody? Does this resemble distributed shared memory? Yes, very much, exactly. In distributed shared memory these banks are essentially the local memories of the processors; the whole problem is now brought on chip, and the only difference is that there each hop may be much more costly than a hop here: these are short on-chip links and the switches are simpler.
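A tiny sketch of why the topology matters: counting hops between cores and banks on a 2 by 4 mesh, assuming core i sits next to bank i. This layout assumption is purely illustrative, not a claim about any real floorplan.

```python
# Hop counts on a 2x4 mesh: Manhattan distance between node positions.
def mesh_hops(src, dst, cols=4):
    r1, c1 = divmod(src, cols)
    r2, c2 = divmod(dst, cols)
    return abs(r1 - r2) + abs(c1 - c2)

# Average hops over all (core, bank) pairs for 8 nodes:
pairs = [(s, d) for s in range(8) for d in range(8)]
print(sum(mesh_hops(s, d) for s, d in pairs) / len(pairs))  # 1.75
# An 8-node ring averages 2.0 hops, which is the point made above:
# the mesh's extra links reduce the average access latency.
```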
So, what are the research problems here? We have to abstract away some of the details and extract the research problems. One problem that is interesting is this: if we fix, let us say, A, B, C and T, that is, suppose I give you a cache and tell you its associativity, block size, capacity and topology, what is the best it can do? Then our degrees of freedom are the replacement policy, the index function, and the inclusion policy. We can decide whether the cache should be inclusive or exclusive; we have discussed the implications of that, the tradeoff between inclusive and exclusive. And we can design smart index functions.

So, what are the implications of index functions? Can you guess? Why should I at all spend time designing index functions? Speed of access? Not quite. What is the implication of the index function? It should be uniformly distributed. What should be uniformly distributed? Whichever address we are accessing, we extract something from it. What is it called? A set. We extract the set index from the address, and these set indices should be uniformly distributed. What exactly is uniformly distributed? I have a bunch of sets, whose number is fixed, by the way: the number of sets you can easily calculate, it is C over (A times B); that is fixed. So, what is uniformly distributed? The output of the index function: the index function should map its domain, the addresses, uniformly onto its range, the sets. The addresses should be uniformly distributed across the sets, so that each set is equally loaded. That is what a good index function buys you: balance across the sets, so that whatever conflicts happen are uniform across all sets.
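Here is a minimal sketch of such an index function; the capacity, block size and associativity values are illustrative, and the XOR-folding variant is just one well-known way to spread strided addresses more uniformly over the sets.

```python
# A minimal sketch of cache index functions, with assumed parameters.
C, B, A = 8 * 2**20, 64, 8          # 8 MB capacity, 64-byte blocks, 8-way
NUM_SETS = C // (A * B)             # number of sets = C / (A * B) = 16384

def index_classic(addr):
    # drop the 6 block-offset bits, then take the low set-index bits
    return (addr >> 6) % NUM_SETS

def index_hashed(addr):
    # XOR-fold higher address bits into the index so that strided access
    # patterns spread more uniformly across the sets
    block = addr >> 6
    return (block ^ (block >> 14)) % NUM_SETS

print(NUM_SETS)
print(index_classic(0x12345678), index_hashed(0x12345678))
```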
And the replacement policy, as you can guess, of course has huge implications for the performance of the cache. Usually, the way a replacement algorithm is designed, there are three distinct pieces. R is usually decomposed as follows: there is an insertion algorithm, there is an age update algorithm, and there is a victim selection algorithm. What is the insertion algorithm? It decides, when you have a new block coming into a set, what age it should get; that is, what its rank should be relative to the currently resident elements of the set. For example, in the LRU replacement policy you give it the highest priority: it becomes the MRU block. That is the insertion algorithm. The second component is the age update algorithm, which says: whenever there is an access to an element of the set, how are the ages of the elements of the set updated? For example, in the LRU policy, the element that gets accessed becomes the highest-priority element, the MRU element; the old MRU element is demoted by one position, and the others' ranks remain essentially unchanged. And the last one is victim selection: based on these ages, which block currently in the set should be evicted?

It is possible to architect each one of these separately. You can keep your victim selection algorithm fixed, keep your insertion algorithm fixed, and try to tweak only how the ages change on an access. You can do other combinations also. I really do not have time to go into the details of this; there are two to three decades of research on designing replacement algorithms. If you want, I can give you some of the recent references on this. These are pretty much the first-order research problems in caches: architecting these three components.
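As a sketch of this three-way decomposition, here is LRU written as three separate pieces. The representation, explicit age counters per block with age 0 as the MRU position, is a simplification for illustration, not how real hardware stores LRU state.

```python
# LRU decomposed into insertion, age update, and victim selection.
# A set is a list of [block, age] pairs; age 0 is the MRU position.

def insert(cache_set, block):
    """Insertion algorithm: a new block enters at the MRU position."""
    for entry in cache_set:
        entry[1] += 1                 # everyone else ages by one
    cache_set.append([block, 0])

def age_update(cache_set, block):
    """Age update: the accessed block becomes MRU; blocks that were
    younger than it are demoted by one position, the rest keep rank."""
    old = next(e for e in cache_set if e[0] == block)  # assumed present
    for entry in cache_set:
        if entry[1] < old[1]:
            entry[1] += 1
    old[1] = 0

def select_victim(cache_set):
    """Victim selection: evict the oldest (LRU) block."""
    victim = max(cache_set, key=lambda e: e[1])
    cache_set.remove(victim)
    return victim[0]
```

Swapping out any one of the three functions while keeping the other two gives a different replacement policy, which is exactly the design space described above.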
Now, of course, I kept T constant here because in this course we did not really discuss the topology much. But that is also a very important research problem: how do you lay out the banks properly, given that, as I already mentioned, the access latency is affected by how you lay these things out. One class of algorithms people have explored, still keeping A, B, C, T fixed, tries to mitigate the problem that if this processor wants to access a piece of data here, it has to traverse a long distance to get it. What these algorithms do is indirectly change the index function: at run time, they monitor the accesses to these banks from the processors and compute an affinity matrix, that is, which processor is accessing which data the most, and they try to move that data dynamically closer to that processor. In effect, this changes the index function: it says how this data should be indexed into the cache; instead of indexing this data into this bank, you should index it into that bank. So, it implicitly changes the index function, and that is one way of mitigating this problem of non-uniform access latency.

That is about cache performance. The other important problem in caches is energy optimization, and usually we talk about cache energy especially for large caches. Here the problem is that as transistors get smaller, they get leaky, in the sense that even if you turn off the voltage to a transistor, it does not fully turn off: there is still a leakage path from the high supply voltage to the ground. That is called leakage current. Do not confuse this with the leaking of charge in DRAM; it is not that the cells in the cache are leaking their contents, it is just that there is a path from the high voltage to the low voltage that is always on, and that is the leakage current. This is a big problem when you have large SRAM arrays. Caches have become so large that they are the biggest consumers of leakage energy on the chip. So, how do you optimize leakage energy?

Some of the common techniques people have tried at the architecture level (there are many circuit techniques, which I will not go into): suppose you have an A-way set-associative cache and you are running an application. You start monitoring the hit rate the application is enjoying from the cache. Suppose you shut down one way, so you now have a cache with associativity A minus 1 instead of A. Does the hit rate of the application change? If the answer is no, then you have essentially saved a 1/A fraction of the leakage energy by shutting down one way without losing performance. You can keep doing this, shutting down ways of the cache one at a time. Of course, you may have to go the other way also: if you find you are losing performance, you have to turn the ways back on. The problem is that turning a way back on takes time; it is not a zero-time event. So, you have to be very careful when shutting down ways, because you have to remember that you cannot bring the performance back up as fast as you brought it down.

The second approach people have tried: although you have an 8 megabyte cache, it may be that you do not need the full cache at any point in time; you may need only a small portion of it. But the problem is that the data is scattered across your banks: some data is here, some data is here, some data is here, so some subset of the live data is on almost every bank, and you have to keep all the banks on all the time. What you can do, with the same kind of monitoring we were talking about, is figure out which data is accessed together, cluster it all into a small number of banks, and shut down all the remaining banks. That also saves a lot of energy. Both of these techniques require online monitoring, and often people employ machine learning techniques to figure out what action will have the best effect in the future. So, that is about caches.

Now I will talk a little bit about DRAM. We have discussed these things: in DRAM, the access latency is determined by your row hits, so that is the major research problem, how to maximize the number of row hits. We discussed one possibility in the class last time: the memory controller could cluster requests that go to the same row, and that will maximize the number of row hits. But the problem is that this may lead to a lack of fairness. What may happen is that you end up giving the DRAM to one particular thread of your computation which happens to have a sequential access pattern, so it has a lot of row hits: the memory controller clusters the accesses of that thread, they go to the same row, and it will keep on serving them. Other threads will keep starving simply because they do not have as much row locality. So, at some point you have to break this. There is a tradeoff between fairness and throughput: clustering maximizes your throughput but may hurt fairness. So, you come up with a fairness metric here and try to figure out how to promote that. Essentially what I am saying is that you cannot keep on serving requests from a particular thread just because that maximizes the number of row hits; you have to give the DRAM to the other threads also.
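A sketch of this throughput-versus-fairness tradeoff: a scheduler that prefers open-row hits (the idea behind row-hit-first scheduling, known in the literature as FR-FCFS) but caps how long any request may starve. The starvation threshold is a made-up tuning knob, not from any real controller.

```python
# Row-hit-first memory scheduling with a fairness cap (illustrative).
STARVATION_CAP = 8  # hypothetical limit on how long a request may wait

def pick_request(queue, open_row):
    """queue: non-empty list of dicts {'row': r, 'age': cycles_waited}."""
    # Fairness first: anything that has waited too long goes ahead.
    starved = [r for r in queue if r['age'] >= STARVATION_CAP]
    if starved:
        return max(starved, key=lambda r: r['age'])
    # Otherwise maximize row hits: oldest request to the open row first.
    hits = [r for r in queue if r['row'] == open_row]
    if hits:
        return max(hits, key=lambda r: r['age'])
    # No row hit available: fall back to plain oldest-first.
    return max(queue, key=lambda r: r['age'])
```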
The other way of attacking this problem (that was the memory controller's perspective) is that you could lay out your data in such a way that, within a particular locality of the program, the data the application accesses is sequential, so it gets accessed with maximum row locality. Here you require some help from the compiler for allocating the data: compiler-assisted data layout. Or you could try a dynamic technique, just like before: you could figure out that currently the application is accessing data from four different rows, dynamically change their addresses, remapping them with the help of the operating system, and put them in a single row. So, I will put that here also: OS-assisted dynamic remapping.

Now, roughly for the next few lectures, what we will do is look at some of the commercial processors. We will not look at their instruction sets; we will look at the microarchitecture, that is, how they implement the instruction set in a processor. The first one is from MIPS; it goes back to the mid 90s. This was one of the first dynamic, out-of-order issue processors. Here is some information about this processor. It has 6.8 million transistors; you should compare this number with today's high-end processors, which routinely have 1 to 1.5 billion transistors. It has a 298 square millimeter die, that is the size of the chip, in a 0.35 micron process. And what is the process today? The smallest in production is 22 nanometers, and the leading chip manufacturers have already demonstrated much smaller transistors. Out of the 6.8 million transistors, 4.4 million are devoted to the L1 instruction and data caches. You will observe this trend even more today. Percentage-wise, how much is that, roughly? About two-thirds. Today it is actually even more: about 90 percent of the transistors are devoted to caches and 10 percent go to logic. That is today's rough division between memory and logic.

It fetches, decodes and renames 4 instructions every cycle. It has 64-bit registers, so the data path is 64 bits wide. There are on-chip 32 kilobyte L1 instruction and data caches, 2-way set associative. There is an off-chip L2 cache of configurable size, from 512 kilobytes up to 16 megabytes, also 2-way set associative. If you have forgotten what that means, you can look up your past lecture notes; we will also refresh your memory when we discuss this. The line size is 64 or 128 bytes; again, you can configure it. So, that is a summary of the chip.

What we will do is go through each of the pipeline stages. In the fetch stage, instructions are slightly pre-decoded when the cache line is brought into the instruction cache. Essentially, before you fill the instruction cache, each instruction is appended with 4 extra function-unit bits, which say to which function unit the instruction should go. That simplifies your decode: instead of doing this work in the decoder every time, you do it once, before you put the instruction into the cache. It is a very simple thing to do, and subsequent accesses can reuse this pre-decoded information instead of decoding it every time. The processor fetches 4 sequential instructions every cycle from the instruction cache; remember that you have to do this to be able to feed a 4-wide pipeline properly. The instruction TLB has 8 entries and is fully associative, backed by a larger unified TLB. So, there are two levels of TLB: the level-one instruction TLB is small, but the second-level TLB is much larger.
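A behavioral sketch of this two-level TLB: an 8-entry fully associative L1 instruction TLB backed by a larger unified L2 TLB. The eviction choice and the L2 contents here are placeholders for illustration.

```python
# Two-level TLB lookup (behavioral sketch). The 8-entry L1 size follows
# the text; the eviction policy below is a naive stand-in.
itlb = {}    # L1 instruction TLB: vpn -> frame, at most 8 entries
l2_tlb = {}  # larger unified second-level TLB: vpn -> frame

def translate(vpn):
    if vpn in itlb:                     # L1 hit: the fast path
        return itlb[vpn]
    if vpn in l2_tlb:                   # L1 miss, L2 hit: refill the L1
        if len(itlb) >= 8:
            itlb.pop(next(iter(itlb)))  # naive eviction, for the sketch only
        itlb[vpn] = l2_tlb[vpn]
        return itlb[vpn]
    raise LookupError("TLB miss: walk the page table")
```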
There is no branch target buffer, so the fetcher really cannot do anything about branches other than fetching sequentially: it fetches sequentially until somebody gives it better information. Fetched instructions are put into an 8-entry instruction buffer for the decoder. That is the fetch stage.

Stage 2, decode/rename, decodes and renames 4 instructions every cycle. The targets of conditional branches, unconditional jumps and subroutine calls are computed in this stage. What this means is that you actually have an ALU in this particular stage, because the targets of conditional branches require adding the offset to PC plus 4. For unconditional jumps you need nothing: the decoder gives you the target, because it is part of the instruction. Subroutine call targets also come directly from the instruction. So, these are all available here. Unconditional jumps are not fed further into the pipeline; the fetch PC is modified directly by the decoder, and the jump makes no further progress: right here, the decoder tells the fetcher to fetch from the target. Conditional branches look up a bimodal predictor to predict the branch direction, taken or not taken, and accordingly modify the fetch PC. And return instructions look up the return address stack. We have discussed what all these gadgets actually do; in fact, in your homework you will be working on a bimodal predictor as well, and we will see how accurate it is. Any questions on this? So, this is the stage where the fetcher gets its first information about what to do on a branch, because remember that there is no BTB, so it was fetching sequentially.

So, let me talk about branch prediction. These things we already know: branches are predicted and unconditional jumps are completed in stage 2, so there is always a one-cycle bubble, which costs us instructions. Remember that the branch delay slot, the legacy MIPS feature, continues here, but it covers only the one instruction right after the branch; the remaining three fetched that cycle are not in the delay slot. So, essentially you can think of it as a one-cycle bubble costing you three instructions, provided, of course, that the branch instruction was the last one in its fetch group.

In case of a branch misprediction, which will be detected later, the processor may need to roll back and restart fetching from the correct target. How do we do that? We have discussed this in class: you need to checkpoint the register map right after the branch is renamed, which will be needed to restore the map in case of a misprediction. The processor supports at most four register map checkpoints, stored in a structure called the branch stack; it is really a FIFO, actually, I do not know why they call it a branch stack. This means it can support only up to four in-flight branches, because every branch requires a checkpointed register map and there is space for only four. A fifth branch will have to stall if the previous four branches haven't yet resolved.
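A sketch of the four-entry branch stack: each in-flight branch checkpoints the register map, and a fifth branch stalls. For simplicity this sketch assumes branches resolve in order, matching the FIFO behavior mentioned above; that ordering is my assumption for the sketch.

```python
# Branch stack sketch: at most four register-map checkpoints in flight.
from collections import deque

branch_stack = deque()  # FIFO of register-map checkpoints
MAX_INFLIGHT = 4

def rename_branch(register_map):
    """Called when a branch is renamed. Returns False if it must stall."""
    if len(branch_stack) == MAX_INFLIGHT:
        return False                          # fifth in-flight branch stalls
    branch_stack.append(dict(register_map))   # checkpoint a copy of the map
    return True

def resolve_branch(mispredicted, register_map):
    """Called when the oldest in-flight branch resolves (assumed in order)."""
    checkpoint = branch_stack.popleft()
    if mispredicted:
        register_map.clear()
        register_map.update(checkpoint)       # roll back to the checkpoint
```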
The predictor is an array of 512 two-bit saturating counters. That is the bimodal predictor. A two-bit counter can count up to three; that is the definition of a saturating counter: when the count is 3, an increment has no effect, and similarly, when the count is 0, a decrement has no effect. Those are the two bits.

The array is indexed by bits 3 to 11 of the PC, so it actually discards the lower three bits, 0, 1 and 2. Why is that? We said we should discard the lower two bits, because the instructions are 4 bytes each. But here they discard 3. Why is that? What about the next instruction after the branch? It is always executed; it is the delay slot. So, can I think of the branch instruction as actually a fused form of these two instructions? Yes, because I know that the instruction after the branch will always be executed, and the ISA restriction is that we cannot put a branch in the delay slot of another branch. That means I can treat a branch instruction as effectively eight bytes long, so I can discard the lower three bits. Ignoring the lower three bits, we use the next nine bits to index the 512-entry array, and the outcome is the count at that index of the predictor: if the count is at least two, the branch is predicted taken; otherwise, not taken. That is the bimodal predictor: a simple algorithm with a prediction accuracy of 85-plus percent, which is what they showed for their benchmarks.

This prediction accuracy has implications for the pipeline, because we said that if you have a prediction accuracy of p and you allow n branches concurrently in flight, you want p to the power n to be at least 0.5. If you do this calculation here, n is 4, because we allow 4 in-flight branches, as we just said, and p is 0.85. You can calculate that 0.85 to the power 4 is about 0.52, so you are just above 0.5.
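A sketch of this bimodal predictor: 512 two-bit saturating counters indexed by PC bits 3 through 11, with the lower three bits ignored for the delay-slot reason just given. The initial counter value (weakly not-taken) is my assumption.

```python
# Bimodal predictor sketch: 512 two-bit saturating counters.
counters = [1] * 512             # start weakly not-taken (an assumption)

def predict(pc):
    index = (pc >> 3) & 0x1FF    # bits 3..11 give a 9-bit index, 512 entries
    return counters[index] >= 2  # count of 2 or 3 means predict taken

def train(pc, taken):
    index = (pc >> 3) & 0x1FF
    if taken:
        counters[index] = min(3, counters[index] + 1)  # saturate at 3
    else:
        counters[index] = max(0, counters[index] - 1)  # saturate at 0

# Sanity check of the in-flight branch math discussed above:
print(0.85 ** 4)  # about 0.522, just above the 0.5 threshold
```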
The branch predictor is updated when a conditional branch retires. It is an in-order update, because retirement is in order, and at retirement we know the correct outcome of the branch. At this point the corresponding branch stack entry is also freed. One branch stack entry contains the entire register map (not the register values), the target of the branch, and a few control bits. The target has to be saved in the branch stack because it may be needed to update the predictor and to check whether the fetch needs to be corrected.

The decoder also assigns a 4-bit branch mask to every instruction. In the pipeline there can be at most 4 in-flight branches, and I am saying that each instruction goes down the pipeline with a 4-bit branch mask. Can anybody guess what this is for? What is this branch mask? An array of 4 bits, yes, but what is it holding? Which branches the instruction belongs to? Careful: we attach this mask to every instruction, not only to branches. So, can someone define the branch mask? Yes: essentially, for an instruction I, the mask tells me which of the in-flight branches I is control dependent on. An instruction may have all the bits set; it may be dependent on all of them, which means it has passed through four branches, so the fate of instruction I depends on the fate of all four of those branches.

Why is this needed? Why am I attaching this mask? On a misprediction, we may have to squash the instructions that are control dependent on the mispredicted branch. How do I figure out which ones they are? Whenever I resolve a branch, say I resolve the second branch, I prepare a mask like 0100 and send this mask down the pipeline. This mask will be ANDed with the masks of all the in-flight instructions, and only the instructions with a non-zero result will be squashed, because we know those are the instructions that are control dependent on this particular branch.

So, it might be helpful, if you are thinking of coming to the next class, to brush up a little on renaming, what renaming does. That will help you understand the lectures. After this, we will take up a processor from Compaq, the Alpha; we will look at two versions of it, and a little bit of the overall microarchitecture without going into too much detail.