OK, our next speaker is Martin Matuska, and he is going to talk about tuning ZFS on FreeBSD.

Well, at least if you have three letters and the last two are FS, it is probably that word, yes. So, ZFS is a modern 128-bit file system, it is open source, originally developed by Sun Microsystems, and it uses the copy-on-write model. This presentation will answer two questions. The first question is: how can we tune ZFS? And the second question is: when should ZFS be tuned at all? Do we actually need to, or is it not necessary?

So I'll start with this: help, my ZFS is slow. OK, ZFS is slow. First of all, define slow. What does it mean to be slow? What are you comparing against? Are you comparing against the very same system yesterday, after it has been running for 24 hours: yesterday it was quick, today you have performance problems? Or are you comparing against a completely different file system on the same machine, or against a file system on a completely different machine with completely different hardware? If you just feel that it is slow, that is scientifically irrelevant; the statement has to be based on clear assumptions about what you are comparing. And as I say, it depends on many factors: the workload, the data access patterns, the data structures you have on your system. Do you have many files or few files? Are your files big or small? In ZFS we always have this fundamental trade of data consistency and features against speed. We have things like ZFS checksums and other features that cost processing time and make your system slower than a file system that does not use these features. Another point is that the auto-tuning may not fit your case. If you have a special, non-standard environment, for example a heavily utilized web server with 30 million files, which was my case, then the standard settings are not optimal for your installation.

OK, the first thing I want to say is: always think twice about what you do and what you read. I have here quotes from two blogs on the internet. The first is a blog by some guy who writes about speeding up ZFS on FreeBSD, and among other things he says: disable unwanted features. I'm really quoting him, this is what is on the blog, and he says: if you don't need checksums, disable them. And if you look at his optimization settings, there it is: "I don't need checksums", and so he disables checksums on the whole system. And he's using a mirror or RAIDZ data set. Yeah, wonderful. OK, the other blog is from Nexenta, about tuning a Nexenta system, and it says: a note on disabling ZFS checksums: don't. So there are two different opinions. If you want to hear my opinion about this, yes: if I disable checksums on ZFS, then it probably looks like this.
And yeah, you know, that's like digging my own grave. The checksum is one of the very important features that help you keep your data integrity in ZFS. Each block of data is checksummed, and these checksums are verified every time you access the data. So if something changes on your disk without the knowledge of the system, the system becomes aware of it. Another point: if you have RAIDZ or mirror installations, you have the self-healing feature, which means that if the data gets corrupted on one disk, the system automatically replaces the corrupted data with the data from the other device that matches the checksum.

We'll start with a section of general tuning tips. These are really very basic ones, nothing advanced, very easy to perform. It will be about system memory, access time, dataset compression, some things about deduplication, and ZFS send and receive.

In one of the blogs I mentioned, the guy who recommends disabling checksums also recommends: don't use RAIDZ, use just mirrors, because it's faster. Well, it depends on what you want. If you want a lot of space, you need RAIDZ, because with a mirror you always have only half the space of your setup. I myself have a four-way mirror in ZFS; that means four disks running as one disk, purely for speed, so that the access times across all four drives get lower, because SATA drives may be big today, but they are still not fast compared to, say, SAS or SSD drives. Even SAS drives have at most, I guess, 15,000 rpm, so they are faster than SATA drives, but not as fast as the bleeding-edge SSDs.

So let's start with random access memory. ZFS is a RAM eater; the caches and other parts of the file system need memory. On FreeBSD we have a recommended minimum of one gigabyte, but that is really for some kind of really small home server that is doing almost nothing. If you want to do serious work, you need at least four gigabytes, and I recommend eight or more gigabytes of total memory. We will look later at how much of this memory is used for the ZFS caches.

Access time: that is a topic on Unix systems I have heard about for a long, long time. The problem with access time is that every time a file is accessed on a Unix file system with access time enabled, the timestamp is updated. On normal systems, non-ZFS systems, let's call them that, it might not be such an issue, but on ZFS, say you have snapshots, and you have lots of snapshots. Now every time you access a file, this timestamp has to be updated relative to the snapshots. So many people ask themselves: why is my space growing? I did not write one single file on my system, but I have 20 million files which are accessed daily, I'm doing daily snapshots, and my snapshots are hundreds of megabytes each. Why is that? That is just because of this access time feature. The access time gets updated, and the snapshots remember all changes in data and metadata. So the access time change is stored in the snapshot, and the next snapshot again stores the latest access times from before that snapshot was taken, and so on. Your snapshots take space, and the space can really grow if you have lots and lots of small files which are accessed frequently.
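As a concrete sketch of the access time setting just described (the pool and dataset names here are placeholders, not from the talk):

    # check the current setting
    zfs get atime tank/data

    # stop updating access times for this dataset (inherited by children)
    zfs set atime=off tank/data

    # see how much space the existing snapshots are holding
    zfs list -t snapshot -o name,used -r tank/data

Existing snapshots keep the space they already reference; the setting only stops new access time updates from accumulating in future snapshots.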
So, dataset compression. Well, that's a mixed story. The original goal of dataset compression is to save space. That means you compress your data, and it occupies less space on your drive than the logical size of the data. How much you save depends on how compressible the data is and on which algorithm you choose. Currently only two algorithms are available, LZJB and GZIP. LZJB compresses less, GZIP compresses more, but at a higher CPU cost; LZJB uses much less CPU than GZIP. Of course, if you have a very slow device and you are reading compressed data, your effective data throughput increases. I recommend using compression primarily for archiving purposes, for example a dataset with logs. You no longer compress your logs with bzip; you just disable the log compression in FreeBSD, create a separate dataset, set GZIP compression on this dataset, and the logs are compressed on the fly as they are written. There is a new compression algorithm in the works at illumos called LZ4; it should be twice as fast as LZJB and consume even less CPU power, so I am interested to see how that works out.

So, deduplication. OK, if you think ZFS needs a lot of RAM, then with deduplication you need, like, that to the power of two, or something like that. Well, it depends on how many blocks you are using, because deduplication uses a kind of hash table where the data about duplicate blocks is stored, and you have good performance as long as this table fits into your memory. If it is larger than your memory, it has to be split, part of the lookups no longer come from RAM, and it gets really, really slow. There is a magical zdb command, zdb -S, that does a kind of simulation of how much space you would gain if your pool were deduplicated. It is just a dry run: deduplication isn't really enabled and the data isn't actually stored once, but you get a printout of the table, how much space you would gain, and how big this table would be. If you already have deduplication enabled, you can check the data with zdb -D, or -DD, which gives you detailed information about the deduplication structures. Generally it is recommended that the dedup ratio should be at least a factor of two; if you don't reach a factor of two, it is not really worth it. A factor of two means you save half of the amount of data that is stored on the drive.
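A short sketch of the two things mentioned above, the compressed log dataset and the dedup dry run (pool and dataset names are placeholders):

    # dedicated dataset for logs, compressed on the fly with gzip
    zfs create -o compression=gzip tank/logs

    # check how well the data actually compresses once logs are written
    zfs get compressratio tank/logs

    # simulate deduplication: prints the dedup table histogram and the
    # estimated ratio without enabling dedup on the pool
    zdb -S tank

If zdb -S reports a ratio well below 2.00x, deduplication is probably not worth the memory it would consume.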
OK, ZFS send and receive. As you might know, ZFS is able to send datasets as a stream, and you can store that stream in a file or receive it directly on another system. Receiving directly on another system is used very commonly; many people pipe it through SSH or other secured connections. The problem is that the buffering built into ZFS is not optimal for this kind of data transfer, so I recommend using some kind of intermediate buffering solution. We have a couple of these in ports; one of them is buffer, in misc/buffer, and the better one is mbuffer, which is network capable, so you can set up mbuffer on one side and on the other side and just pipe the data through it. You define the amount of memory allocated to mbuffer, and the speed gains can be really big, because the rate at which data is read from the devices is not constant. In parts where there are large files, you are reading quickly; in parts where there are many smaller files, you need more seek time and it takes longer. Sometimes the reading is faster than your network connection, sometimes it is slower. The buffer evens out these two situations, you get a more constant speed, and altogether it is faster, because the system spends less time waiting for the connection or for the data.

So, application tuning tips. We are going to look at how to optimize for the following applications: I will show some settings for web servers, some settings for database servers, and some settings for file servers. This is what you try before the hardcore optimization, which is for when nothing else helps. In the past, many machines have been optimized that way, but today it works a little differently.

So let's take a look at web servers. With the current ZFS implementation we have a problem here, because if we are using the sendfile call, the data is actually cached twice: first in the FreeBSD inactive memory, and second in the ZFS cache. So you have the data twice in your caches, and that is very bad, because you are losing your RAM very quickly. Therefore, with the current implementation, it is recommended to just disable sendfile, because the data is served directly from the ZFS cache, so you don't need the standard FreeBSD caching path. The same applies to mmap in Apache. If you apply these two settings, you will notice that you have much more free RAM on your system, and the speed will be about the same. In nginx, you just disable sendfile with "sendfile off", and in lighttpd you set writev as the network backend. I have personal experience with this, and I can tell you that it makes a difference. I have administered systems with 48 gigabytes of RAM, or even 96 gigabytes of RAM, and when I changed these settings I looked and, oh, I have 30 gigabytes more free RAM. Nice.

OK, database servers. Well, many people say: don't run database servers on ZFS, it is generally slower than other file systems. That may be true, at least compared to UFS; for example, PostgreSQL is a lot faster on UFS than on ZFS, at least on FreeBSD. What is recommended is to change the default record size, because databases store their data in fixed-size chunks. In PostgreSQL these chunks are 8 kilobytes by default, so if you set the record size to 8 kilobytes, you get a more efficient dataset than if you leave the default of 128 kilobytes. The 128 kilobytes is the upper bound of a variable block size, so it still works if you do nothing, but the allocation matches better if you bring it down to 8 kilobytes. In MySQL, it is 8 kilobytes for the MyISAM storage engine and 16 kilobytes for InnoDB; I personally use 8 kilobytes for my MySQL installations as a universal setting.
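A sketch of the record size recommendation (dataset names are placeholders; the property only affects blocks written after it is set, so set it before loading the database):

    # PostgreSQL uses 8 KB pages
    zfs create -p -o recordsize=8k tank/db/postgres

    # MyISAM is around 8 KB, InnoDB uses 16 KB pages; 8 KB as a universal MySQL setting
    zfs create -p -o recordsize=8k tank/db/mysql

    zfs get recordsize tank/db/postgres tank/db/mysql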
Well, file servers. I have some general tips. The first of them is: disable access time if you don't need it. Keep the number of snapshots low. If you have quite a lot of snapshots, things get slower again because of the way the data is stored: the data is stored relative to the snapshots, which means each snapshot remembers the changes since the previous snapshot. And in our experience, if you have really a lot of snapshots, many internal ZFS operations take longer because they have to traverse all the snapshots, and these internal operations slow down other data activity on your system. Use deduplication only if you are sure that you have enough random access memory for it; otherwise you will, again, slow down your system. For heavy write workloads, say at business scale, it is recommended to move the ZFS intent log to separate SSD drives. Normally, the ZFS intent log is stored on a small part of your existing pool, on your existing devices. It is something similar to a journal; it is not the same, but the functionality is very similar to the journal in journaling file systems. And if you put it on another drive, or on fast SSD drives, your write load gets faster. Optionally, you can disable the ZFS intent log for individual datasets. So it is configurable, but, again, beware of the consequences: you lose data integrity on a possible system shutdown or panic.

So now we are getting to the more scientific part of my talk, and that is the tuning of cache and prefetch settings. I am going to talk about the adaptive replacement cache, ARC; that is the key cache of ZFS. Then we have the level 2 ARC. Most systems are not using this feature, but if you are deploying larger systems, for example with heavy read workloads, it is very useful: it is similar to having more RAM, you just have this extra cache on your SSD drives. Then we have the ZFS intent log, which can also be measured and monitored; a feature called file-level prefetch, or zfetch, in ZFS; device-level prefetching, the vdev prefetch; and then I will show you the statistics tools I am using.

So, the adaptive replacement cache. It resides in the system's main memory, and it is the major speed-up of ZFS. If you disable the ARC, you will see that everything in ZFS gets really slow. The size of the ARC is auto-tuned. Let's take a look at the default values. The maximum size of your ARC is physical RAM minus one gigabyte, so if you have, say, 16 gigabytes of RAM, then 15 gigabytes is the auto-tuned maximum for the ARC. Or half of all memory: if you have only one gigabyte of memory, you obviously cannot allocate one gigabyte minus one gigabyte to the ARC, because you would have no memory for anything else, so in that case only half of the memory is allocated to the ARC. And we have a minimum, and the minimum value is 64 megabytes.

Then we have something called the metadata limit. Your ARC memory is divided into two parts: one part of the ARC is for data, the other is for metadata. Metadata is data about data, so this is where information about the files, inodes and such is stored. And there are situations where you hit the limit of this memory. You take a look, your other values look good, you have enough RAM, there is still free ARC, but your system is slow, because this metadata part of the ARC fills up and constantly has to be evicted and refilled. So there are still a lot of reads on your drives, and at the same time you still have a lot of free ARC memory in total, and you ask yourself how that can happen. The default for this metadata limit is one fourth of the ARC maximum, so again, one fourth of those 15 gigabytes. And then there is a metadata minimum: from this metadata limit we take half, and that is the minimum.
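As a sketch, these sizes map to loader.conf tunables on FreeBSD roughly like this (tunable names as they exist on FreeBSD systems of this era; the values are only illustrative, here for a machine with plenty of RAM, and they take effect after a reboot):

    # /boot/loader.conf
    vfs.zfs.arc_max="12G"         # cap the ARC, e.g. to leave room for tmpfs or applications
    vfs.zfs.arc_min="1G"          # how far the ARC may shrink under memory pressure
    vfs.zfs.arc_meta_limit="6G"   # raise the metadata share (default is one fourth of arc_max)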
When your system is running, this ARC is auto-tuned. It cooperates with the VM memory pressure in FreeBSD, so if the ARC needs more memory, it allocates more memory, but all of this is kernel memory, of course. And if other parts of the system need this memory, it is freed again, but only down to the minimum. So it actually moves between the minimum and the maximum, and on a busy system it will always be close to the maximum, or at some kind of equilibrium between the VM pressure and the memory the ARC is ready to release. Yes? I will show you this in the statistics tools.

So, first of all, how can we tune this ARC? The ARC can be disabled on a dataset level. That means you can say: this dataset, do not use the ARC; or you can say: this dataset, use the ARC only for metadata. There are two options you can set. Again, this might be useful if you are short of memory, or if you really explicitly need to reserve your ARC for only a specific group of datasets. The maximum can be limited to reserve memory for other tasks. Many people do this, and it can be useful on several types of systems. For example, if you are using tmpfs and you want the memory allocated to tmpfs to always be there, and you don't want the ZFS ARC to grow into it, you can just cap the maximum and save this memory for other purposes. As I said, increasing this metadata limit can be useful if you have a lot of metadata, which means, for example, lots of files. If you really have 30 million files, as in our case, increasing this value made a big difference in how the system performed.

OK, L2ARC, that is the level 2 ARC. Some facts about it: it is designed to run on fast block devices, SSDs, and it helps primarily on read-intensive workloads. So if you are deploying a Samba server in a company where you have a 50/50 read-write workload, the L2ARC is not the thing for you. But if you are deploying a web server that works write-once, read-many, then an SSD is very practical. We have done this with about 300 gigabytes of SSD storage, it was used very efficiently, and we have been very happy with it. Unlike the ARC, which is the same for all pools on your system, so all pools share the same memory region, each L2ARC device or device group is dedicated to one specific pool. That means you can have several devices for several pools, each limited to its pool. And like with the ARC, you can make per-dataset settings, so that only a particular dataset uses the second-level cache and other datasets do not.

OK, how to tune this? First of all, by default, data that came in through prefetch is not cached in the L2ARC, and for us this was hurting a lot, so we had to change it. The setting is vfs.zfs.l2arc_noprefetch; it is a loader.conf setting. If you are running streaming servers, streaming large files or video files, then I recommend turning this caching of prefetched data on, because it is a huge performance gain, and the L2ARC does not really work well for such a workload without it.
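A sketch of the per-dataset cache controls and the prefetch setting just mentioned (dataset names are placeholders; primarycache and secondarycache are the ZFS property names for the ARC and L2ARC behaviour):

    # keep only metadata for this dataset in the ARC
    zfs set primarycache=metadata tank/scratch

    # let this dataset use the L2ARC, keep another one out of it
    zfs set secondarycache=all  tank/www
    zfs set secondarycache=none tank/backups

    # /boot/loader.conf: also feed prefetched (streaming) data into the L2ARC
    vfs.zfs.l2arc_noprefetch="0"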
Then there is a period called the turbo warm-up phase. What is this? Because SSD drives cannot be rewritten forever, they wear out, the engineers at Sun decided to simply limit the speed at which the L2ARC is written to the drives. And there are two phases: there is the standard writing phase, that is write_max, and there is a turbo phase, which is called write_boost. By default, both of these settings are eight megabytes. That means that in the standard situation, only eight megabytes per second can be written to the L2ARC, and this may be a bottleneck on your system. In the turbo phase, these two values, again loader settings, are added together, so with eight megabytes by default plus eight megabytes we get a write rate of 16 megabytes per second during warm-up, which for today's SSDs may be slow. This warm-up phase lasts from boot up to the first moment memory is evicted from the L2ARC: as long as nothing has been evicted, you are in the turbo phase, and once the first data gets evicted, you are in the standard situation. So I recommend setting these to higher values, at least 16 and 16, or 32 and 32, or even faster. Again, the idea of this was to prevent the SSDs from wearing out quickly, but SSD technology is improving every year, and as of today these eight-megabyte settings correspond to the situation of four years ago. The SSDs have developed since then, so I recommend setting higher values.

So, the ZFS intent log. It guarantees data consistency on fsync calls, it replays transactions in case of a panic or a power failure of the system, and by default it uses a small storage space on the pool, as I already said. To speed up writes, it can be placed on a separate device. And as you can see on this line, there is a per-dataset setting: you can set sync to standard, always, or disabled; that is the per-dataset synchronicity. If you set it to disabled, the intent log is not used at all for that dataset, which means you have no data integrity guarantee there. I personally don't use this, but some users say there may be situations, for some parts of the data, where it can be useful. Setting it to always means that on every fsync, the data is written not just to the log but also to its final place on disk, always, and that is very slow. So this is really for hardcore data consistency applications, where you really need the data to be written not just to the log but already in place by the time the fsync call returns. The standard setting is sufficient for most people; I am using only the standard setting on my systems.

So, file-level prefetching. File-level prefetch, zfetch, analyzes the read patterns of files and tries to predict the next reads; the goal is to reduce application response times. There is a loader tunable to enable and disable zfetch. Many people recommend disabling it. I will show you later, in my statistics tools, how you can measure how efficient zfetch is on your system, and based on that data you can make the decision whether to disable it or not.
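A sketch of the loader.conf tunables discussed here, with illustrative values (the vfs.zfs.* names are the FreeBSD tunables of this era; pick numbers that match your SSDs and measure before disabling anything):

    # /boot/loader.conf
    # raise the L2ARC fill rate from the old 8 MB defaults (values in bytes)
    vfs.zfs.l2arc_write_max="33554432"    # 32 MB/s in the steady state
    vfs.zfs.l2arc_write_boost="33554432"  # added on top during the warm-up phase

    # file-level prefetch (zfetch): 1 disables it, 0 leaves it enabled
    vfs.zfs.prefetch_disable="0"

And the per-dataset synchronicity is an ordinary property, for example:

    zfs set sync=standard tank/data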
Then we have device-level prefetch, the vdev prefetch, which is used when you read small chunks from a device. The intention of this is to help slow devices that have a bad access time. There is a threshold that can be set, a byte-shift value, and when a read is smaller than this threshold, the device-level prefetch reads more data from the device, the amount specified in a variable, which was usually set to 10 megabytes. This is statically allocated kernel memory at boot. So instead of reading just, say, four kilobytes, something like 10 megabytes are read from the device. The idea is that there are fewer seeks on the drive and you have more data available from that one read.

The problem was that people have been using ZFS for appliances, and these appliances have a lot of drives. If you have 32, 64 or even more drives in a big appliance, you multiply that by 10 megabytes and you are losing this memory right at boot, because it is statically allocated and cannot be used for anything else, and that can be quite a large amount for a big number of drives. On a small system there is no real reason to disable it, but it has been disabled by default. So again, what you can do is try enabling it and use the statistical measurement tools to take a look at how effective, how efficient it is on your system.

So, the statistical data in ZFS is provided by sysctl knobs. Pawel Jakub Dawidek imported the kstat framework, and this kstat framework gives you a lot of counters, a lot of statistical data, a lot of values. Under vfs.zfs you have the static values, the current state, how much of what there is, and under kstat.zfs you have the collected counters. This data can help you make tuning decisions, and that is quite important, because tuning something just because somebody else says it might be good is no scientific way to do things; it is much better if we make some measurements.

I will show you two tools. The first of them is called zfs-stats and the second one is called zfs-mon. Both tools are available in the ports tree under sysutils/zfs-stats. zfs-stats is based on Ben Rockwood's arc_summary.pl for Solaris and includes modifications by Jason Hellenthal and myself. It gives you an overview of how your system looks now and what has happened since the system booted. It also has several command-line flags. If you use the -h flag, you get a help text that shows you what is possible, and if you run it without arguments, it shows you the structure of your caches and how filled they are.

Here is a sample output of this. Here we have the ARC size; we can see it is 25 gigabytes, that is the target size. We have the minimum size, 4 gigabytes, and the maximum size is 32 gigabytes in this case. On this system the ARC memory was limited: this system had a total of 48 gigabytes of RAM and we limited the ARC to 32 gigabytes. As we can see, the ARC is used up to about 80%. And then we have the efficiency since boot time, a collected value, and there we can see it is 90%, so we have a 10% miss ratio for the ARC. There are also demand and prefetch efficiencies. Then we have the L2ARC breakdown, and again we have a hit ratio of 62.89% and a miss ratio of 37.13%. If you want to calculate your total efficiency, you take this value, the 10% miss ratio of the ARC, and note that it is split further by the L2ARC: of that 10%, about 63% is read from the level 2 cache, which is roughly 6% of all reads, and the remaining 37% has to be read from the drives. So your total efficiency is about 96% on this system.
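The combined-efficiency arithmetic from this example, written out (numbers as read from the sample output):

    # ARC hit ratio ~ 0.90, ARC miss ratio ~ 0.10
    # of those misses, ~ 0.63 are served by the L2ARC
    echo "0.90 + 0.10 * 0.63" | bc -l    # ~ 0.963, so about 96% of reads never touch the drives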
This is an example output of zfs-stats. It is very useful to give you a statistical overview of your system, but people are often more interested in what is going on right now, not in what has happened since the system was booted, because maybe in the first week there was no utilization at all and only in the last week you have a lot of data access on the drives. So I have a second utility, called zfs-mon, that polls the ZFS counters in real time, analyzes this data, and gives you real-time absolute and relative values. Again, there are lots of flags. My inspiration for this tool was varnishstat from the Varnish project; most of you, or many of you, probably know it. So the output is exactly like varnishstat's output. It looks like this, but this is the extended version; you can also get a very minimal output that shows you just the efficiencies. There are a lot of flags, you can change this, and you can even tell it to collect data for, say, 120 seconds and then just print the statistics. So here we can see, like in varnishstat, the statistics for the last second, that is, what the system did in the last second. Then we have an average for the last 10 seconds, for the last 60 seconds, and a total since the command has been running. We have this for absolute values, that is here, that is really how many hits have been counted, and here we have relative values, that is, the efficiency per interval, for example hits against misses and so on. So here, over the last 10 seconds it was 91%, over 60 seconds it was less, and in total it was about 90% over the two minutes of running time, 120 seconds.

Yes? [Audience] So you do the tuning after you run this? You run this before you do the tuning, yes. Sorry, what? This example is from the same system as before, that means the ARC is limited and the L2ARC prefetch is enabled. From this output we might ask ourselves the question: is zfetch, the file-level prefetch, useful for me? So we take a look: at least in the last two minutes we have an efficiency of 92.92% here, and that is quite nice; I would keep zfetch with this efficiency. For the L2ARC it is quite a lot lower, 71% here. I usually never manage very high efficiencies with the L2ARC; on my systems I have been happy if it is over 70% in the long run. For the L2ARC that is OK; for the ARC it would not be. For the ARC you need a higher efficiency, like the 90% here.

OK, so that is it about the statistics tools, and now I am open to questions. I will leave this up, and if anyone has questions, I would be happy to answer them. Yes?

[Audience] More of a comment: back when Sun was developing ZFS, they found bad hardware, bad memory, bad disk controller firmware, all through the checksums. So if somebody advises disabling the checksums, you should really question the veracity of that advice. The checksum is a big part of what ZFS is.

That is why I showed this as a bad example. He put this on his blog, and there are lots of comments on it from a lot of people. The point of this was just: don't trust anybody in any blog you find on the internet, because many people treat everything written on the internet as holy. That is not how it should be. Yeah. Other questions? Yes?

[Audience] On your servers, did you... All our servers are full ZFS servers. That means they boot from ZFS, they serve their data from ZFS and so on, so there is really only the ZFS file system.

[Audience] Which of the tunables can be tuned on a running system, and which require a reboot? There is a very low number of tunables that can be changed on a running system. One of the tunables I did not mention that can be changed is the threshold for the TXG write timing, which can be useful for write workloads. The auto-tuning of this is quite good nowadays, but there are some cases where you want to set a fixed value, meaning that after a transaction group accumulates a specific amount of memory, it is written to the drive.
In the past, this was somewhat badly implemented, so we were heavily using this tuning setting, because maybe many of you experienced ZFS in the past as a kind of pulsing write: your system is running, now it writes for five seconds, then nothing happens, then it writes again for five seconds. This was happening in ZFS, and this tunable was able to change that behavior. But in recent versions this has been fixed internally, so it does not really happen anymore, at least on FreeBSD. So there are very few runtime tunables, and they are mostly sysctls; you can look in the header files and the source files to see exactly which ones are configurable. It is a very, very low number. Most of the others are hardcoded memory sizes, and you cannot change that kernel memory on a running system, because it would not be safe. Other questions?

[Audience] I have a system that has a lot of very small files, and I noticed that I exceeded the meta limit: I have a 16-gigabyte ARC, so the meta limit is four gigabytes, but currently I am using 13 gigabytes of metadata cache. Is there something I can do to allow more ARC for that? You want more metadata cache, or less? [Audience] More, I guess, I don't know. Yes, you have here the metadata usage, the demand and the prefetch metadata, and as we can see here, when these efficiency values are really poor, I would say the metadata cache is full and the limit needs to be increased. But again, you have to reboot the system to do that. [Audience] Right, but on my system it is already over the limit. I did not change the limit, but the amount it is using is three times the limit, so the limit is not actually taking effect. Then this might be a bug in the ZFS implementation? Might be. I am not experiencing this, but I have read in the forums that some people do experience it. How much is your total memory? [Audience] 48 gigabytes. 48 gigabytes, and the number of files you are serving? [Audience] 12 million? That is even less than we have been doing, and we never experienced this; it was never higher than the limit. But I know Andriy Gapon, avg in FreeBSD, was also looking at this, and he found some settings that were wrong, so it really was possible to allocate more memory than this limit, because some calculation had, I guess, an off-by-one error or something like that.

[Audience] You have mentioned that some defaults are quite conservative, like the write speed to SSDs. Do you plan to change them in FreeBSD, to shift these defaults to more modern values? To change these defaults: our general policy, at least as of now, is to follow the vendor, because the more changes we make, the more different we are and the more problems we have porting new changes from illumos. So currently we did not make any decision to change these values. It is left up to the user, but we should probably document this somewhere, at least on the wiki, because this is also important for the L2 cache. Again, this affects only the speed, so how long it takes for this cache to become full. For example, you can say: I want it to fill up quickly, but then I want to spare the drives, so you would increase only the value for the turbo phase, and when the cache is full, the subsequent writes really happen at the slower rate, at the write max. [Audience] The problem is that most users will not get to that level of detail, and they will just say that ZFS is not fast enough. Yes, well, it depends.
If you are making a setup for a huge company, serving many websites, many clients, where hundreds of thousands or millions of dollars are flowing, I would expect from a contractor, or from whoever sets up my servers, that they also look at details like this. I mean, I would expect this from my contractors or from my very own department. If it is not an important machine, you simply don't care; it makes no sense, because you would just be investing money that you never get back. But on these heavily utilized systems, every percent of speed you gain has a very positive effect on your revenues. So that is just the business view, at least mine. Other questions? Yes?

[Audience] Any hardware recommendations based on your experience, drives or otherwise? We have usually been using really only SATA and SAS arrays. In some of our setups we have even been avoiding the RAIDZ configuration, so we really had a RAID-5 array, or RAID-6, and on this array we set up ZFS just because of the caching features. Many appliances are sold with this setup. The question is, if your system is an appliance and at the very same time a web server and other services, whether this mix is efficient for you or not. But from the hardware side, if you have a big bunch of drives, a JBOD of a lot of SATA drives, that is fully sufficient. The problem in FreeBSD, at least currently, is that we are missing the fault management daemon. Xin Li has been working on this; it is called zfsd. The idea is that there is some kind of message framework, so that if a drive fails, you get notified about it in FreeBSD, that ZFS communicates this somehow, that there is some daemon that really monitors this. In Solaris you have SMF, the service management facility, which really informs you about every event that happens. In FreeBSD we do not have this for ZFS, but I guess other parts of the system also miss this feature; it could be standardized somehow. Other questions? Yes?

What do you want to benchmark? [Audience] Basic things: just speed, megabytes per second, let's say, for different file sizes. You can use the very common tools, like the Bonnie benchmark or other benchmarking tools that are available. It is the very same benchmarking as for other file systems. I do not know of a special benchmark tailored for ZFS that tweaks things to produce better numbers for ZFS; that would not make sense. So I am talking here mostly about tuning tools, collecting statistics, and what to set where. Thank you for the question.

OK, so, I guess that is it. Thank you for your attention.