Hi, I'm Andy Price, and I'm on the GFS2 team at Red Hat. [The rest of the introduction is unintelligible in the recording. It mentions libgfs2 and leads into a recap of GFS2 basics.]
[unintelligible] ...there's the GFS2 superblock-type block, and there are resource group header blocks, and other metadata as well as the superblock. [unintelligible] ...changes to the file system. So that contains things like journal data and quota data and other per-node stuff. And just like other file systems, GFS2 has user space tools in the gfs2-utils package, so fsck and mkfs and things like that.

OK, so what are we trying to achieve with this language? Well, file system corruption is a fact of life. It can be caused by power outages and faulty hardware and sunspots, whatever. And much of this you can fix with fsck, as long as it has enough context to rebuild the file system metadata and get it back into a consistent state. But fsck can also have bugs, obviously. So you need good test coverage of fsck.gfs2 in order to be confident that it can fix all of the corruption scenarios that it's expected to. It doesn't have to fix every single corruption scenario: if you write over a superblock and a couple of resource group headers or something like that, then it's very difficult to rebuild the metadata from that. So you need good test coverage in fsck.gfs2.

OK, so one problem we have is that when a user encounters some file system corruption, it's very difficult for them to communicate the nature of that corruption to us, from human to human, so that we can understand it and inspect it and try to fix it. So the conditions are difficult to replicate on our side.
And so it's tricky to test that fsck actually fixes all of these scenarios.

So to date, we've used metadata dumps to gather information about file system corruption from users. A metadata dump is basically a compressed file which contains all of the metadata from a GFS2 file system. It doesn't contain any user data, just the metadata of the file system. The user can dump that to disk with gfs2_edit savemeta and send us the metadata dump. And we can restore it onto our test systems with gfs2_edit restoremeta and run our tests on it and inspect it, see what the corruption is and why our tools didn't fix it, and produce a test from it and things like that.

So these metadata dumps can be pretty big, depending on the user's file system. They can go into gigabytes, terabytes. So it takes a long time to download these things from the users. And also we need a lot of storage space to keep a repository of these metadata dumps for future testing. So if you wanted to produce a regression test to check that fsck fixes the corruption, and that it still fixes the corruption after making changes to fsck, we have to keep that metadata around. And also they consist of mainly clean blocks. The metadata might only be corrupted at one particular point. And so when we run an fsck over it, for example, it would have to grind through all of the clean blocks before it gets to the point of corruption, which then exercises the code path that we want to test. So that can be slow and it's not very focused.

So how do we inject breakage into a clean file system in order to test that we can fix that breakage? Well, one option is by using dd. That's a bit heavy-handed, and it's like using a sledgehammer, really. When you write these commands you have to use magic numbers for the seek offsets and the write sizes so that you can write over the individual fields in the file system.
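
To make the dd approach concrete, here is a minimal Python sketch of the same kind of raw overwrite, run against a scratch file rather than a real device. The numbers are assumptions, not something stated in the talk: they reflect one reading of gfs2_ondisk.h (superblock 64 KiB into the device, sb_bsize as a 4-byte field 36 bytes into the superblock) and should be checked before being trusted. The helper names are hypothetical.

```python
import os
import tempfile

# Assumed values (not from the talk): the superblock starts 64 KiB into the
# device, and sb_bsize is a 4-byte field 36 bytes into it. Check these
# against gfs2_ondisk.h before relying on them.
SB_BYTE_OFFSET = 65536
SB_BSIZE_FIELD_OFFSET = 36
SB_BSIZE_FIELD_SIZE = 4


def zero_bytes(path, offset, size):
    """Overwrite `size` bytes at `offset` with zeros without truncating the file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(b"\x00" * size)


def make_scratch_image(size=128 * 1024):
    """Create a throwaway file of 0xff bytes to stand in for a block device."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xff" * size)
    return path


if __name__ == "__main__":
    image = make_scratch_image()
    # Roughly equivalent to:
    #   dd if=/dev/zero of=IMAGE bs=1 seek=65572 count=4 conv=notrunc
    zero_bytes(image, SB_BYTE_OFFSET + SB_BSIZE_FIELD_OFFSET, SB_BSIZE_FIELD_SIZE)
    os.remove(image)
```

Even with named constants the intent is easy to lose; in the raw dd form all that remains is 65572 and 4.
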
And it's hard to know what those offsets and write sizes actually refer to when you're reading them back. So it's totally unreadable and you have to comment it and it's long-winded.

So gfs2_edit is a little bit better in that regard, because you can use symbols to refer to the superblock and the resource index and things like that. So it's a bit more readable. gfs2_edit itself is a very useful tool. It's based on hexedit and it has an ncurses user interface, so you can page through every block in the file system and it'll let you change the fields on a byte-by-byte basis just at the press of a button. And you can also use it in a command line mode, so you can tell it to change a field in a script. But it has a quirky interface. It's not anything you're familiar with, like the usual getopt kind of user interface. It's kind of evolved over time to grow all of these features that we've needed on a test-by-test basis. And so the code is hard to extend. And it only lets you change one field per invocation of the gfs2_edit command. But it's still much better than dd.

So, on to the solution. A couple of years ago, Steve added this meta.c file to libgfs2, which contains some static arrays of structures which describe the actual data types that we use in the GFS2 file system. So essentially it's a description of all of the data types inside /usr/include/linux/gfs2_ondisk.h. That allows you to read an arbitrary file system block, look at its type, and then cross-reference that with the metadata description, and that will give you a structure with all of the field types and the sizes and their offsets within the data type. So that kind of allows you to use the block as you would an object in a dynamic language.

So we get to the language. It's called gfs2l. That was always a working name. That was just the name of the binary that it produced when it was compiled. It's an interpreted language. It's similar in style to a query language.
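
The metadata description idea mentioned above (sniff a block's type, then cross-reference a table of field names, offsets and sizes) can be pictured with a small Python sketch. This is not the contents of meta.c. The table is a hypothetical, heavily abridged stand-in, and the magic number, type number and offsets are assumptions taken from one reading of gfs2_ondisk.h.

```python
import struct
from collections import namedtuple

Field = namedtuple("Field", ["name", "offset", "size"])

# Assumed on-disk constants; verify against gfs2_ondisk.h.
GFS2_MAGIC = 0x01161970

# Hypothetical, abridged description table keyed by block type number.
# Only a few superblock fields are listed, purely for illustration.
DESCRIPTIONS = {
    1: ("gfs2_sb", [
        Field("mh_magic", 0, 4),
        Field("mh_type", 4, 4),
        Field("sb_fs_format", 24, 4),
        Field("sb_bsize", 36, 4),
    ]),
}


def sniff(block):
    """Return (type name, field list) for a block, or None if unrecognised."""
    magic, mh_type = struct.unpack_from(">II", block, 0)  # big-endian header
    if magic != GFS2_MAGIC:
        return None
    return DESCRIPTIONS.get(mh_type)


def decode_block(block):
    """Turn a raw block into (type name, {field name: value}), or None."""
    desc = sniff(block)
    if desc is None:
        return None
    name, fields = desc
    values = {f.name: int.from_bytes(block[f.offset:f.offset + f.size], "big")
              for f in fields}
    return name, values


def set_field(block, field_name, value):
    """Write `value` into the named field of a mutable (bytearray) block."""
    desc = sniff(block)
    if desc is None:
        raise ValueError("block type not recognised")
    for f in desc[1]:
        if f.name == field_name:
            block[f.offset:f.offset + f.size] = value.to_bytes(f.size, "big")
            return
    raise KeyError(field_name)


if __name__ == "__main__":
    block = bytearray(4096)
    struct.pack_into(">II", block, 0, GFS2_MAGIC, 1)  # fake superblock header
    set_field(block, "sb_bsize", 4096)
    print(decode_block(block))
```

Once a block can be addressed by field name like this, a get or set action is mostly a lookup in the table followed by a read or a write.
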
Essentially it has get actions and set actions. So it's pretty simple at the moment. There is some overlap in functionality with gfs2_edit. They both allow you to modify fields and print fields from the file system. But I don't expect it to actually obsolete gfs2_edit any time soon, because they both have their different purposes. And it's been designed for use in testing from the outset. So it's a lot more useful for when we're writing tests.

So I never really had a specification for the language. It kind of grew organically. A script in this language is a series of statements, and each statement has an action clause, a lookup clause, and a data spec clause. That's pretty much it, really. That's the language. And it borrows some syntax from C, because all of the GFS2 team are C programmers, so we can be quite comfortable with it. The data spec clauses use the same syntax as a structure literal in C. And the statements can be separated by semicolons. And hash is a line comment in the language, so you can write executable scripts with it, with a hashbang line at the top, if you want to do that sort of thing.

And it's command line friendly. It reads from standard input by default, so you can pipe scripts into it. And it also has some interesting options like minus t, which will give you the list of the data structures that it understands, and also minus f, which allows you to inspect all the fields and their offsets within a certain kind of data structure.

So one side effect of the way that the language is actually parsed is that it allows you to specify numeric values in hexadecimal or in base 10. So sometimes you want a value to be in hexadecimal because it represents an offset or an address and it makes more sense to represent it in hex, or you can use base 10 for sizes and counts and things like that.

So the implementation is pretty simple.
It's just a flex lexer attached to a bison parser, and they allow you to do some basic error reporting, so it tracks which line of the file it's on and which column it got to, so it can report syntax errors for you. And the way it works is it builds a syntax tree from a script, which is a tree which stores all of the statements down the left hand side and goes into the statements on the right hand side. And then after it's built a syntax tree it traverses it and interprets it.

So there's no fancy stuff like optimization or transactions in it at the moment. It would be really easy to add those to the language if it turns out that they would be useful, but we haven't used this language enough yet to know whether those kinds of things would be useful to us. The kind of optimization I'm talking about would be, say, if you had two modifications of the same field in a block without a read between them, then you might as well throw away the first modification and just keep the second one.

So here we get to the language. As you can see, it's really simple. The first example just shows the basic form of a get statement. It will look up file system block 1234: it will look up the block size from the superblock, and then it will use that to find block 1234 in the file system, and then it will sniff the type of that block, if it has one, cross-reference it with the metadata description that I mentioned before, and then print all of the fields in an easy-to-read and easy-to-parse format.

The second example shows an alias. There are some hard-coded aliases in the language, like sb, which is an alias for the block number of the superblock. So that would just translate the alias to the address and process it as in the first example. The third one shows an offset.
So rindex is just another alias to another block number in the file system, but you also add an offset of 0xaf, which is 175, so it will read the block number of the rindex and then add 175, use that as the address, and then look it up as before. The fourth example shows how we would reference the resource group header blocks. That would be by index, and that one would just look up the first resource group. The last example shows how we can specify a path to refer to an inode. So it would resolve the inode block of a file called foobar and then take that block number and look it up as before. It's probably worth noting that this is obviously not a mounted file system. This is all done in user space, through libgfs2 and through the language interpreter.

So, some examples of modifying fields with the language, compared to how you'd do it with dd and gfs2_edit. With dd you would have to read from /dev/zero and write it to the device at a certain offset with a certain write size, and you would have to work all of that out from scratch depending on what field you wanted to overwrite in whichever block you wanted. So that's totally unreadable, it's hard to write, and I just try to avoid it.

With gfs2_edit it's a little bit better. The fields have symbolic names, so you can refer to them as things like sb and sb_bsize. So that's a bit more readable, but this is one of gfs2_edit's quirks: you use the minus p option to modify a field, where minus p is meant to be the print option, but that option was changed over time to implement some field-changing functionality.

And with gfs2l it's a lot better. You can just set sb_bsize to 0, and that's a script in gfs2l that would do the same thing. It's a lot more readable and a lot more concise. Whitespace isn't significant in the language, so you can format your code just as you would with C. You can make your data spec clauses easy to read.
Just like in that example, you would set the timestamp field [field name unclear in the recording], di_blocks and di_entries fields of the inode structure belonging to the foobar file.

So how would we integrate this into our tests? Well, this is one example. You would create a new file system with mkfs. That gives you a clean file system, obviously, and then you would use gfs2l to run a script, and that would inject some kind of corruption into the file system. Once that's done, you run fsck over it with the minus y option to make sure it fixes all of the corruption that it encounters. Once that exits, if it doesn't exit with the right return code, then you know it hasn't fixed the corruption that you injected, and you can fail and exit. But if it succeeds, you run fsck again with the minus n option to test whether there is still corruption in the file system, and if there is, then you know that the previous fsck didn't do its job and you can fail and exit again. And if you put all this into a bash script and replace the name of the gfs2l script with a bash variable, then you can loop through a bunch of scripts in a directory, run those with gfs2l, and test fsck against a bunch of different kinds of corruption scenarios in a batch.

So, here are some examples of command line usage. You can easily incorporate gfs2l into a pipeline: just echo a script into it, like that one. And you can use the minus t option, as I mentioned before, to print all of the data types at your disposal, and then you can use minus f to print all of the fields in that data type with their offsets. And that would be good for tab completion scripts, for example.

So, where do I want to take this? Well, there are lots of ways we could take this. What I would really like is a way to translate some file system corruption directly into a script written in this language. One way to do that would be to extend fsck to recognize some corruption in the file system.
Well, it already does that, but to translate the corruption that it encounters into the language. And then we could take that automatically generated script and put it into our test cases.

One blatant omission of the language so far is the ability to modify the contents of blocks. So, for example, in a resource group header block you have the allocation bitmaps, but we can't actually change those with the language yet. And I've yet to think of a nice clean syntax for actually doing that. But if you have any suggestions, then please let me know. And it's the same situation with directory entries in directory blocks.

It would be easy to add transactions. It's probably not necessary, because we only have small scripts so far. But it would be easy to add them just by creating a separate syntax tree for each transaction and running them in turn, so that if the first transaction failed, then all of the statements in that transaction would fail, but the next transaction could carry on.

I would like better error reporting. Bison gives you the option of adding error conditions into your grammar, so that if it was, say, expecting a certain symbol and it didn't find it, then it would call a handler, which you can use to tell the user that it was expecting the symbol and perhaps they meant this instead of that, like all good compilers do. And also I'd like to improve the documentation.

And one fun toy project would be, once this language has matured to the point that I think it could mature to, you could probably implement mkfs.gfs2 in it, and perhaps some of the other tools as well. But that obviously wouldn't be very useful for us. That would just be something fun to have a go at. But that's the kind of power I think we could get out of a language like this.

OK, so something I didn't put into the slides was the question of why we didn't bind an existing language like Python or JavaScript or Lua into this.
And the answer is that once you've actually written the interpretation layer, where the AST is interpreted and cross-referenced with the metadata description, then you've pretty much done all the work that you need to. And adding a small language like this on top of it is pretty simple. So you don't need all of the standard library stuff and the garbage collection, or even the Turing completeness, that a scripting language would give you.

But, yeah, if you want to play with this language, then clone our gfs2-utils git tree and go through the build process documented there, and it'll produce gfs2l in the libgfs2 directory. And you might also want to take a look in the tests directory to see what tests already exist. There aren't many at the moment, but I'm very much welcoming submissions of new tests. And if you want to get in touch, then cluster-devel is the mailing list that we use for our development and announcements and patch submissions and things like that.

So if anybody has any questions or suggestions or criticisms of the language or the approach to writing tests, then I would very much like to hear them.

That's actually a very good question, and that's something I wanted to mention, but I didn't. I would like to explore that. I think with the right support from other file systems, all you would really need is a way to plug a different metadata description into the language. So as long as they supported the idea of looking up an arbitrary block on the disk, and there was a way to find out which type that block was, then you could cross-reference it against the right metadata description and you could do it for other file system types like that. So it's possible, but it would need some kind of support in that way.

Any other questions?

Yeah, I think it could.
Also, perhaps, if there was a support guy out on site at a customer and they were experiencing some kind of corruption situation, and fsck actually dumped one of these scripts, and they sent the script back to us to show us what the corruption was, we could produce another script, which was kind of the reverse of that, and send that back to the support guy on site, and he could run that script and that would fix their file system without transferring these big files back and forth. But obviously that would all have to be tested and very carefully controlled.

Pardon? Where customer data is concerned.

Any other thoughts? Right, thank you very much.
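
The test flow described earlier in the talk (make a clean file system, inject corruption with a gfs2l script, run fsck with minus y, then check again with minus n) might be sketched in Python as follows. This is an illustration under assumptions, not the project's test harness. The mkfs.gfs2 options, passing the device as gfs2l's only argument with the script on standard input, and the expected exit codes (1 for "errors corrected", 0 for "clean", following the usual fsck convention) are all assumptions. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def run_corruption_test(device, script_text, run=subprocess.run,
                        fix_ok=(1,), clean_ok=(0,)):
    """Return (passed, reason) for one corruption script.

    `run` is injectable so the flow can be exercised without real tools.
    """
    # 1. Start from a clean file system (options are assumptions).
    run(["mkfs.gfs2", "-O", "-p", "lock_nolock", device], check=True)
    # 2. Inject corruption; the talk says scripts are read from standard input.
    run(["gfs2l", device], input=script_text, text=True, check=True)
    # 3. Ask fsck to repair everything it finds.
    fixed = run(["fsck.gfs2", "-y", device])
    if fixed.returncode not in fix_ok:
        return False, "fsck -y exited with %d" % fixed.returncode
    # 4. A second, read-only pass should find nothing left to fix.
    check = run(["fsck.gfs2", "-n", device])
    if check.returncode not in clean_ok:
        return False, "fsck -n still reports problems (%d)" % check.returncode
    return True, "ok"


def run_all(device, script_dir, run=subprocess.run):
    """Run every script in `script_dir` and collect results by file name."""
    results = {}
    for path in sorted(Path(script_dir).iterdir()):
        if path.is_file():
            results[path.name] = run_corruption_test(device, path.read_text(), run=run)
    return results
```

Calling run_all with a scratch device and a directory of scripts gives the batch behaviour mentioned in the talk. The device is reformatted for every script, so it should never point at anything holding real data.
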