So there's quite a lot of hardware built into computers, mainly server computers, but actually pretty much everything these days has error protection built into the PCI bus.
So, EDAC went into the mainline kernel in 2.6.16. The sysfs API wasn't in 2.6.16; it went into 2.6.18, but it's not fully stable yet, so bear that in mind if you're writing stuff against it. We're not really supposed to change it, but I think the whole thing is still marked as experimental in the kernel, so I think we are just about allowed to move things around in the sysfs API without upsetting the kernel maintainers too much.
Okay, the kernel-space stuff is maintained by a guy called Doug Thompson, who used to work for Linux Networx. They sell Linux clusters to people, so they'll send you racks full of high-performance computing to do whatever your big biology or business problem is. So pretty much every PCI-based system, from a little ARM-based slug up to some monster thing with 18 PCI buses, is supported for PCI error checking.
For RAM, about 90% of recent machines with ECC RAM are supported for error checking on x86, and a few PowerPC and MIPS things are starting to be written to support it as well.
Currently the mechanism is polling from kernel space. If you've got a lot of PCI devices there is a bit of an overhead. It's less than 1% at the standard polling interval, and you can turn down the polling interval if you want. The reason it's polled is that the mechanisms for polling are consistent and they work, whereas the interrupt-driven reporting is not always very well tested on the hardware, and it's a little less consistent between machines as to how you set up the hardware to interrupt you when there's an error. So the polling works and it's well tested, but the interrupt-driven stuff isn't so well tested yet. Long term we'd like to have it interrupt driven, because that's a lot neater.
Currently the user space also polls the kernel space, which again is a bit ugly. It's not terribly expensive, it's just reading sysfs files, but it needs re-implementing. Well, an additional mechanism needs implementing so that the kernel space can poke the user space when something goes wrong.
The kernel-space stuff is in the kernel. The user-space stuff has been written more recently by Lawrence Livermore National Labs in California. There's an access library and a couple of minor applications, and they're not mature yet by any means. This is more or less LLNL's internal code. You don't actually need it if you just want to write a little poll script to read sysfs variables.
Long term, this is what I need to get into Debian in the next release. I've done a little bit of packaging of it in the last couple of months, but there were quite a lot of upstream problems. So I started on packaging and then spent the next two days fixing the upstream issues. So there is a debian subdirectory in the upstream Subversion repository.
If you want to build a Debian package, you can, but it's not very useful yet, unfortunately. Currently, if you just want to monitor PCI errors and not ECC on the system, it gives up because there's no ECC. It isn't really letting you do that at the moment, but hopefully I'll be fixing it in the next couple of days.
PCI parity errors. Is everybody familiar with the concept of parity checking on data? Is anybody not familiar with parity checking on data? Good. Now, the good thing is that those nice people at the PCI Special Interest Group have mandated PCI parity checking. The only things that you don't have to do PCI parity checking on are things where data integrity is not essential, like dumb frame buffers: if you've got a pixel wrong, they say you don't really care. And also, if you never ever expect your PCI chip to be built onto an expansion card, then you're allowed to skip it. But in effect 99% plus of PCI devices support parity checking, or at least claim to. You have to turn it on, and a lot of BIOSes don't, but there is code in EDAC to enable it.
As I mentioned before, the most important place from a system point of view where you really don't want PCI parity errors is on hard disk controllers. But pretty much any sort of PCI parity error is bad, because you can trip up a driver and the driver can crash, so even if it's a video card that's misbehaving it can bring the system down in its entirety. And I can almost guarantee that people here have had PCI parity errors on the systems they run and didn't know about it. They seem to be frighteningly common. I've definitely experienced them. I've got a faulty sound chip on this laptop, but I didn't know about it; I only knew the sound was a bit unstable.
Common causes of PCI data parity errors are dirty edge connectors on PCI cards, faulty PCI cards and faulty motherboards, and bad design or faulty code in that particular unit. Bad power supplies can cause data glitches, and this is a very good way to catch them.
If you've got a noisy power supply, you can upset your PCI transfers. Occasionally there are, unfortunately, some faulty PCI ICs. Because this stuff has traditionally been so infrequently used, quite a few faulty chips have slipped through the net. So there's a bit of a chicken-and-egg problem. We need a good blacklist for those chips, otherwise we worry users with false positives. When it first went into the kernel a few users were reporting these false positives, so the kernel maintainers decided to turn it off by default. So it's off by default because we don't have a good blacklist, and we don't have a good blacklist because we don't have that many users, because it's off by default. So if you test it and find any misbehaving cards, please add them to the blacklist, which is on the wiki, and which is currently empty.
OK, people are familiar with how ECC RAM works, pretty much? Anybody like a quick review? No, everyone's happy, good. So again, you probably know this, but the causes of memory errors: faulty RAM chips, faulty boards, again dirty edge connectors, overclocked RAM. Those would be the sort of obvious causes. A non-obvious cause is cosmic radiation. If you've got enough RAM you will get memory errors, because that's the way the universe works. Even if your hardware is fully working you will be getting memory errors. Hopefully your ECC is going to cope with that, but it would be nice to know.
The way that most motherboards are set up, they will only tell you about uncorrectable memory errors. So you can have a DIMM with one chip on it entirely bad and it will just appear to work. But that DIMM isn't an error-correcting DIMM anymore, because every single read is getting an error which is being corrected. If you get any further errors in another chip, then those will be uncorrectable. Without EDAC, you don't know any of that.
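The correctable versus uncorrectable distinction can be sketched with a toy code. This is only an illustration, not how any particular memory controller works: it uses a Hamming(7,4) code plus one overall parity bit (single-error-correct, double-error-detect), whereas real ECC DIMMs typically protect 64 data bits with 8 check bits. The function names `encode` and `decode` are made up for this sketch.

```python
# Toy SECDED code: Hamming(7,4) plus an overall parity bit.
# Assumption: this 4-bit version only illustrates the idea; real ECC memory
# uses wider codes and the details depend on the memory controller.

def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword (list of 0/1 values)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 = overall parity, indexes 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1            # make the whole word even parity
    return code


def decode(code):
    """Return (status, nibble): status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    overall_odd = sum(code) & 1    # 1 if an odd number of bits were flipped
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Treated as a single-bit error: the syndrome names the flipped
        # position (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: a double-bit error.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                   # one permanently bad "chip"
    print(decode(word))            # still readable, but silently corrected
    word[2] ^= 1                   # a second, unrelated bit flip
    print(decode(word))            # now beyond what the code can repair
```

The first read in the demo corresponds to the DIMM with one dead chip: the data comes back intact, so nothing looks wrong unless something reports the correction. The second read corresponds to a further error landing on the same word.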
I've seen a DIMM like that on a system which appeared to be working fine, and in fact it was working fine, but EDAC reported that every single read from memory was picking up errors. One of the chips on a DIMM was entirely not working, and perhaps hadn't been since it was installed. So that's not really an ECC DIMM anymore. If you do get something like cosmic radiation causing an error elsewhere on that DIMM, which will definitely happen in the lifetime of a DIMM, then that would be an uncorrectable error.
So, yes, the kernel-space part of EDAC is in the kernel. You just need the user-space stuff, or if you want to drive sysfs directly you can do that. If you do that, then you will have to turn it on: there's a /sys/devices/system/edac subdirectory, and there's a little switch in there which you can echo a 1 to, and it will enable the PCI parity error detection once you've loaded the modules.
OK, I'll cover this quickly. If you don't have EDAC loaded, your PCI parity errors are probably going to go unchecked, and you will get data corruption and not know about it. If you get any correctable RAM errors, nothing on the system knows, and on some systems, if you get uncorrectable ones, they're not properly set up to tell you either. But on most x86 systems you'll see that an NMI gets triggered on the first uncorrectable error. I say the first one because if you don't clear the error then no further NMIs occur. You usually get a message about it in your kernel log. That's fine, it's better than nothing, but it doesn't give you any indication of whether it is actually a RAM error, it doesn't tell you which chip has got the error, and it doesn't tell you whether it's a one-off or whether it's happening on a continuous basis.
What about IPMI and the system event log?
Yes, some systems have out-of-band methods to report this, in which case IPMI is basically doing what EDAC does, but if you have a system which doesn't implement this stuff behind the kernel's back then you don't know. So there are a few exceptions: there are some systems which do check this stuff using something like IPMI.
Is everyone familiar with IPMI? IPMI is a slightly funky management spec, from Intel originally and a few other people. Basically you have a little extra embedded computer which runs on your motherboard and which the kernel doesn't get to see, but it sort of takes care of the hardware, and it has weird things like backdoors into the Ethernet chips so it can communicate over the network without the kernel knowing. So you can implement things like fan failure and over-temperature monitoring on this spare computer under the IPMI spec, and some of these systems do also check ECC and parity errors.
Some of them don't do it very well, so they can actually corrupt PCI transfers, because they try to do PCI transfers themselves, and if they interrupt the kernel when it's doing a PCI configuration transaction then they can corrupt it. So yes, some BIOSes do have mechanisms for taking care of this themselves, but actually a lot of them do it in that badly designed way and can cause as much trouble as they solve by tripping up the kernel when it's trying to do PCI stuff.
Feel free to interrupt at any point if anything's not clear; I should have said that earlier.
Okay, when you've got EDAC loaded and configured properly, you get these counters. There are various counter files which turn up under sysfs, and these will give you memory error counts. If there's an error you also get an entry in the kernel log telling you what's happened. You get the address of a memory error, and if you've told the system something about how that maps to your motherboard, it will tell you which slot has got the error. You will get a PCI slot number for any PCI parity errors, and you get counts, so you can work out error frequency and that sort of thing.
So we really need more people: we really need more testers and we really need more developers. It's got a bit better. There are a few embedded companies which are now contributing code and helping stuff along, but we do really suffer from a lack of development time. It's a sort of chicken-and-egg thing: because it's not super stable and super well tested at this stage, we don't have that many users, and if we had more users we'd get more testers, et cetera, et cetera. So it's having a frustratingly slow takeoff, but momentum is gathering.
If you find you haven't got support for your hardware, these are some really easy kernel drivers to write. I wrote my first one in a day, and I hadn't written a kernel driver before. So if you want your name in the kernel, this is a good way to get it there.
Any questions at this stage?
You're asking whether PCI Express errors are also reported. Basically, with the PCI Express spec, even though the buses seem to be very different, I believe all the registers are in the same place in a backward-compatible way, and it should just work on PCI Express as well. Anyone else?
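As a rough idea of what checking those counter files could look like, here is a minimal sketch. The talk does not name the files, so the directory `/sys/devices/system/edac/mc` and the per-controller files `ce_count` (correctable) and `ue_count` (uncorrectable) are assumptions here; the sysfs layout was described as not yet stable, so check what your kernel actually exposes. The function names are made up for the example.

```python
from pathlib import Path

# Assumed location of the memory-controller counters; not taken from the talk.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_error_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce_count": int or None, "ue_count": int or None}}."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # file missing or unreadable on this kernel
        counts[mc.name] = entry
    return counts


def controllers_with_errors(counts):
    """Names of the controllers whose correctable or uncorrectable count is non-zero."""
    return [mc for mc, entry in counts.items() if any(entry.values())]


if __name__ == "__main__":
    counts = read_error_counts()
    for mc in controllers_with_errors(counts):
        print(f"{mc}: {counts[mc]}")
```

Run from cron, a script like this only needs to produce output when something is non-zero, since cron mails any output to the job's owner.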
Do I need to do anything else, or do I just start seeing this stuff once I enable the kernel module? And how will I see it?
Well, there are some special files which turn up under sysfs, and you can write a cron job or something to check whether these are non-zero and send you an email when that happens. If you've got some sort of log monitoring stuff, that's probably going to pick up the kernel message logs as well. When you get an error, you get a sort of human-readable thing in the kernel message log and a machine-readable thing turns up under sysfs.
There are some future plans. We could be more intelligent: we could inform device drivers about failing PCI devices so they can do clever stuff like retry a transaction that failed. It would also be good, if you've got bad RAM on the system, or RAM that you suspect is going bad because you're getting correctable errors, to stop using that bit of RAM. This sort of stuff should be implementable but hasn't been implemented yet, so again, if you fancy some kernel hacking, there are some good opportunities there at the moment. And as I mentioned before, it needs to be more interrupt driven.
You can actually do some of this stuff just using lspci and setpci if you don't want to use EDAC. If you want to drive it yourself from user space, you can turn on PCI error checking using setpci, and you can read the error status in user space using lspci.
You had a comment earlier about patches for AMD Opteron, the AMD64 things. Do you need to apply some patches if you're using a stock kernel?
It took a while. There is some overlap with some code that's been in the AMD64 kernels for a while, and for a while there was resistance upstream to putting the EDAC AMD K8 drivers in. They have now gone in, so they are going to be in the next stable kernel, so
currently, if you want AMD64 support, you have to build the drivers from the EDAC project; you don't get them in the current stable kernel. And that goes for all AMD 64-bit chips, because their architecture is that the memory controller is built into the CPU, so one driver does all of them, just for the AMD 64-bit chips.
That's it, thank you very much.
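For the do-it-yourself route mentioned in the questions, the registers involved are the standard PCI command register (offset 0x04, where bit 6 enables the parity error response) and status register (offset 0x06, where bit 15 is "detected parity error" and bit 8 is "master data parity error"). The sketch below decodes those bits from raw configuration space. Reading the bytes from `/sys/bus/pci/devices/<address>/config` is an assumption of this example and not something the talk describes, and the helper names are made up.

```python
import struct
from pathlib import Path

# Standard PCI configuration-space bit positions.
PARITY_ERROR_RESPONSE = 1 << 6      # command register: parity checking enabled
MASTER_DATA_PARITY_ERROR = 1 << 8   # status register
SIGNALED_SYSTEM_ERROR = 1 << 14     # status register
DETECTED_PARITY_ERROR = 1 << 15     # status register


def decode_parity_state(config):
    """Decode parity-related bits from the start of PCI configuration space."""
    # Command is the little-endian 16-bit word at offset 4, status at offset 6.
    command, status = struct.unpack_from("<HH", config, 4)
    return {
        "parity_checking_enabled": bool(command & PARITY_ERROR_RESPONSE),
        "master_data_parity_error": bool(status & MASTER_DATA_PARITY_ERROR),
        "signaled_system_error": bool(status & SIGNALED_SYSTEM_ERROR),
        "detected_parity_error": bool(status & DETECTED_PARITY_ERROR),
    }


def read_config(device, sysfs_root="/sys/bus/pci/devices"):
    """Read raw config space for a device; the sysfs path is an assumption."""
    return (Path(sysfs_root) / device / "config").read_bytes()


if __name__ == "__main__":
    # List every PCI device visible in sysfs with its parity state.
    for dev in sorted(Path("/sys/bus/pci/devices").glob("*")):
        print(dev.name, decode_parity_state(read_config(dev.name)))
```

This only reads state. Turning parity checking on means setting bit 6 of the command register, which is what the talk suggests doing with setpci or by loading EDAC.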