Alright, hi. Thank you all for coming to Reversing Corruption in Seagate Hard Drive Translators, also known as the Naked Trail Data Recovery Project, because we've had some technical difficulties. I may speak really fast. I'm from New York. I'm sorry. But thank you all for making it through the maze and through the confusion to be here. I really appreciate it. My name is Allison. I'm one of several Allison Naaktgeborens. I'm a software developer by trade. I build high level blue, blue software. So this has nothing to do with what I do as a day job. I haven't really touched hardware since college, where I did a bunch of robotics research and learned that cutting-edge research is the easy part of getting published in an IEEE publication. It left a little bit of a mark, about a dozen years' worth of a mark. So that's really it for my hardware experience. I am not an expert in this. That's her. That's her job. So before you tell me I'm wrong. That's a slow transition. Look, dude, I'm Mr. Dead. I named the project. It's a play on our last names. You know, Born Naked is her last name in Dutch. I've given talks at HushCon, Teardown, and Archreactor Hacker Space. I founded Revenant Data, which is a data recovery company in Oregon, with Wireglitch, who's somewhere around here. I've co-founded a couple of hackerspaces, the most recent one, like Pascal and PDX Hack Labs. I try to do workshops on reverse engineering, focusing on assembly and binary challenges. I don't have a formal education. I did not go to Carnegie Mellon University, like Allison. Everything that I know is self-taught, minus that I did do Scott Moulton's data recovery training, the remote one. I've never actually met him, which is kind of funny. And I do roller derby and I like to stress bake. I get stressed out and I bake stuff. That's my non-infosec hacking stuff. Alright. The original quest.
We dubbed it the seemingly undamaged HDD in a really bad day. So what ends up happening is, this is out of order. Oh, that's fine. I get a lot of these cases in, translator corruption, which basically means I have zero access to user data, or sometimes I'll have partial access. How that manifests is it'll boot, spin, and get stuck in busy, and by stuck in busy I just mean not recognized by the host. And then it never initializes. Another thing that can happen is it will boot, spin, click (not all clicks are clicks of death), and then it'll spin down. So when that happens, especially if there's clicking involved, and the type of corruption that we're talking about always occurs in Pharaoh Seagates and Moose Seagates, which I know many of you may not actually be familiar with, family names in hard drives, but you can ask me about that later. So when the clicking happens, it usually is indicating the short points problem, which means that I have to short the read pins, or short the read points, in order to gain terminal access and then force a regeneration of the translator. Success. That's a little off and that's messed up. Success, but it's actually never really that easy. It can become an extremely awful time sink for me. I have to short these read pins at a very precise time, and then I have to send a Ctrl+Z to gain terminal access at a very precise time, and it can suck up anywhere between an hour and six hours of my time. I'm not kidding. My partner and I would often have to switch off because we'd get tired of sitting there and listening for the heads to leave the platter and then start mashing Ctrl+Z. So what makes this extremely frustrating is, once I get terminal access, forcing the translator regeneration is actually very simple, but it's this timing issue that drives me nuts. Long story short.
I went to the hackerspace and started complaining about this a lot. I really don't like this. Yeah, but anyway, Allison overheard me and I said, oh, I want to automate this. I would love to be able to bypass this four to six hour timing issue of shorting these points and sending a terminal command, so that I can just, boom, get to the translator regeneration. And she chimed in and she's like, oh, I can script that. I think you said, hey, I need some code. It'll be really easy. I'm pretty sure that's what she said. I may have tried to sucker her into it knowing damn well, because I'm not a coder. I have very little programming skill. So along the way, while we were doing research to try to solve this particular issue, we started realizing there's a bunch of other, much simpler tools that we could build. And we started doing that. Because the ultimate goal, where we are currently, is focusing on translators and also just Seagates, even more specifically these two families. However, several of our tools do work on pretty much any Seagate, 7200.12 and beyond. Anyhow, yeah, we started working on other types of tools for things we can fix. We want to turn it into a full-on data recovery suite that's open source. Reasons to care. I know a lot of people think that hard drives are obsolete or near obsolescence, but there are some pretty important reasons to care. Number one, right to know. There are standards for hard drives. A lot of you are probably familiar with the ATA standards; the INCITS T13 committee issues those. The problem is, the majority of these are vendor specific. The tool that we built uses the terminal debug interface, which has zero standards. Everything's vendor specific. So if you don't know what you're doing, you either have to send it to a data recovery company. It's going to cost you an arm and a leg, most likely.
Or you have to sink, honestly, probably months if not years of research into figuring out what commands can I use, what does this command mean? This lack of standards makes it really difficult to implement quality assurance. And again, you're forced to rely on very costly data recovery tools or services. Right to repair: I wholeheartedly believe ownership is not a timeshare. It really bugs me that manufacturers want you to use their data recovery services only. Again, I think that ultimately drives down QA. They're not compostable and there are a hell of a lot of them, especially when you factor in data centers and the fact that they're still using predominantly spinning disks. They're manufactured with cobalt on the platters. That can cause neurodegenerative diseases. It's been linked to ALS recently. And I do have citations for all this. And then of course silica, also neurodegenerative diseases. That's you. Yep. This podium is freaking me out. All right. So why you know Linux? Yeah, yeah. And apparently I can't use Windows. So I suppose I don't have any stones for a, all right. So of the disks that cross Mr. Dead's desk for repair, about 50% of them are Seagates. Of those, just over 8% display translator corruption. This doesn't sound like a ton, but scale is a funny thing. So Seagate started shipping drives in 1980. And about 20 years later, they announced that they had shipped 250 million hard drives, and they've kept shipping at about that rate. So let's do some bar math. Translators really first appeared with the F3 architecture, which shipped in about 2008. And if we say there are 100 million drives active, then we are looking at north of 8 million cases of translator corruption. So the odds are good. If you have a couple of drives or you run a data center, it's going to happen to you. And when it does, you'll find out that this can be really expensive.
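As a rough check on that bar math, here is a minimal Python sketch. The 100 million active drives and the 8% rate are the talk's own round figures, not measured data, and the function name is made up for illustration.

```python
def estimated_corruption_cases(active_drives: int, corruption_percent: int) -> int:
    """Back-of-the-envelope count of drives expected to show translator corruption.

    Integer math is used on purpose so the round numbers stay round.
    """
    return active_drives * corruption_percent // 100


# Round figures quoted in the talk (assumptions, not measurements).
ACTIVE_DRIVES = 100_000_000
CORRUPTION_PERCENT = 8

cases = estimated_corruption_cases(ACTIVE_DRIVES, CORRUPTION_PERCENT)
print(f"{cases:,} drives")
```

Since the observed rate was "just over" 8%, the real figure under these assumptions would land slightly above what this prints.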
We did some social engineering and we found that the cost of a translator repair can run you somewhere between $700 and $3,000. And in general, repair on a hard drive can run you north of $8,000. So getting your bits back can be extremely expensive. And I don't know about you, but I don't necessarily have a spare 8K laying around. So that's the reason you should care, beside the high-minded stuff. I'm going to try to get through this because I know it's kind of boring: hard drive anatomy. Number one, filter, not very important. Number two, however, those are the platters, a.k.a. the media, a.k.a. spinning disks. And that's where your data is technically actually stored. What else to pay attention to: six, the heads, right, the read and write elements of the drive. They're going to fly over the platters, reading and writing data. They're parked on the head ramp there. But honestly, the second or, you know, third most important element is seven. That's the head stack connector. So that's connected via number eight, the ribbon, to the actuator arm. And then on the other side, that's touching the PCB. And just remember, I was talking about shorting read points. Well, I have to know where those are, because that's what I'm doing. What I'm preventing via this solution is the heads seeking the service area. It'll stop, and I'm tricking it into giving me a debug mode. So, anatomy: hard disk geometry. I wanted you all to have a little bit of background on this, because, you know, this is introducing you to CHS and LBA. The service area is that grayish blue ring around the outside if they're outer parked heads on a ramp. It could also be on the inner ring, which would be inner parked heads, so the heads are parked on the platter. I can't use this solution with those types of drives all the time. It's harder, because I'm literally listening and also kind of feeling for when the heads are leaving the ramp. To the sectors which is the S and CHS.
That is the physical location of data. It is the smallest unit of data on the hard drive: 512 bytes of user data, 520 if you factor in ECC, error correction code. And it is then divided up into cylinders, which are concentrically expanding rings from the center of the drive. If you have multiple platters, that's going all through every platter. I highlighted it there. Number four, that is a cylinder. Those are all cylinders. Some people will dispute that and call them tracks. That's a whole big debate. And then that cylinder, again, that would be the C in CHS. The H is the heads. So cylinder, head, sector. And a zone: that includes several cylinders, and it is just a way to aid the translator. But we're not going to get into that. There you go. It's you. Hardware is icky. Let's talk about software instead. And guys in the back, feel free to come in and sit down. We've got plenty of seats. So what's inside the firmware that we really care about? Well, one of the most important data structures is the translator. And the translator maps from logical block addresses to cylinder head sectors. Anytime you see the word logical in a name, it's a cue that we made this up. Logical block addresses are an abstraction. CHS represents a location in physical space on the disk. Your translator will map between one and the other and vice versa. Well, how does it do this? Well, like all good data structures or famous hackers, it has a posse. Its posse is made up of defect lists. Your translator is a great place to hide things if you don't want the host OS to know they're there. Not that I would encourage you to do this. But in theory, it wouldn't be half bad. So the lists that you use as helpers to construct the translator as a data structure are of two types. We have primary defect lists and grown defect lists. Primary defects are defined as defects that are found at the factory, at creation. You're either booked or you're not. Sectors are just kind of bad sometimes.
We make them, no one makes a perfect disk. Grown defects are the ones that accumulate over time. Flakes of rust, cosmic rays, aliens staring at your hard drive, stuff just goes bad over time and we know it. So we tend to stick them in two lists. The grown defect list usually has a lot more space than the primary lists, because its size tends to determine how long your drive will really stay viable. In Seagates we find a second form of primary list, called the non-resident grown defect list. This showed up in... It showed up after 7200.12. It doesn't exist in 7200.11. Ask her. The architecture came out in 2008, so around then. But ask me about that later. Anyway, go, go, go. All right. These two lists are created by something in Seagate lingo called a self-scan test. As far as I can tell it's a QA test they run. And when working with these lists, they appear to largely function as glorified arrays, but I haven't been able to fully prove that's what they really are. They just seem to act a lot like it. So, all right, let's talk about some lists. No, no, no, not lists. This is physical to logical. I was getting there. Oh. Wait, did you turn? Wait, I thought this was me. No, it's not my name on it. What? No, that's management. All right, we'll let you do it. Okay, so how does CHS map to LBA? And how do skip lists and remap lists work? That's what this is showing. Up above, the darker blocks are cylinder head sector. Those are the physical blocks, the geometric location. Keep in mind CHS is always oblivious to these defects. It has no idea that they exist. That's where the translator comes in. The primary lists found at the factory are the P list, the NRG list, maybe a few others. We're not entirely sure yet. They function like this. So, zero, one, and two, those are good blocks, good sectors, and they map identically, address-wise, to the CHS. Physical block three is bad, so we skip it and four becomes three, and that's how a skip list works.
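The skip behavior on that slide can be sketched in a few lines of Python. This is a toy model under stated assumptions: physical sectors are treated as one flat run of integers rather than real cylinder/head/sector triples, and the function name is invented for illustration. It says nothing about how Seagate firmware actually stores or walks its primary lists.

```python
def skip_translate(logical_block: int, primary_defects: set[int]) -> int:
    """Map a logical block to a physical block by slipping past known-bad sectors.

    Every defect at or before the current physical position pushes the
    target one sector further along, which is the 'four becomes three' effect.
    """
    physical = logical_block
    for bad in sorted(primary_defects):
        if bad <= physical:
            physical += 1  # slip past this defect
        else:
            break  # remaining defects are beyond the target
    return physical


# Physical block 3 is bad, as in the slide.
defects = {3}
for lba in range(5):
    print(f"logical {lba} -> physical {skip_translate(lba, defects)}")
```

Note that every logical address after the defect shifts by one, which is why this scheme only suits lists that never change after the factory.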
It's fast, it's significantly faster than remaps, which we will get to now. An important thing to note from this perspective is they have a limitation. They're static, so we tend to pair them with primary lists, because those are expected to be static and unchanging. Physical to logical via remapping, which is what the grown defects lists are, so anything in the G list is functioning like a remap. Again, zero and one physically, you know, having the same address. We get to two and three, and those are both bad. The heads will hit that and check the ECC, it doesn't match, it knows, and the translator tells it, okay, go to the reserved area, so the heads have to fly to a totally different area. This can really, really start to increase the time that you have to spend waiting for your data to show up. So a lot of times when your drive's starting to slow down, it can often indicate, via SMART data, kind of, sort of (I don't ever really rely on that), a grown defects list issue, potentially. If you're in CS, this will cause it to resemble a linked list in a lot of ways. Yeah. And, yeah, actually that's all I got, but if you have more questions, you can talk to me about that later. I know it's kind of esoteric. Signs the translator may be corrupted. Again, as I mentioned earlier, the drive will boot, it'll spin, or it'll do the partial boot, and ultimately it's never recognized by the host. If you're able to get into terminal debug mode, you will see these errors. Pretty much anything that begins with LED is indicative of a translator issue. That top one there, ending in A051, is just kind of a classic stuck in busy. It could be that lists need to be merged or the G list needs to be purged. The second one there, ending in A7E5, pretty much always means, in a Pharaoh or Moose drive, that I'm going to have to do the shorting of the read points in order to gain terminal access.
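Going back to the remapping behavior described for the G list, here is a minimal Python sketch of the idea, for contrast with skipping. It is a toy model: the reserved-area start address and the function names are hypothetical, sectors are flat integers, and nothing here reflects the real on-disk format of a grown defects list.

```python
def build_remap_table(grown_defects: set[int], reserved_start: int) -> dict[int, int]:
    """Assign each bad sector a spare sector in a reserved area.

    Unlike skipping, neighboring addresses are left alone, so entries can be
    added over the life of the drive without renumbering everything.
    """
    return {bad: reserved_start + i for i, bad in enumerate(sorted(grown_defects))}


def remap_translate(block: int, remap_table: dict[int, int]) -> int:
    """Return the spare sector for a remapped block, or the block itself."""
    return remap_table.get(block, block)


# Blocks 2 and 3 went bad, as in the slide; 1000 is a made-up reserved-area start.
table = build_remap_table({2, 3}, reserved_start=1000)
for block in range(5):
    print(f"block {block} -> {remap_translate(block, table)}")
```

The jump from block 1 to a spare far away and back to block 4 is the extra head travel the talk blames for slow reads on drives with many grown defects.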
The SIM errors down here: any SIM error that you see that's like 1000 through 1008 is probably a corrupted defects list. I don't really know which one; you'd have to research that or memorize it over time. SIM error 1009 pretty much always means a corrupted G list, which is something that one of our tools can fix. But those are things to look for in terminal debug mode if you ever decide to do this for fun. I came up with a plan worthy of a Bond villain. It is foolproof, it cannot possibly fail. So we're going to corrupt a translator in a controlled manner, because I like properly testing my software, run it through a manual fix, take some notes, take some numbers, model it with a program, test it on some unsuspecting members of our hackerspace, and profit. Who's written software before? Show of hands. Or played with proprietary hardware? Yeah. Yeah, you all know where this is going. Unsurprisingly, we ran into a few problems. So initially, we were expecting to find three lists: the P list, the NRG list, which is that second form of P list, and the G list, right? When we start spelunking in the hardware, we find two out of three. So far, so good. We are on track. I'm feeling good about it. It's warm and fuzzy. Warm and fuzzy never lasts. I'm the reason I can't have nice names. So instead of finding one G list, we found three, which became our first problem. Which of these grown defects lists are we going to try to overfill? And are they all really actually G lists? Because we just sort of found a bunch of lists and had to figure out what they were, what they did, and whether they were actually useful. So if you were confused by the remap, the physical to logical remap, ask me later what is copied and why that happens and how. All righty. Anyway. So we ended up trying to fill some of these lists and seeing what the drive did, and seeing if we got the sort of signs of corruption we were expecting. We ultimately settled on the eponymous G list.
So the first one at the top. And that's the one we used for the rest of our experiments and testing. We also found these two. They're called slip lists. That means we found a total of seven lists here when we were expecting three. It's a little crowded in there. They aren't relevant to this talk and we're not entirely sure what they do, but we found them and we wanted to share that in case they're useful to other people. So we expected that we'd sort of fill the list by hand, watch it fall over at 2000 entries, and move on. That's not how that went. So initially, Mr. Dead here added 3000-some-odd entries by hand. Not that she was counting. I added about 34 before wanting to throw it at the wall, decided pySerial was the lesser of two evils, and promptly scripted that. So even then I'm like, all right, we'll run it overnight, it'll be done. Well, we ended up north of 6000 entries and the translator wouldn't corrupt. And we're like, we don't think it has that much space. It shouldn't have that much space. We've run some numbers, we think it should only have like two gigs. Then it finally dawned on me that the translator was actually a little bit smarter than I'd given it credit for: I was adding entries sequentially and it was just skipping the entire block. We were basically telling it this contiguous memory was bad, and it was like, okay, contiguous memory's bad, we'll just skip that. So I ended up having to add a randomizer to the script so it adds random jumps, so that the drive couldn't treat them as one block. Even then we found that the list had way more space. We were expecting it to fall over at 2000 entries, and it usually takes between 4000 and 5000 for it to fall over. The drive didn't like being up for a lot of that. If you look at the far screen over there, you'll see the drive powering down in the middle of me doing things. So we found we also had to batch it and give the drive pauses between stuff.
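The two fixes described here, random jumps between addresses and batching with pauses, might look roughly like the Python sketch below. It is not the project's actual script: every name and number is a made-up placeholder, and the `send` callable stands in for whatever really writes a command to the drive, so no real Seagate terminal command appears here.

```python
import random
import time


def jumpy_lbas(count, start, min_jump, max_jump, seed=None):
    """Generate `count` addresses separated by random gaps.

    A min_jump of at least 2 guarantees no two addresses are adjacent, so
    they cannot be collapsed into one contiguous bad region.
    """
    rng = random.Random(seed)
    lba, out = start, []
    for _ in range(count):
        out.append(lba)
        lba += rng.randint(min_jump, max_jump)
    return out


def run_in_batches(lbas, send, batch_size, pause_seconds, sleep=time.sleep):
    """Send addresses in small batches, resting the drive after each batch."""
    for i in range(0, len(lbas), batch_size):
        for lba in lbas[i:i + batch_size]:
            send(lba)
        sleep(pause_seconds)


# Dry run: print instead of talking to hardware, and skip the real pauses.
addresses = jumpy_lbas(count=10, start=1000, min_jump=2, max_jump=50, seed=1)
run_in_batches(addresses, send=lambda lba: print(f"would add LBA {lba}"),
               batch_size=4, pause_seconds=0, sleep=lambda s: None)
```

Passing `send` and `sleep` in as parameters keeps the sketch testable without a drive attached; a real run would swap in a serial write and a genuine pause.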
And you'll see on the left, that's the screen attached in diagnostic mode, and it's got way fewer entries in its G list than I was expecting it to have, which was sort of my tip-off that it was doing something smarter. I'm a little embarrassed to say it took me a while to see that. I just thought I was doing it wrong or wiring it backwards, and it was a little awkward, which happened a lot. All right. So firmware is really weird. It was a third problem I personally ran into. What does level 6 E2 do? Any takers? This creates user batch file 0. How about level T, I4, parameters 1? You already know the answer. You can clean the erasers after class. This clears the G list. I have no idea how this corresponds to that. So if anyone ever says this to you? Every time you say code is self-documenting, God kills a kitten. Why are you ruining my punch line? I've been waiting 10 years to use this picture from my coworker's desk. Seagate firmware is extremely non-self-documenting, and that can be very difficult to work with when you have to rely on partially transcribed Chinese manuals. I didn't think it needed a slide. We ran into differences between firmware: different behavior, different output. Some commands we expected output from didn't output in the Apple firmware version. It took us a lot of head banging to figure that out. When you are messing with this, you can't count on output to be consistent across commands. In some cases, the Apple version doesn't work. I'll try later and see if it actually did something. Again, that timing issue, and why I wanted to automate this, because it takes forever. This is just a little graph that Allison made. It's the results of several timing tests recorded during our attempts at the manual fix to the short points problem. This is from powering on the drive to powering on the drive. What's not shown is the timing between shorting the points and sending Ctrl+Z. We didn't include that because we never actually got that far.
As you can see, there's a huge variance. I think it was roughly between like 4.28 or something like that, up to like almost 6.4 or something. This is only from a single drive and we spent about 4.4 or something like that. We didn't include that because we didn't want to drive by overfilling the G list and here we are trying to figure out, how are we going to script this timing issue? We're still not there. We had some surprises. There were more lists than we anticipated. The translator was smarter than I expected by 4.8. I expected to be able to fill that sucker in 4 hours and it takes closer to 48. Data recovery technicians and professionals are patient when we're working, and then in the rest of life we're kind of assholes, I think. Some other problems we ran into. We're only human. We're mere mortals. Software engineer can't do that. I was trying to use the script that she wrote to intentionally corrupt. I didn't read the documentation, so I'm just like, go, go, go. Nothing's happening, Allison. I went back to manually inputting. That was pretty awful. I also poked through the PCB when trying to short the points. It's okay, I have so many of these to play with. The significant timing variance: the command output that we're waiting for in order to verify anything could take anywhere from one second to 10 minutes, even with really expensive data recovery tools like the PC-3000. pySerial was less reliable. Don't use it on Windows. I'm not the best Linux user at all, but this forced me to learn a lot more and get away from Windows. Don't judge me, please. Where we're at now: now that we've hammered at you with some background and our experimental progress and emotional journey, we can consistently reverse corruption if diagnostic mode is available. If you see SIM error 1009, we know we can always reverse that. That one works very well. We have good test environments.
So if you want to play with reversing corruption, we can get you a drive that is in shape for that. The unfortunate thing, though, is we haven't been able to reproduce the original target solution on our corrupted drives, because of the variance in timing and the fact that I don't have a way to programmatically detect where it fits for the original problem. But that doesn't stop us from trying. So before we show you code: the development philosophy for this project is a little different. I generally believe that not all hackers are coders. It's a little bit of a controversial opinion these days in some circles, but I do believe that, and I have met some very technical people in the data recovery community who don't code. I wouldn't be so crass as to name names at a podium while we're being recorded. So this makes my hacker spirit sad, and the whole point of this project is that the data recovery community has to... there was not enough caffeine in that latte, I'm saying. They have to operate with this giant veil of secrecy. Manufacturers are withholding information, a lot of it's data. So our goal is to make this open, and if you can't read the code, then you don't really know what it's doing and you haven't learned. Code represents intelligence. So this code has been written with those people in mind. If you are a developer and you are a purist, avert your eyes, because there are flagrant violations of DRY. It is over-commented from a developer's perspective and it is very procedural, and it will take a lot of time to do that, because developers are not my target audience. I want people who do data recovery to be able to read it. It's in Python for two reasons: one, because it's the lingua franca of security and it's our poison of choice, but also because, in my experience teaching people, it is the easiest type of code for non-coders to understand. It is also GPL v3 because we are interested in making this free time adding to that so if you want to yell at me about the license choice, feel free, go for it.
Please file your bugs with this philosophy in mind. It is a little bit different. So, to the meat of it: what you are going to need to implement this at home with one of your Pharaoh or Moose Seagate drives. Side note, I wasn't entirely comfortable with this and definitely very used to my open source. So to do this entire thing, if you already have the drive, it shouldn't cost you more than 15 bucks. You just need a USB TTL adapter with the FTDI chip, the FT232, and access to the ground pin, so you are going to have to shell your drive. If you are not using one of the adapters that has shared power with the host, you are going to need a power adapter. You need the code, which you can find on her GitHub. You will need to install screen to make sure you are connected to the hard drive and that you can send these commands, so that you can cause corruption and reverse corruption. We also strongly encourage that you never use this on a hard drive with data that you give a damn about. Don't. You do have to match it in a very particular way. All hard drives are broken up into these bizarre families and lots of other things. Generally, with the Seagate Moose and Pharaoh, you match the model number. You can ID it, if it is a good drive, via the debug mode, and I can document that somewhere. There is a special command you can send for that. If not, you can kind of go off the model number: Pharaoh and Moose drives almost always end in something-something AS in the model number. Match the firmware versions, because again, the exact specific type of corruption we were trying to fix with the shorting of the read points for terminal access doesn't occur in a couple of, like, Apple. It can occur there, but it's difficult to fix, because when it's seeking the service area and at what points it's reading the service area are different than in other firmware.
The Dell-specific firmware, JC4A: we were never able to get corruption. I don't know why. I don't know if it's somehow impervious to G list overfill. I'd have to do a lot more research to figure that out. This is our setup. That's my little read point shorting tool. It's just tweezers, and I tried to space them as perfectly apart as I could to quickly hit the read points, which are in the middle there. And yeah, it's just solder wrapped around there, trying to bridge it. It's a really shoddy setup and I'm embarrassed by it, I guess. And yeah, those are the read points we're aiming for right after power on, listening for the heads to leave. And then, once we actually did reach corruption, because we have done that, I just wanted to verify it in one of my fancy tools, so I hooked it up to the PC-3000 over there. They have these fancy terminal connectors that are way better than that, much less likely to cause connectivity issues. So yeah, when you're doing this, just make sure that you're crisscrossing your Rx and Tx from host to device. It's honestly kind of a trek to even figure out what pins are Tx and Rx on the drives. So Tx is always closest to the SATA connector, then Rx right next to it, and then ground is the very last. We didn't show the power connector; it's there. Please be very careful. There's a lot that you can do to destroy your drive and never get access to your data again. So again, if you're going to be doing this, choose to start with a drive you don't care about at all, and before you start, very first thing, back up the service area, which, yes, you can do via debug mode commands. You can save it that way. Ask me about that later. But definitely, definitely, definitely back up the service area. Alright, so you've got your equipment, you're ready to go. Please get your software installed in order. Remember to give it executable permissions. Hook up the connectors. Double check your connectors. If you're in software, have someone
in hardware check your connectors. Determine what port your OS has assigned. Read up on the scripts. I'd say read the documentation before you start, but I'm a realist and know that nobody does that, apparently. So before you hit run, please read the documentation, because the order of the LBA addresses you enter matters. So yeah, running this, once you get all your stuff set up and are good to go, is actually pretty quick. However, there is one gotcha I do want to mention. As we mentioned earlier, especially when you get a full G list, your remap operations get really, really slow, and we did see some of the commands to add entries take north of ten minutes to fully execute. So the script will time out and you'll see received byte zero. In the setup you can see on the left that we are actually getting the sectors reassigned and added to the G list, even though the script says we got nothing back. This is because I went and made tea and came back and there was the output. So be aware that if you see receive zero, it isn't necessarily an indication that the operation failed. It just means it was taking longer than the script was willing to wait. So don't freak if you see those. Alright, so what's next? This is the shameless plug for help. We would like to try solving the original problem without shorting. We'd like to write some code to configure the translator to rebuild based on the list you specify. You may remember that we have seven of them to pick from. I'm guessing you'll get some different behavior and performance if you rebuild the translator with different lists. We'd like to expand this to include other Seagate families or, even better, other manufacturers. We could potentially do this with the ATA commands, because they are not specific to Seagate. So again, long-term goal: we would like an open source data recovery suite, so that people don't have to spend an arm and a leg, or hundreds up to ten grand sometimes, on data recovery. Feature requests: again, our current focus is on translators and defects list
issues. We do want to expand to include other solutions and, of course, other manufacturers. Code contributions: just make sure you please keep Allison's philosophy in mind. You know, make this for people that don't know how to code. We would love help testing. That doesn't always include just writing scripts. There's other things that I would definitely like help with, and you can ask me about that later if you're interested. Open knowledge about the firmware: all this, again, data recovery is super, super secretive. The manufacturers are secretive. Other data recovery companies are very secretive. You know, it's hard. There's a lot of digging. There's some great forums out there. HDD Oracle is one of my favorites. I highly recommend you visit that if you haven't. So, acknowledgements. Speaking of HDD Oracle, I really want to thank build it (I've never met him) on HDD Oracle. He's just a fantastic researcher. He loves what he does and he's not in it to make money. And I think that's it. And then I'd like to thank Leet Bunny and the two goons who sprinted across hotels to get us a projector at T minus five minutes, because I don't think I could have done that. Yeah, Leet Bunny is awesome. He's always wearing a green hat, pretty much. Definitely buy him food or booze or something. And then the bibliography you can find on my GitHub that I never use, or at that little URL, which is totally safe, I promise. And if something happens, find Dean Pierce. Okay, so we were supposed to take five minutes of questions, but because we got such a late start, we're going to take all of our questions over under the Hardware Hacking Village sign, to try to keep this room as on schedule as it is feasible to be at DEF CON. So thank you all for coming. Thanks for braving the maze. We look forward to seeing you over there to have a longer conversation.