Hello and welcome to Deciphering CAPTCHA. My name is Michael Brooks, and in this talk I'll be discussing how two of my automated CAPTCHA solvers work. These two attacks have been identified as CVE-2008-2020 and CVE-2008-2019. The name of this talk is Deciphering CAPTCHA, but I do not consider CAPTCHA to be a cryptographic primitive, although there are similarities between the two. I'm using the word "decipher" in the sense of "to make out the meaning of despite indistinctness or obscurity."

My biggest motivation for giving this talk is a reaction to how poorly documented CAPTCHA solvers are. I can think of no other widely used security system with so much misinformation associated with it. One of the effects of this misinformation is the widespread use of weak CAPTCHAs on the internet today. Before making my research public, I could find only a single CVE that was actually solving a Turing test: CVE-2007-3308. Instead, the clear majority of CVEs against CAPTCHAs have taken advantage of implementation flaws.

The first CAPTCHA we'll be looking at is CVE-2008-2020. This is the CAPTCHA used by PHP-Nuke. If you want, you can go to phpnuke.org and see it. They're still vulnerable to this day, even though I reported this flaw back in April.

Computers are great at performing lots of permutations, and they're only getting better at it. The number of possible permutations for this CAPTCHA is 10 to the 6th, which is exactly 1 million. As an attacker, we want to invert the CAPTCHA function. To do this, we use the very code used to build the CAPTCHA to build a lookup table of all possible corresponding outputs. The input of this CAPTCHA function is the answer, and the output is the challenge image. Instead of storing the entire image output from the CAPTCHA function, I'm storing just its message digest.
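As a rough illustration of the lookup-table idea, here is a minimal Python sketch. It is not the exploit code from the talk: `render_challenge` is a hypothetical stand-in for the target's deterministic image generator, and the function names and the `digits` parameter are assumptions made for the example.

```python
import hashlib
from itertools import product
from typing import Dict, Optional


def render_challenge(answer: str) -> bytes:
    """Hypothetical stand-in for a deterministic CAPTCHA renderer.

    A real attack would call the target's own image-generation code here.
    """
    return b"IMG:" + answer.encode("ascii")


def build_lookup_table(digits: int) -> Dict[str, str]:
    """Map the MD5 digest of every possible challenge image to its answer."""
    table: Dict[str, str] = {}
    for combo in product("0123456789", repeat=digits):
        answer = "".join(combo)
        digest = hashlib.md5(render_challenge(answer)).hexdigest()
        table[digest] = answer  # store only the digest, not the image
    return table


def solve(table: Dict[str, str], challenge_image: bytes) -> Optional[str]:
    """Hash the challenge and look the answer up; None if it is not in the table."""
    return table.get(hashlib.md5(challenge_image).hexdigest())
```

With `digits=6` the loop visits all 10^6 answers. Once the table exists, solving a challenge is a single hash and one dictionary lookup.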
By storing only the digest, I'm able to use less disk space when building a very large lookup table, and lookup time is considerably less than it would be when storing the entire image. The ability to use MD5 allows me to use RainbowCrack. But this is key: you can't use RainbowCrack to break all CAPTCHAs, only this specific CAPTCHA, because it is a one-to-one function in the mathematical sense of the word. If you start introducing more randomized data into the CAPTCHA, it is no longer a function, in that one answer will have many challenge images relating to it.

However, no matter how much randomized data you add to the CAPTCHA, it can still be reversed. You can iterate through all of the random function calls, producing every possible combination. In fact, the phrase "completely automated Turing test to tell computers and humans apart" is a logic error: if you have code to build a completely automated test, then you can use that code to build a table of all possible solutions. Now, this is a laborious process.
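To make the point about randomized data concrete, the sketch below enumerates every random value alongside every answer. It is only an illustration under stated assumptions: `render_challenge` is a hypothetical renderer, and `NOISE_SEEDS` assumes the generator draws its randomness from a small set that can be enumerated.

```python
import hashlib
from itertools import product
from typing import Dict

# Assumption: the renderer's randomness comes from a small, enumerable set.
NOISE_SEEDS = range(16)


def render_challenge(answer: str, noise_seed: int) -> bytes:
    """Hypothetical renderer whose output depends on the answer and one random value."""
    return f"IMG:{answer}:noise={noise_seed}".encode("ascii")


def build_table(digits: int) -> Dict[str, str]:
    """Enumerate every (answer, random value) pair; many digests map to one answer."""
    table: Dict[str, str] = {}
    for combo in product("0123456789", repeat=digits):
        answer = "".join(combo)
        for seed in NOISE_SEEDS:
            digest = hashlib.md5(render_challenge(answer, seed)).hexdigest()
            table[digest] = answer
    return table
```

The table grows by a factor of `len(NOISE_SEEDS)`, but the lookup is unchanged: each of the many images for one answer still resolves to that answer.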
Building these tables doesn't require much technical ability, however. Now, consider the case where you don't have the source code. In fact, if you go to PayPal.com right now to sign up for a PayPal account, they're using a very similar CAPTCHA. It doesn't contain randomized data other than the actual message. Unfortunately, I'm not able to show it to you. With a CAPTCHA like this, the largest number of calculations is 36 to the 5th, which is 60,466,176 possible combinations. It would take many man-hours to complete this table. But PayPal suffers from another issue: refreshing the login page gives you a new CAPTCHA. So by building a smaller table, you can still break it by asking for another CAPTCHA challenge until you get one that is in your table.

In my estimation, every open-source CAPTCHA is vulnerable to a permutation attack. Permutation attacks against CAPTCHAs are recognized by my CVE-2008-2020. To protect my company, I had to implement reCAPTCHA, from recaptcha.net, which is a free service. It's a closed-source CAPTCHA.
reCAPTCHA was developed by Luis von Ahn, who conducts very provocative research. A large number of man-hours are wasted on solving CAPTCHAs each year. reCAPTCHA harnesses the power of human computation to digitize books, which betters all of humanity. I feel privileged to live in a time when organized book burning, such as that of the Nazis and extremist religious groups, can no longer cause harm.

The next CAPTCHA attack is more interesting, and the attack is much more complex. This attack is CVE-2008-2019. It targets an audio CAPTCHA built for the Simple Machines Forum. A CAPTCHA solver was created for an earlier version of this CAPTCHA, and it is identified by CVE-2007-3308, which is the only other CVE regarding CAPTCHAs. To address this security issue, the Simple Machines Forum developers added noise to the audio file.

I use two major components in order to decipher the signal despite the noise. The first component is the Hamming distance. This is named after Bell Labs researcher Richard Hamming. The Hamming distance was introduced in his fundamental 1950 paper, "Error Detecting and Error Correcting Codes." A Hamming distance measures the noise between two signals that should otherwise be identical. Hamming weight analysis of bits is used in several disciplines, including information theory, cryptography, bioinformatics, and genetics. It is the fundamental algorithm used to compare fingerprints and iris scans for authentication, and it is also used in optical character recognition software like Tesseract. I obtained a Hamming distance function implemented in PHP from the open-source project PHP for Bioinformatics. That project performs Hamming distance calculations on sequences of DNA.

The other component in this CAPTCHA attack is fuzzy logic, which is a tool used in artificial intelligence. In short, fuzzy logic is a range of truth between zero and one. Fuzzy logic is
being used today in places such as washing machines, to tell how dirty your laundry is, and in dryers, to tell how dry it is.

Using these two elements together, just by performing a Hamming distance calculation on the very beginning of the file, I was able to find the first letter. At this point I was absolutely ecstatic. I was jumping up and down: I'd broken a CAPTCHA using these systems. But it was only the first letter. Finding the other letters was proving to be more difficult. Finding the spaces between letters was difficult. There are methods, but they require very heavy calculations. I did, however, find a problem with the CAPTCHA: after each letter there is a minimum of 5,600 bytes of space before the next letter. So after finding the first letter, I would skip ahead 5,600 bytes.

There is a problem with Hamming distances, or at least an implementation issue. In order to perform a Hamming distance calculation, both strings must be of the exact same size. So in order to identify a letter, I must know exactly where it starts and even where it ends. The solution I came up with was to run a whole lot of Hamming distance calculations. I'm running literally thousands and thousands of them. Solving one CAPTCHA takes about 14,000 Hamming distance calculations. Luckily, these are very cheap calculations.

There are other tools for performing fuzzy comparisons on strings. One of them is called the Levenshtein distance, and it doesn't suffer from the same shortcomings as Hamming. You can perform Levenshtein distance calculations on strings of different sizes.
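For comparison, the Levenshtein distance can be sketched in a few lines of Python. This is the textbook dynamic-programming formulation, not code from the talk. The function name is illustrative, and this version keeps only two rows of the table in memory rather than the full matrix.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b. Lengths may differ."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string along the row to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]
```

Unlike a Hamming distance, the two inputs do not have to be aligned or of equal length, but the work grows with the product of the two lengths.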
Levenshtein distance is able to identify similarities despite transpositions of data, moving data around, or injections of large amounts of data. Unfortunately, Levenshtein distance calculations are extremely heavy: performing a calculation on only a few kilobytes of data will require gigabytes of memory. Levenshtein distances are being used in audio processing. As an example, there is US patent 6073099, entitled "Predicting auditory confusions using weighted Levenshtein distances."

The great Alan Turing is the creator of the Turing test, which is where CAPTCHA gets its name. Turing himself believed this test would eventually become obsolete: "In 30 years, it will be as easy to ask a computer a question as to ask a person." This is Alan Turing in 1946. Well, clearly that is not the case. However, I believe that this eventually will happen, and that CAPTCHAs will eventually become obsolete. Many attacks and much software become obsolete, and so do defenses. Information security is an ever-changing landscape, and I don't believe it will ever come to an end.

After 3.5 billion years of evolution, organisms have become incredibly adept at defending themselves from viral attack. Even with the incredible capability of the human immune system, we still suffer from many methods of code execution. These virions are in the room, pumping through your bloodstream as I speak. New methods of attacking software are developed constantly. Software that was once believed to be secure can in fact be extremely vulnerable to new attacks. For instance, the format string vulnerability was discovered less than a decade ago, and more recently the dangling pointer attack was presented at Black Hat only last year. Programmers are human, and we must make assumptions about how our code will be used. We cannot foresee every condition that our code will encounter.
I enjoy that journey into the unknown and the fruits of my labor of love.

Unfortunately, my talk would have run a bit longer if not for the trouble with my presentation materials. I'll try to get the visuals working. The code I have for both of these CAPTCHAs is available on my website. Yes, the exploit code is available for both of these, and I encourage you to download it. The code is not very complex. I think you'll be surprised at how simple it was to break some of these systems. I think that's a sign of elegance. I encourage more research in the area of CAPTCHAs, and I encourage people to be open. I find it unfortunate that many people keep these methods secret.

My materials are on my website, brookssecurity.com, and I'll have updated materials there as well. The exploit code is also available on the DEF CON CD for both of these attacks.

You want to see it in action? Okay. Sorry, I do apologize. All right.

This is the PHP-Nuke CAPTCHA, as you can see. Very simple, right? A very simple CAPTCHA, but you would be surprised: in the CVE for this CAPTCHA, over half a dozen software packages were using it, and there were more. Now it's going to build up the lookup table. I start iterating at zero, zero, zero. In the far column you see the MD5 hash of the actual image. I continue on from zero, zero, zero all the way to nine, nine, nine. Very simple.

This is actually a one-to-one function, similar to f(x) = x^3: for every one input, the answer, there is exactly one output. So when the function is inverted, it is still a function.
It is f(x) = x to the one-third, which is shown by the graph. Now, if you invert a function that is not one-to-one, the result is no longer a function, in that for one value of x there are now two values of y. So RainbowCrack is able to break specifically this CAPTCHA because of its one-to-one nature.

Okay, this is the audio CAPTCHA. This is one of the main slides I wanted to show you. The picture you see at the top is the raw audio file being put together. This is without the noise. It then runs through a noise subroutine, which was the patch, and you see the result at the bottom. In the top picture, picking out that black space is pretty easy. You can do it with a string compare. It becomes more difficult later on: not only are the black spaces changed, but the letters themselves in the WAV file are changed too, and we can measure this change using a Hamming distance.

And I can show you the exploit right here. Okay. It's very simple. At least the interface is very simple: it's a PHP file, and I upload the .wav file produced from the image. It'll take a while. On a different system, my personal machine, which is a dual-core 2.4 GHz, it takes about 14 seconds or so to break it. The server that I'm using right now isn't as beefy.

This is out of order, but here is the CAPTCHA in use on the PayPal sign-up site right now. What are you doing? Why are you doing that, PayPal?
It is disconcerting to me that a financial institution would be using such a weak CAPTCHA.

And then here's phpnuke.org. It says it has been downloaded 8.35 million times. And if you look at the CAPTCHA in the corner, the security code, they're still using it. This exploit code has been available since April, yet they have not patched. I can't explain that one.

Okay, so a total of 5,527 Hamming distances were used in this attack. And here, I'll pop up the code. I don't know how many of you are programmers. In essence, I start by loading all of the clean letters into memory, and then I begin. In this one, I know that I'm searching for five letters, and that's what I'm saying right here: after finding five characters, I'd like to stop, because I know I've reached the end.
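The on-screen code is not part of this transcript. As a stand-in, here is a minimal Python sketch of the kind of loop being described; it is not the exploit code from the talk. The 512-byte prefix, the 5,600-byte gap, and the five-letter limit come from the talk. The bit-level distance, the `threshold` value, the one-byte step, and all function names are assumptions, and every clean sample is assumed to be at least `prefix_len` bytes long.

```python
from typing import Dict

PREFIX_LEN = 512  # from the talk: compare only the first 512 bytes of a letter
MIN_GAP = 5600    # from the talk: at least 5,600 bytes separate letters


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs inputs of equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similarity(a: bytes, b: bytes) -> float:
    """Fuzzy score in [0, 1]: 1.0 means identical, 0.0 means every bit differs."""
    return 1.0 - hamming_distance(a, b) / (8 * len(a))


def solve(audio: bytes, templates: Dict[str, bytes], length: int = 5,
          threshold: float = 0.9, prefix_len: int = PREFIX_LEN,
          min_gap: int = MIN_GAP) -> str:
    """Slide over the noisy audio, matching each window against clean letters."""
    found = []
    offset = 0
    while len(found) < length and offset + prefix_len <= len(audio):
        window = audio[offset:offset + prefix_len]
        best_letter, best_score = None, 0.0
        for letter, sample in templates.items():
            score = similarity(window, sample[:prefix_len])
            if score > best_score:
                best_letter, best_score = letter, score
        if best_letter is not None and best_score >= threshold:
            found.append(best_letter)
            # Assumption: jump past the matched letter plus the minimum gap.
            offset += len(templates[best_letter]) + min_gap
        else:
            offset += 1  # no confident match here; try the next byte
    return "".join(found)
```

Because the exact start of each letter is unknown, the loop pays for one cheap comparison per template at every offset it tries, which is in the spirit of the thousands of Hamming distance calculations mentioned in the talk.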
Here I'm performing the Hamming distance. I found a way to save resources by performing the Hamming distance not on the entire letter but only on its beginning, in this case the first 512 bytes of the file.

A Hamming distance is actually a fairly simple calculation. Here, I'll show you. This is a Hamming distance shown on regular words: for instance, the Hamming distance between "toned" and "roses" is three, because three characters are different. And then again, here is the Hamming distance shown between two sequences of binary.

Another thing: if you were to use a similar attack on another CAPTCHA, you must make sure that you're comparing the raw pulse-code modulation. What this means is that the CAPTCHA they were giving me was a WAV, so it is a raw stream, a digital representation of the analog wave. If you have taken calculus, this looks very similar to an integral. So if you were given a challenge as an MP3, for instance, you'd have to convert it to a .wav before performing the comparison.

Anyway, my time is done. I wish you the best. I hope that there will be more research in the area of CAPTCHAs and that people will be open with their research. Thank you very much.