I'm sure many of you have chip cards in your pocket, which you use for credit and debit card transactions. These use the EMV standard for communication between the card and the terminal while performing one of these transactions. EMV is named after the original creators of the standard, Europay, Mastercard and Visa, but since then maintenance of the standard has been taken over by EMVCo, who also manage closely related specifications on contactless payments, and some other payment standards, like the 3-D Secure online payment standard.

I've been looking into EMV for about 15 years now. I've found a variety of vulnerabilities, both in the standard and in the implementations, but in this talk I'm not going to be covering these vulnerabilities. If you're interested, I've got other talks and papers about these. Instead, in this talk I'll be looking at some of the low-level details of how EMV works. I hope this might trigger your interest in looking into EMV in more detail yourself, and perhaps it will also be useful as part of your work.

Given the billions of pounds and dollars that flow through EMV transactions, you'd think that EMV was designed to be a security specification, but it's not really. Primarily, EMV is designed to be a compatibility specification. It's there to allow cards and terminals to communicate, regardless of how old or new the equipment is, regardless of where in the world these are, and regardless of the banks that have been used to set up the terminal and the cards. This is because when people perform EMV transactions, they want to buy something. They are not primarily interested in security, and if something went wrong, they would rather the transaction went through than not. Most of the EMV specification is about ensuring compatibility, while also managing the constraints.
For example, cards are very limited in the amount of RAM they have and the amount of processing they can do, and this has a lot of impact on how the specification is designed. Another consequence of this desire for compatibility is that cards and terminals try to follow Postel's law, so they are conservative in what they generate, but liberal in what they accept. The idea is that if there are some changes to the way that cards and terminals work, they'll continue to operate, because they will not just reject a transaction because something is unexpected. This is very desirable for compatibility, but has the consequence of creating security risks. Firstly, if there is data which is not possible to interpret, devices will ignore it, but maybe that is very security-relevant information that they're ignoring. Secondly, having this backwards and forwards compatibility creates more complexity, and from complexity we can get security problems.

One way the EMV standard tries to support compatibility is the TLV standard for encoding data. In this standard, data is encoded as a tag, followed by a length, followed by a value. This is a very efficient way of representing hierarchical data structures. If this was a more modern standard, you might see things like JSON or XML. JSON and XML take up a lot of extra space to encode data. In contrast, TLV is much, much more efficient. The downside of TLV is that, while with JSON or XML you can look at the data and more or less work out what's going on, with TLV data you have to look quite carefully before you can work out what it actually means. TLV helps compatibility because it's possible to decode without the decoder understanding all the semantics of the data that's encoded inside. A decoder can still produce the tree and identify data items that it knows about, but it can ignore data items that it doesn't know about.
It also has some other features, like allowing data items to be deleted just by filling them with zeros, rather than having to rewrite the whole string of bytes. This is convenient for older memory technologies that allowed erasure but not moving the data around. The TLV standard is also known as ASN.1 BER. This is from the X.208 standard. It's the same standard that is also used as the encoding format for HTTPS certificates. There have been plenty of security vulnerabilities in HTTPS as a result.

In this talk, I'm going to show how you can manually decode TLV data. Of course, there are decoders out there. I've written some, and I'll talk about those at the end of the talk, but there are sometimes cases where it's helpful to be able to decode these things manually. Maybe the data you've got is incomplete or corrupt, but you still need to make sense of it. Maybe you have to explain exactly why a particular decoding is the right one. One of the things I do from time to time is act as an expert witness in court cases, and there you not only have to give the answer but also explain the rationale for the recommendation that you are making. You might have to write your own decoder, although I wouldn't recommend it: there are lots of places where you can go wrong. Also, sometimes doing things yourself shows you where other people might have slipped up, and helps you identify where else security vulnerabilities might be.

I'm going to show these decoding processes, and if you want to follow along, then these links here will take you to the Jupyter notebook which I've used for actually doing the decoding, and to a document with my notes showing some of the resources that are necessary to understand TLV data in EMV. There's also the repository, which contains the Python notebook and the source code for the utilities that I've been using. In this talk I'll be decoding some TLV data structures.
These are represented in hexadecimal, so to help us decode them I've written some simple Python functions that allow us to manipulate these strings of hexadecimal characters. These are part of the hexutils package. You can download it from the repository that I linked to earlier, and you can also use it within the Jupyter notebook.

The first function takes a byte represented as a pair of hexadecimal characters and converts it to binary. It always converts it to eight bits, so it'll handle adding leading zeros if necessary. Sometimes these strings will have spaces in them. That's very nice for formatting but can complicate decoding, so the function stripBytes removes any white space before, after and within the hexadecimal string. The next one adds white space back in to make the string easier to look at: it splits the hex string into bytes and then adds a space between each byte. Sometimes we want to know how long a hex string is, so this function counts how many bytes are in a hex string, removing any white space beforehand. Sometimes the hex bytes include text. Typically this would be encoded using ASCII, so this function decodes a hex string into the equivalent string using ISO 8859-1, which is a superset of ASCII.

Now on to some slightly more interesting functions. Quite often we need to look at the individual bits, and at which positions bits are set to one or zero, so this function takes in a byte in hexadecimal and shows you which bits are set to one and which bits are set to zero. Sometimes there might actually be multiple fields within a single byte, so this function takes a specification of how long each field is, converts the hex into binary, and then shows each of these binary digits according to which field it belongs to. I'm going to be taking sections out of hex strings, so the take function does this.
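As a rough illustration, helpers along these lines could be sketched as follows. This is not the code from the hexutils package: every function name and signature here is an assumption made for the sketch, and only the behaviour is taken from the descriptions above.

```python
# Hypothetical re-implementations of the helpers described in the talk.
# Names and signatures are assumptions, not the real hexutils API.

def strip_bytes(hex_string):
    """Remove white space before, after and within a hex string."""
    return "".join(hex_string.split())

def to_binary(hex_byte):
    """Convert one byte (two hex characters) to exactly eight binary digits."""
    return format(int(strip_bytes(hex_byte), 16), "08b")

def space_bytes(hex_string):
    """Split a hex string into bytes and put a space between each byte."""
    s = strip_bytes(hex_string)
    return " ".join(s[i:i + 2] for i in range(0, len(s), 2))

def count_bytes(hex_string):
    """Count the bytes in a hex string, ignoring white space."""
    return len(strip_bytes(hex_string)) // 2

def to_text(hex_string):
    """Decode hex bytes as ISO 8859-1 text (a superset of ASCII)."""
    return bytes.fromhex(strip_bytes(hex_string)).decode("iso-8859-1")

def split_fields(hex_byte, widths):
    """Split one byte into bit fields; widths must add up to eight."""
    if sum(widths) != 8:
        raise ValueError("field widths must add up to 8 bits")
    bits = to_binary(hex_byte)
    fields, pos = [], 0
    for width in widths:
        fields.append(bits[pos:pos + width])
        pos += width
    return fields

def take(hex_string, count, offset=0):
    """Return `count` bytes of a hex string, skipping `offset` bytes first."""
    s = strip_bytes(hex_string)
    return s[2 * offset:2 * (offset + count)]
```

Working on strings of hex characters rather than on Python bytes objects keeps every intermediate value easy to compare by eye with a log file, at the cost of some efficiency.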
In the first form, it just takes a certain number of bytes, so in this case you have a hex string, you ask for two bytes, and it gives you these two bytes. In the second form, it takes two bytes but also takes an offset. Here I'm starting at offset one, counting from zero, which means I'm skipping the first byte and then taking the next two bytes.

In this talk I'm going to show how to decode some real EMV data, and I'll use some data from my own credit card. Now, don't get too excited: the card has long since expired, so you won't be seeing any sensitive information, but this is a real card. To actually extract this data I used a tool called Cardpeek. This is really handy. It can deal with many different types of smart cards, and it can do TLV decoding and other types of decoding by itself, but the whole point of this talk is to show you how to do this yourself, so rather than looking at the final output of Cardpeek I'm going to look at its log file.

One of the stages of getting data off a card is to use the READ RECORD command. To see how this works, let's look at the EMV specification. Rather than giving the whole spec, I've taken some selected parts, and these are available in the notes. So here is the specification for the READ RECORD command. EMV commands take two different bytes to specify the command: there's the class and the instruction. For READ RECORD the class is 00 and the instruction is B2. Then it takes two parameters, P1 and P2. These parameters depend on the actual command, so if we look up the specification for READ RECORD we'll see that P1 is the record number we're going to read, and P2 is a reference control parameter, and I'll talk about that a bit later.
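As a small sketch of the command bytes just described, the function below assembles a READ RECORD command as a hex string. The function name and its arguments are hypothetical; the reference control parameter is passed in as an opaque byte, and the last byte is the expected length, both of which are covered in what follows.

```python
def read_record_command(record_number, p2, expected_length=0):
    """Build the five bytes of a READ RECORD command as a hex string.

    Hypothetical helper for illustration only: class 00, instruction B2,
    P1 = record number, P2 = reference control parameter, then the
    expected length of the response.
    """
    for value in (record_number, p2, expected_length):
        if not 0 <= value <= 0xFF:
            raise ValueError("each field must fit in a single byte")
    cla, ins = 0x00, 0xB2
    return bytes([cla, ins, record_number, p2, expected_length]).hex()

# Record 2, with 0x14 used here as an example value for P2
print(read_record_command(2, 0x14))
```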
We need to specify how long the record is, that is, the length expected, but initially we don't know what this is. The way to deal with this is that you initially specify zero as the length, and then the card will return an error message, so 6C is an error message, and tell you how long the response actually is. Then you call the same command again, now specifying the correct length, and it will return 9000, which means that everything is okay, along with the 97 (in hexadecimal) bytes that you've requested.

I mentioned that there's a reference control parameter. This is our first example of one of these fields where you are taking a single byte and splitting it up. Here is the table that shows how this byte is formatted. The first five bits are the SFI, the short file identifier. We found out the short file identifier earlier on, when we activated the payment application on this card, and we found out that the relevant records are in SFI 2. So the first five bits get set to 2, and then the last three bits get set to 100 to indicate that P1 is a record number. We can show how this works by calling the format bytes function with field settings of five followed by three, and we see that indeed, yes, the first field is set to two and the second field is set to 100.

I mentioned that this is a 97-byte-long string, and that's in hexadecimal. If we convert that to decimal we get 151, so the response for record number two is 151 bytes. We're now going to try decoding this, and the first step is putting it into a Python variable. We're going to call this response, and we'll see it used quite a lot in the rest of the talk.

Let's look at the first byte of this response. We know that this is a TLV string, and the first byte of a TLV string is the tag, so let's look at what that actually is. If we look at that, we see that it is 70. So now let's go and try to decode this tag 70. Tags can be multiple bytes long, so here is the table for decoding the first byte of a tag. We can see
that there are three different fields: one that's two bits long, one that's one bit long, and then the rest, which is five bits long. If we now decode this hex 70 following the field specification, we see that the first field is 01, and that corresponds to the application class. Tags can be universal, which means that every application that handles ASN.1 should be able to understand them, and it's not one of those. There's application, where the tag is specific to a particular application, so it could be smart cards, for example, but it has the same meaning regardless of where it's used. It can be context-specific, which means that the meaning actually depends on where this tag is present. Or it can be private, in which case the specification doesn't deal with it, and it's up to the producer of the card to define what it means. Anyway, this is 01, which is application class, which means it's specific to smart cards.

Now we need to find out what the contents of this tag are. This could be either primitive or constructed. Primitive means that no further decoding is handled as part of TLV, and constructed means that the contents are a series of TLV data items. Here we can see that this field is set to one, which means it is a constructed data object, so the contents are going to be more TLV data items. Then the last five bits are just a number that encodes which particular tag this is. Here it is 10000. As long as this is not all ones, this is a one-byte tag. So what we know is that tag 70 is a one-byte tag containing TLV data items, and if we look this up in the EMV specification we find out that this is a READ RECORD response message template, which is what we'd expect, because we just called READ RECORD.

Okay, so now continuing on. We've got the tag, and the tag is 70. The next bytes are going to be the length, so let's look at the length, and this is hex 81. We actually have to decode the length to find out what it means, so let's look at the table for decoding the
length. What this says is that if bit eight is not set, then this byte is the length, but if bit eight is set, then this byte specifies the length of the length, and that is going to follow. Here we can see that bit eight is set and the rest of it is one, which means that there's going to be one additional byte that contains the actual length of this data item. So let's look at the next byte. This is hex 94, and that is the actual length of the contents of this response. Converting that into decimal gives 148 bytes. So within tag 70 there are 148 bytes of further data, and these are in the TLV format.

Let's actually check whether this makes sense. Let's take these 148 bytes. We get to them by skipping the one-byte tag and then the two bytes used for the length, and if we check the size of the whole response and take off those three bytes that we skipped over, then indeed there are 148 bytes there. So this tag 70 contains everything that is part of the READ RECORD response.

Now, we know that this is a constructed tag, so the contents are going to be one or more TLV items. Let's look at the first part of that. The first byte is 8C, and to find out what that actually means as a tag we need to decode it, so let's follow the same decoding pattern. We see the first field is context-specific, so the meaning of this depends on where it's actually found. This is part of an EMV application, so we have to use the EMV specification. The next bit says that it is a primitive data object, so the ASN.1 specification doesn't say anything more about how to decode it, and we need to look elsewhere. Then the rest of it is not all ones, so this is a one-byte tag, and the bits that are in this final field show us which of the tags it is. To find the meaning of this we need to look again into the EMV specification. It has something called a tag dictionary, where you can see a list of tags, their meaning and their format. So this is 8C, and we can see that 8C is the Card Risk
Management Data Object List 1, or CDOL1. I'll go into what data object lists are, as that's another important data format within EMV, but let's look at the contents next. To do that we need to see the length. The length is 21 in hex. Now, is this a one-byte length or a multi-byte length? Well, we need to decode it. Bit 8 is 0, which means that this is a one-byte length. So the size of this CDOL1 is 21 bytes in hex, and if we convert that we get 33. So there are going to be 33 bytes following tag 8C, which is the CDOL1. Let's get that. We've got to skip over a few things: skip over tag 70, the two-byte length, tag 8C and then the one-byte length, and take 33 bytes, and this is what we actually get. We'll save the CDOL1 for later.

Now let's continue and see what the next item within this response is. We're going to skip over everything we did before and also skip over the 33 bytes of data which are the CDOL1, and then the next thing we get is another tag. This is 8D, a one-byte tag, which is the CDOL2. Then we look at the length. This is 0C, which is a one-byte length. Now let's get the contents of this tag, so this is 0C bytes starting at the offset we get by skipping over everything before, and this is the CDOL2. So we've got both the CDOL1 and the CDOL2 which followed it.

I mentioned that CDOL1 is a DOL, a data object list. DOLs are really quite important for EMV. They come about because EMV assumes that cards are very limited in what they can do. Maybe they cannot even decode TLV data themselves. The terminal, on the other hand, is more powerful, so it can do a little bit more than the card. So when data is sent from the card to the terminal, that's often in TLV format, but when data is sent from the terminal to the card, generally TLV is not used, because maybe the card doesn't understand how to decode that sort of data item. Instead we use DOLs. So a data object list is a
set of tags and lengths, and that tells the terminal how to format the data that the card is going to receive. Then, when the card receives this, all it needs to do is jump to specific offsets within this set of data to know exactly what the data item at that particular position is, without having to go through any more decoding steps. So one of these DOLs consists of a list of tags and lengths, and that tells the terminal that, when it's sending data to the card in this particular context, all it should do is take the particular data item that the card requests, pad it to a particular size, and then send it to the card, and keep on sending these data items of the particular length until all the tags covered in the DOL have been sent over to the card.

The CDOL1 is used as the first step of the authorization process, where the card is asked to produce an authorization request. If the card agrees to authorize that transaction, it normally gets sent to the bank, and then the second stage, of actually telling the card that the transaction has succeeded, is what the CDOL2 is used for. The data items that are requested in each of these requests are a little bit different, and that's why there's a CDOL1 and a separate CDOL2.

So now let's actually go into this CDOL1. It's a list of tags, so let's look at the first byte of the CDOL1, and we get 9F. To find out what this actually means, we once again look at the table for decoding the first byte of a tag. Here we are, and now we can see that the first field is 10, so it's context-specific: this is an EMV tag. The next bit says that it is a primitive data object, so it's not TLV itself, and then the rest of the bits are 11111. We haven't seen this before. This means that 9F is the beginning of a multi-byte tag. It's not a one-byte tag like we've seen before. To find out what else is going on in this tag we need to look at the next byte, and we get 02. To interpret this we need a different
table, so let's go on to the second table. This says that if the first bit is zero, then that is the last byte of the tag, and if it's one, there's going to be more coming. Here we see that the first bit is zero, so this is a two-byte tag. One- and two-byte tags are quite common. There are also some three-byte tags, particularly in the contactless specifications. In principle you can have tags as long as you like, but I've never seen one that is longer than three bytes.

After the tag is the length, and this is going to be six. If we now look at the EMV specification, which shows us what the meaning of each tag is, we can see that 9F02 is the amount authorized, in numeric format. So the card wants to know, when it's authorizing a transaction, what amount it is going to authorize, which is quite understandable. The card knows that this is the first field that's going to be sent, it knows that it's exactly six bytes long, and it can then very easily compare this value to any limits that are specified in the card when deciding whether to authorize a transaction or not.

Next, let's go back to the response. At position 78 there's another tag. This is two bytes long and it is 5F20. If we go and look at what this actually means in the EMV specification, we can see that this is the cardholder name, and it is in alphanumeric format. After the tag is going to be the length, and this is 13 in hex, which is 19 in decimal. If we now take these 19 bytes we get this, and we can then decode it as ASCII, and we get the cardholder name. The cardholder name is also present on the magnetic stripe, and I'll talk about the relationship between the chip data and the magnetic stripe data a bit later.

Okay, let's keep on going. Next, at offset 100, we have another tag. This is 5F30, and this is the service code. The service code is another part of the EMV chip data which was originally from the magnetic stripe standard. Let's go and have a look at what this says. After the tag is the length, which is two, and after the length
are the two bytes that make up the content. From the specification we know that this is binary-coded decimal, and we know it is three digits long. With binary-coded decimal it's left-padded with zero, so we ignore the first zero and we get 201. Let's look at what this actually means. The first digit is two, and this says that the card is suitable for international transactions and that it has a chip. This service code on the magnetic stripe is how terminals know that the card should have a chip if you swipe the magnetic stripe. That's why, if you try that, the terminal will say "please use chip". The next digit is zero, which says that it can be authorized in the normal ways, and the final digit is one, which says that it can be used with or without PIN and can also be used for any type of transaction.

Next, let's look at offset 57, where we've actually got tag 57, by coincidence. This is the track 2 equivalent data. It is a copy of track 2 of the magnetic stripe on the back of the card. It's there to allow the terminal to process a chip transaction as if it were mag stripe. Maybe that's because the network isn't able to process chip transactions, or maybe the issuer has not yet upgraded their systems. So the terminal has all the information necessary to put together a copy of track 1 and track 2 of the magnetic stripe. For track 1, the terminal needs to take items like the account number, the cardholder name and so on, and then put them together. For track 2, there's actually a complete copy of track 2 in one of these data fields, and that's what you're seeing here.

So we can look at what this actually looks like. The start sentinel is missed out, but then next is the account number, 16 binary-coded decimal digits. Here I've masked out the middle of the digits, because I've had too many conversations with my bank to say I needed a new card because my card number has been shown on television, even though this card was actually cancelled quite a while ago. Then there's the separator. Normally this would
be an equals sign on track 2, but you can't do an equals sign in binary-coded decimal, so they use D. Next is the expiry date, which is December 2015, so 1512. Then there's the service code, which we've seen before, 201. And finally there's the discretionary data. This varies between issuers, but most of them follow a similar process, so we can guess what it actually means.

The first five digits are the PIN verification value. This is used for verifying whether a PIN is correct without the card actually talking to the bank, which knows the PIN. This made sense when a lot of ATMs were not connected to any network. Nowadays they're obviously connected, and it's probably not used, but it's still there. The first digit is one, which is the index for the key that is used for encrypting the PVV, and then the rest, which is 4079, is the PVV itself.

After the PVV is the CVV, the card verification value, and this is 927. This was a security feature introduced on magnetic stripe cards to help the bank detect whether a card is fake or not. A type of fraud that was happening quite frequently was that people were getting card numbers and expiry dates and creating cloned magnetic stripe cards. The way that this was fixed is that the CVV was written onto the magnetic stripe, and it allowed the bank to see whether this was a genuine card or whether it was a card created with a copy of the account number and the expiry date that was obtained from a receipt, perhaps.

Now, this actually created a problem when chip transactions came along, because if there was a copy of the CVV on the chip, it meant someone who had data from the chip could then create a mag stripe and completely bypass all the extra security features of the chip transaction. Eventually this got fixed, around about 2008, when a different CVV was put onto the chip compared to the copy of the CVV which was on the mag stripe, and that meant that in principle card issuers could tell the difference between a genuine mag stripe transaction and a mag stripe transaction that was
made using chip data. Now, it turned out that that didn't actually work out as well as hoped, because what we've found out from some recent research from Leigh-Anne Galloway is that issuers are accepting transactions as mag stripe transactions even though the CVV is incorrect, because the CVV was taken from the chip and not actually from the mag stripe.

So I've tried to give you an overview of some of the issues involved in decoding TLV. You can see it's not easy to get right. There are different encodings used for different purposes in different contexts, and even if you're using the right decoding format there are lots of ways that things can go wrong. For example, we've seen that tags can be of almost unlimited length. It's also very easy to encode extremely long lengths, because you can say "I'm now going to send eight further bytes of length" and then have massive lengths that are processed by the TLV decoder, and that could cause memory overflow or memory underflow errors.

Because of all the ways that things can go wrong, mistakes do happen, and problems come not only from the mistakes themselves but also from the consequences of those mistakes. What I mean is that issuers, who are the banks sending out cards, know that sometimes things will go wrong because of all these complexities of EMV, and they will accept transactions even though something has gone wrong. Now, more often than not that's not fraud, it's just that someone has made a mistake with encoding or decoding some data, but sometimes it will be fraud. This complexity, leading to mistakes, and then leading to being very forgiving about errors, actually makes the EMV system more vulnerable to fraud than it would be otherwise.

So maybe you'd like to try to do this yourself. It's quite an interesting experience. There might be some specific need why you have to do this, but it's often very instructive to try to do something yourself even though you can get some code out there already. One of my colleagues, Mike Bond, said the only way to
understand the wheel is to reinvent it. When I've been writing TLV decoders and other aspects of the EMV infrastructure, that's helped me understand where I've slipped up, and if I've slipped up then there's a reasonable chance that someone else might slip up too. That's been a very fruitful source of vulnerabilities in terms of my research.

If you don't want to do all the decoding yourself, I've written a TLV decoder on the emvlab.org website where you can try this out. Here is the decoding of the data I've been talking about in this talk. You can see that the CDOL1 and CDOL2 are there first. There's some data I didn't talk about in this talk: the application version number, the currency code, the currency code exponent, the DDOL and the ICC public key. But you can also see that the track 2 equivalent data is there, the service code, and the track 1 discretionary data, which is necessary for formatting track 1 if you're going to make a mag stripe transaction using the chip data. So that stuff is all there.

This is just one of several records that are stored by the card. There's a lot more out there. If you want to try for yourself, then get a smart card reader, try out Cardpeek, and see what you can find. There's more information about me and my research on my website, and if you're interested in updates from our research group, then have a look at our blog, Bentham's Gaze.
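For anyone who does want to reinvent the wheel, the manual steps from this talk (decode the tag, decode the length, recurse into constructed tags, and read a DOL as tags and lengths without values) can be sketched as below. This is a minimal sketch and not the emvlab.org decoder or Cardpeek's: all function names are made up for illustration, the four-byte cap on the length-of-length is an arbitrary choice, and skipping zero bytes between items is an assumption about how deleted items should be handled.

```python
def parse_tag(data, pos):
    """Return (tag as hex, is_constructed, position after the tag)."""
    if pos >= len(data):
        raise ValueError("truncated tag")
    first = data[pos]
    end = pos + 1
    if first & 0x1F == 0x1F:             # last five bits all ones: multi-byte tag
        while True:
            if end >= len(data):
                raise ValueError("truncated tag")
            more = data[end] & 0x80      # top bit set: another tag byte follows
            end += 1
            if not more:
                break
    return data[pos:end].hex(), bool(first & 0x20), end

def parse_length(data, pos):
    """Return (length, position after the length bytes)."""
    if pos >= len(data):
        raise ValueError("truncated length")
    first = data[pos]
    if first < 0x80:                     # bit eight clear: this byte is the length
        return first, pos + 1
    count = first & 0x7F                 # bit eight set: length of the length
    if count == 0 or count > 4:          # arbitrary cap to refuse absurd lengths
        raise ValueError("unsupported length encoding")
    raw = data[pos + 1:pos + 1 + count]
    if len(raw) != count:
        raise ValueError("truncated length")
    return int.from_bytes(raw, "big"), pos + 1 + count

def parse_tlv(data):
    """Decode TLV bytes into a list of (tag, value-or-children) pairs."""
    items, pos = [], 0
    while pos < len(data):
        if data[pos] == 0x00:            # assumed: zero filler between items
            pos += 1
            continue
        tag, constructed, pos = parse_tag(data, pos)
        length, pos = parse_length(data, pos)
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("value runs past the end of the data")
        pos += length
        items.append((tag, parse_tlv(value) if constructed else value.hex()))
    return items

def parse_dol(data):
    """Decode a data object list: tags and lengths only, no values."""
    entries, pos = [], 0
    while pos < len(data):
        tag, _, pos = parse_tag(data, pos)
        length, pos = parse_length(data, pos)
        entries.append((tag, length))
    return entries

# A small made-up record (not the card data from the talk)
record = bytes.fromhex("700a8c039f02065f30020201")
print(parse_tlv(record))
print(parse_dol(bytes.fromhex("9f02069a03")))
```

Unknown tags are decoded structurally and simply passed through, which is the forgiving behaviour described earlier in the talk. A stricter decoder would reject anything it does not recognise.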