Hello everyone, thanks for coming. My name is Fernando Arnaboldi and I work as a security consultant for IOActive. Today I would like to show you how XSLT implementations are vulnerable to multiple practical attacks. The very first question I would want answered if I were sitting over there is: why are we talking about XSLT? This is a programming language that is not so common. It was created around the time XML was created, as a way to process XML data.

A couple of years ago, Ariel Sanchez, a co-worker, found an XML vulnerability, an external entity expansion, that allowed him to retrieve some passwords. And I thought, this is pretty cool. I want to learn more about XML, schemas and XSLT, all the technologies related to XML. As I was doing that, I noticed that there was nothing published about how to exploit XSLT implementations. When you're reviewing a language, you may want to know that. So what we will be doing here is analyzing those weaknesses. We're presenting five different issues and how you can practically exploit them.

So whether you're reviewing code, you're a penetration tester, you're developing technologies related to XSLT, or you're just trying to abuse an implementation, hopefully this will come in handy. One thing to note is that even today, none of the vulnerabilities have been fixed, so everything will work. You may be able to affect the confidentiality and the integrity of multiple implementations, and that means that you may even make some profit in certain scenarios. The good thing is that you're not exploiting flaws the way a malicious virus would. You will see no assembly code here. You will see just XSLT and how it can be used to do some fun things.
So we will briefly talk today about how you can identify your target, how numbers will let you affect the integrity, and how random numbers may sometimes be predictable. I will show you how to bypass the same-origin policy in a web browser using XSLT. And finally, some information disclosure through errors. So basically the idea here is to tell you briefly what XSLT does and how XSLT can be attacked. And finally, if you don't know, hello.

I was actually expecting this. I know. That's why I'm here. So this is your first time at DEF CON, huh? How's it going? A little bit anxious. You seem a little nervous. Do you want me to rub your shoulders? We have a medicine for that. It comes in that bottle, right? Yes, it does. I gave him the right one, right? Okay. No. That was like, dude, wait. Why am I getting the water? I don't know. I don't want the water. Moron. Over at DEF CON. Thank you very much. Yes. This will get interesting. You feel much better, don't you? I feel like a man, right? Thank you very much. I would expect that.

And as I was saying, you can identify your target, maybe. Thank you. So basically, XSLT is a language that is used to transform XML. It receives an XML document as input and creates a text document, an HTML document, or a new XML document for that matter. There are three versions of XSLT: v1, v2 and v3. A new version doesn't necessarily mean an improvement, although it should, but each version has more functionality than the previous one. V1 is the most widely implemented version because it is the one supported by web browsers and because later versions are backward compatible with it: an XSLT processor supporting v2 will also support v1. So I tested two types of software, server-side processors and client-side processors.
Server-side processors are standalone tools that you can run on the command line, or libraries that are hooked up to different languages: Python, Perl, Java, whatever. When it comes to client-side processors, I believe you have two types: web browsers and, eventually, XML or XSLT editors. And I believe that only a very narrow set of people are using those editors.

There are mostly three server-side processors and libraries. These are the most important ones, developed by Gnome, Apache, and Saxonica. Libxslt is the most widely deployed one. It is implemented not only by server-side processors but also by client-side processors, the web browsers. You also have Xalan, developed by the Apache people, which comes in two flavors, C++ and Java, and a similar thing goes for Saxon. As for the client-side processors, we have the browsers. Everything that I tested was the latest available version of all the server-side libraries and web browsers.

There are three ways to feed a processor. The first one involves an XSLT processor receiving an XML document and an XSLT document. This normally happens when you're calling a command line processor, and eventually you will get a new document. People will be using this if they need to parse something server side. Another possibility, which is more common from a client-side perspective, is when the XSLT processor fetches the XSLT document itself. There is a small portion in the XML that says: you will find the XSLT document here, go get it for me and create the new result document. And finally, you can embed the XSLT document along with the XML, and by doing that you supply just one file to the processor to get the new result. So if you don't already know who your target is, you may want to know which kind of properties the target has.
By finding out which version and vendor they have, you may know what type of vulnerabilities you could exploit on this target. Since clients may also support JavaScript, and that would be the case for a typical web browser, you may also retrieve some JavaScript information. All the code that you're seeing here, and will be seeing here, is in the white paper. You can copy and paste it and try it on your target of choice to see what happens.

At the end of each section I will show you a brief summary. On the server side, here we have all of them: xsltproc, which is the standalone version of libxslt, and PHP, Python, Perl and Ruby, which are all related to libxslt in this example. Then you have the client side, which will be the web browsers. You will see the first column for the version, then the vendor, and then whether it supports JavaScript or not. Of course, basically all web browsers support JavaScript. There is one final thing, and that is that libxslt is normally more widely deployed than the others. So you will notice that sometimes, when something affects the server side, it may also affect the client.

Let's start talking about the issues. The first one is present on the client side and the server side. And it doesn't matter if you're talking about floating point numbers or integers: all numbers will introduce errors here. As I was testing this, it felt a little bit weird that sometimes calculations were not working as I expected. Certain additions and subtractions were not doing what I was expecting. So the very first thing that I did was to define a style sheet with a simple calculation. What I was trying to do was just to add a few numbers. For that matter, I defined a style sheet which declares a text output, and in the middle you have this simple thing: 0.2 plus 0.1 minus 0.3. That should be 0, right? Pretty simple. It may not be that simple for processors. Only two said that this was 0.
That was the case for Opera and Chrome. The rest said, well, something close to that. Why is this happening? This is weird. And the weird thing is that you will see this across all implementations. Okay. This is cool, but it would be better if we did something with it. I mean, these are just numbers that were not properly rounded by the programming language. This is something that is present in all programming languages. I realize that you will have this in JavaScript, Perl, Python, C, whatever. This is a common thing. Floating point numbers will have certain decimals hanging around that you may take.

So I created a simple JavaScript application simulating a bank. This is not a real bank. I wouldn't try this on my real bank. Hopefully my real bank will limit the number of transactions, and it wouldn't allow a very small decimal to be transferred from one account to another. In this application, I deposited a million dollars in the first account. The second account has a zero balance at the moment. This is where I will deposit my profit. I noticed that if I remove a very small number from the million dollar account, it will not get subtracted. But it will be added to the secondary account, because that account holds a much lower number than a million: it holds a zero. That decimal means more to a zero than it does to a million.

So the first portion of the program will try to find how big a number it can take without it being noticed. It is a small number. Then it will do millions of transactions to move it to the secondary account. You will see here that we will be using V8, the Chrome JavaScript engine, in its standalone version. And we will try to see what the best profit is that we can get here. How much money can I steal from a million dollars without it being noticed?
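The absorption effect described above can be reproduced outside the browser. The following is a minimal sketch in Python, not the bank application from the talk: it assumes balances are stored as IEEE 754 doubles (which is what Python floats are), and the `transfer` helper, the amount `tiny` and the number of transactions are made-up values chosen for illustration.

```python
# Floating point absorption sketch. Assumes IEEE 754 doubles; the balances,
# the tiny amount and the loop count are hypothetical illustration values.

# The calculation from the style sheet: mathematically zero, but not exactly
# zero once each term has been rounded to a binary float.
residue = 0.2 + 0.1 - 0.3


def transfer(source, destination, amount):
    """Move `amount` between two float balances with no rounding or limits."""
    return source - amount, destination + amount


big_account = 1_000_000.0  # account number zero
profit_account = 0.0       # account number one
tiny = 5e-11               # smaller than half the float spacing around 1e6

for _ in range(100_000):
    big_account, profit_account = transfer(big_account, profit_account, tiny)

print("0.2 + 0.1 - 0.3 =", residue)
print("big account untouched:", big_account == 1_000_000.0)
print("profit account:", profit_account)
```

The big account never changes because each subtraction rounds back to the same double, while the same amount is large relative to the small balance and keeps accumulating there. Storing money as integers of the smallest unit, or using a decimal type with explicit rounding, is the usual way to avoid this.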
And I will start moving that money from account number zero to account number one. Hopefully that will give me a daily profit of around $1,300. It is moving, right? Yeah. So this was good, but it would be better if it were a higher number.

So let's talk about integers. I defined an XML with 10 numbers. This is fairly easy to understand even if you are not a developer. You will see that you have five numbers in exponential notation and the same five numbers written out as the number one followed by a bunch of zeros. The thing is that programming languages do not handle numbers very well when they have more than 16 digits, because of the precision. What I tried to do was to print the same number that I had in the XML document, and then print it formatted with the commas and the periods and such, so it would be more legible.

Here you will see that Saxon is doing great. This is what you want to see: the number one followed by a bunch of zeros. This is pretty clear. This is awesome. You get the same for the non-exponential notation. Internet Explorer and Firefox argued that they weren't able to show the exponential notation, but that's okay. The problem comes when you're introducing errors, because there's nothing worse than believing that you have the right number when in fact you do not. In fact, notice how the way the numbers end differs depending on whether you're using exponential notation or not. We will try to use this number in a couple of minutes. Then there is Java: almost there. And for C, they just don't care about what's going on over there, so anything can happen.

At first I thought, okay, this is something related to an error in the standard. So I went and read the IEEE 754 standard. But the problem is not there. All implementations have problems. So what matters is what you do with floating point numbers and integers.
You should be saying, okay, a number should be between these values, and not allow a value to be so big if you are not able to handle it. Either way, this shouldn't be working like this. So I reported all the issues, including this one and the floating point numbers, to the vendors. The first thing that I heard was that I should be reading Wikipedia to understand how floating point numbers work. That was interesting, but you probably wouldn't find the answer over there. Then I heard that I should be reading the XSLT 2.0 specification, but this was affecting purely v1. That was nice as well, but clearly it was not solving the problem. And the very same person also said that this is something that you will see in JavaScript as well. I mean, that's fine. I know that you can find this in JavaScript. But I wouldn't like to have this in my programming language, or in any programming language, because these errors are everywhere.

So we stole some decimals before. Now we're trying to do a similar thing, but with integers. The thing is that if you take the number one followed by 17 zeros and you subtract the number one, the programming language will not notice that the one is missing. So I created a fake cryptocurrency, which I named fakecoin, whose value is very small. Very, very small. I bought a one followed by 17 zeros of these coins. So I have a lot of coins, with a total net value of $1,100,000. And I will try to transfer one coin at a time to a secondary account, which will be my profit account. Hopefully by the end of the day I will have a better profit than I did moving decimals.

The profit will be better if I use more coins, because I will be able to transfer more coins at the same time. Here I'm just going for the minimum amount possible, just to show you. The minimum amount would give me a profit of $2,300. If you add a zero to the coins, you should add a zero to the daily profit as well. So that was nice.
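The integer case can be sketched the same way. This is an illustration in Python rather than the fakecoin application from the talk, and it assumes the balances are held as IEEE 754 doubles, which is how XPath 1.0 defines its number type. The variable names and the number of transfers are hypothetical.

```python
# Integer precision sketch. Assumes amounts are stored as IEEE 754 doubles,
# as XPath 1.0 numbers are; names and loop count are illustration values.

coins = float(10**17)  # the number one followed by 17 zeros
profit = 0.0           # secondary (profit) account

for _ in range(1000):
    coins -= 1.0   # rounds straight back to 1e17: doubles here are 16 apart
    profit += 1.0  # small integers are exact, so every coin is credited

print("source still full:", coins == float(10**17))
print("coins credited:", profit)

# For contrast, arbitrary-precision integers do notice the missing coin.
exact_coins = 10**17
print("exact arithmetic notices:", exact_coins - 1 != exact_coins)
```

Doubles represent every integer exactly only up to 2**53 (about 9 followed by 15 zeros), which is why the talk's rule of thumb about 16 digits holds. Above that, a range check or an exact integer type is needed before doing arithmetic on the value.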
So the very next thing that I did was to see how random numbers work. I mean, if you have ever developed anything, you need random numbers. This is something that you should normally see on server-side processors. And you should also know that, of course, not just any random number generator should be used for cryptographic purposes. You have to be careful with random numbers. In XSLT, this is a function that comes from EXSLT, which is a set of extensions to XSLT. It is defined as a function that returns a value between zero and one, as any random function should. Supposedly, a random number should not follow any pattern. I mean, you shouldn't know what the number will be before calling this method. That would be fairly logical from a randomness point of view.

We normally have two types of random functions. If you have ever developed, you may know that you have functions that are less secure, like random in Python, and more cryptographically secure mechanisms, like SystemRandom in Python. You may want to use the latter if you are handling cryptographic things. For some of the software that I tested, the server-side processors, you are able to see the code and how it was developed: libxslt, Xalan for C and Xalan for Java. And you will see that all of them are using a pseudo-random number generator. Which is fine. The problem may come with how it is used. If people are using these random numbers for any cryptographic purpose, that may be a problem, because with a pseudo-random number generator you may know what's going on. These were the functions that I saw in C, C++ and Java, and a good definition comes from the man pages: these are just bad random number generators. You have to take that into consideration and you shouldn't use them for cryptographic purposes.
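The difference between the two kinds of generators mentioned for Python can be sketched as follows. This is an illustration of the general point, not of any XSLT processor's code: it only uses the standard `random` and `secrets` modules, and the seed value and variable names are arbitrary.

```python
import random
import secrets

# A pseudo-random generator is fully determined by its internal state:
# anyone who captures that state can replay the values that follow.
prng = random.Random(2015)  # arbitrary illustration seed
captured_state = prng.getstate()
first_draw = prng.random()
prng.setstate(captured_state)
replayed_draw = prng.random()
print("replayed value matches:", first_draw == replayed_draw)

# SystemRandom draws from the operating system's entropy source instead.
# It keeps no internal state that could be saved and replayed.
system_rng = random.SystemRandom()
try:
    system_rng.getstate()
    has_replayable_state = True
except NotImplementedError:
    has_replayable_state = False
print("SystemRandom state can be captured:", has_replayable_state)

# Both return a value between zero and one; secrets is the usual choice
# when the value is a key, a token or anything else security related.
secure_value = system_rng.random()
token = secrets.token_hex(16)
print(0.0 <= secure_value < 1.0, len(token))
```

The first half is the reason a pseudo-random generator is fine for simulations but risky for secrets: its output is only as unpredictable as its state.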
But there's one more thing when it comes to random numbers that you normally pay attention to, or at least you should. What happens if there is no initialization vector? This is something basic for any random number generator: you need to have something that changes when you're getting a random number. Otherwise, you may always get the very same value, and that's not very useful if you're expecting a random number, because you may know in advance which numbers you will be getting. Once you have a proper IV in place, you will have different values every time that you're calling the random functions.

So let's see again how the functions that we saw before work when it comes to the initialization vector. Here there is only one that doesn't have the IV. Again, libxslt. This is not something new to libxslt. They have known about this since 2006. But this is how it works. So if you try to create an XSLT that calls the random function, or you see anyone who's trying to produce a random value out of libxslt, you will see something like this. And you will see these kinds of results if you're executing that on the command line. I executed it twice on the same terminal, and I got the same number twice. You can see the 7.82: you always get that first number every time that you're executing the random function from libxslt.

The next thing that I did was to try to understand how this could affect cipher modes in block ciphers. That's not a way to cipher things if you're using random. So I ran two executions to understand what these numbers look like. First I printed the Python version, random.random, and you get two different numbers. Of course, these are pseudo-random number generators. They may not be the best, but they are not predictable, and they are not the same every time that I'm executing that function.
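The effect of a missing seed can be simulated in a few lines. The sketch below is not libxslt's code: it models an unseeded generator by always starting Python's `random.Random` from the same constant (`FIXED_START` is a made-up value), and models a seeded one with a time-based value, as the talk suggests.

```python
import random
import time

FIXED_START = 1  # hypothetical constant standing in for "never seeded"


def run_without_seed(count=2):
    """Simulate one program execution whose generator is never seeded."""
    generator = random.Random(FIXED_START)
    return [generator.random() for _ in range(count)]


def run_with_time_seed(count=2):
    """Simulate one execution that seeds from an external value (the clock)."""
    generator = random.Random(time.time_ns())
    return [generator.random() for _ in range(count)]


first_execution = run_without_seed()
second_execution = run_without_seed()
print("unseeded runs identical:", first_execution == second_execution)
print("first value is always:", first_execution[0])
print("second value is always:", first_execution[1])
print("time-seeded run:", run_with_time_seed())
```

Every unseeded execution yields the same first value, the same second value, and so on, so anyone who has seen one run knows the whole sequence. A time-based seed removes the repetition across runs, although it is still guessable and is not a substitute for a cryptographically secure source.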
But with libxslt, we can recognize the very same number that we saw on the previous slide, the 7.82 thing. That's the very same number. If you're calling Python again with print random.random, you will see that we have again two different numbers. So far, that's four different numbers for Python and one for libxslt. If we're calling libxslt again, you will notice that in the second position we will always have this new value, this 0.13. And it will be repeated every second time that you're calling this. So without an external seed value, such as the time, you may know in advance the sequence of numbers that will be generated by libxslt. Which is pretty cool, because you may know in advance what is being encrypted if they are using this to encrypt something. Which would be fairly ridiculous. So again, you may predict values when you're seeing random numbers.

The same-origin policy is something that is present in client-side processors, which means web browsers. Basically, it says that if you're on a website, you shouldn't be reading information from other websites. But again, as always, that may not be the case for certain engines. So this is important. The origin is always defined by the scheme, the host and the port of a URL. Let's see an example of this. The HTTP or HTTPS at the very beginning is the scheme. The host would be example.com. And the port would be either port 80 or port 443 or something like that.

Generally speaking, when we are retrieving documents from different origins, web browsers will not share the information. I mean, when we are requesting the same origin over and over, we will be sending the same cookie over and over to the same website, and that would be okay. Normally, JavaScript is used to try to get around this, but you don't necessarily need it to affect the same-origin policy.
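The scheme, host and port comparison described above can be written down directly. This is a simplified sketch using Python's standard `urllib.parse`, not a browser's implementation: the `origin` and `same_origin` helpers and the default port table are assumptions for illustration, and real browsers apply extra rules (for example, local `file:` documents are usually given their own special origin).

```python
from urllib.parse import urlsplit

# Default ports assumed for this sketch when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(first_url, second_url):
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(first_url) == origin(second_url)


# Same site, different page: allowed to read.
print(same_origin("https://example.com/index.html", "https://example.com:443/other.html"))
# Different scheme, and therefore a different port: not allowed.
print(same_origin("http://example.com/", "https://example.com/"))
# A local file versus a remote site, as in a stylesheet opened from disk.
print(same_origin("file:///Users/demo/Desktop/test.xml", "https://www.bing.com/"))
```

Changing any one of the three components produces a different triple, which is exactly the condition under which a browser is expected to refuse to hand over the other document's contents.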
You should expect that when you're connecting to Google.com, your browser will be sending the very same cookie to that website, because it has the very same origin. If you're connecting to Microsoft.com, you should be seeing a different cookie. This would be a very valid scenario: you connect to a website, you're on the main web page, and you're trying to access a second web page that is stored over there. That will be fine. You're allowed to see that. In fact, you're even allowed to see other web pages on the very same domain. But if you're changing the scheme, the host name or the port, you shouldn't be allowed to see any of the information that is present on that other website. I mean, you're not sharing private information between websites. That's what you would expect, at least.

There's only one function that reads documents, and that's document. So you may try to use that to read another XML document. In fact, since we're speaking about websites, we could also read XHTML, which is a fairly common way for certain web servers to represent a web page. Once we retrieve the XHTML document, we can see what's inside using either of these two functions, copy-of and value-of, which will show either an XML representation or a text representation.

So if you want to abuse this, the very first thing you need to do is find a server that uses XHTML. Okay. Bing.com uses XHTML. I'm logged in here. What can you do with this? In the upper right corner, you will see my name in a red box, and that is also reflected in the code. Since it is XHTML, this is some sort of XML, and my name is within an element named id_m. So you may be able to get your web browser to retrieve that value. Let's see how we can retrieve that information using the document function and the other functions.
Here we can see that the document function is accessing the URL www.bing.com. Then, right in the middle, we are retrieving the information that we just grabbed from the document. And finally, because I'm lazy, I will be using JavaScript to extract the id_m element, which has my name.

So let's see a demo of this. First, I will open Safari, and I will show you that I'm using Bing.com as my home page. Then I will open the document that is on the desktop, which does not share the same origin: one, Bing.com, is hosted on HTTPS at bing.com, and the other one is a local file. So let's see what happens. You will notice again my name in the upper right corner, and when I open that file, I'm reading the document that is served by Bing.com, and I'm able to retrieve my name using XSLT. Even though the file is not hosted on Bing.com, Safari doesn't care. It will show you that information.

So, basically, Safari will allow you to read this. Internet Explorer may show you a warning message. It will retrieve the information, but it won't be sharing anything related to this. And the other browsers just didn't do anything. Another cool thing is that you may use some of this to try to scan internal networks, in case you wanted to. There are multiple ways to try to scan internal networks when you're executing something locally, and this could be another one.

Another vulnerability that I found, and that I thought would be very interesting to discuss, is an information disclosure: file reading through errors. This is something that is present in server-side and client-side processors. The focus here is, of course, on server-side processors, because we wouldn't care what happens in a web browser. The cool thing about this is that it is not possible to read text files in XSLT v1. It is only possible to read XML documents or, as we saw, XHTML documents.
And since it is not possible to read plain text files, it doesn't matter what function we are trying to use here, because no function should be capable of doing this. So let's see what happens, even though the W3C says it is not possible. We saw before that there was one function to read XML documents, and that's the document function. It allows access to XML documents other than the main document. Okay, we have that. We can try to use that. There are also other functions used for accessing XSLT documents, and those are include and import. These functions just retrieve a style sheet and combine it with other style sheets. We don't care what the manual says about this, because either way we are not trying to read a style sheet here.

So I created a text file that contains three lines, very simple. If you look at the contents of my test file, you will see a line one, a line two and a line three. And if you read the XML documentation, you will see that when you're reading a file like this, there are a couple of possibilities. The first one is that the XSLT processor may report that it found an error, and that's what some of the processors do. They say content is not allowed in prolog. Okay, that's fine. The other possibility is to return an empty XML document. That's what Ruby does. Ruby will show you that there's nothing to see here, and this is something that is also expected.

But again, this doesn't solve our problem, which is that we want to read something from the test file. Libxslt comes again to help us with this. When you use the document function, xsltproc, PHP and Perl will show you the first line of our test file. Remember the line one of the test file? That's not too much, but it's cool. Perhaps we can do something with that.
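The mechanism behind this leak, an error message that quotes the line the parser choked on, can be simulated with the Python standard library. This is not libxslt's behavior reproduced exactly: Python's `xml.etree.ElementTree` only reports the position of the error, so the hypothetical `load_as_xml` wrapper below adds the offending line to the message itself to stand in for a processor that does so. The file name and message format are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_as_xml(path):
    """Parse `path` as XML; on failure, raise an error that quotes the bad line.

    Quoting the line is the simulated, leaky behavior. ElementTree on its own
    reports only the line and column where parsing failed.
    """
    try:
        return ET.parse(path)
    except ET.ParseError as exc:
        line_number = exc.position[0]
        with open(path, encoding="utf-8") as handle:
            offending = handle.read().splitlines()[line_number - 1]
        raise ValueError(f"{path}:{line_number}: parser error: {offending}") from exc


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("line 1\nline 2\nline 3\n")  # the three-line text file
    try:
        load_as_xml(path)
        error_message = ""
    except ValueError as err:
        error_message = str(err)

# What an attacker who can only see the error output gets to read.
leaked = error_message.split("parser error: ", 1)[1]
print(leaked)
```

Parsing stops at the first line that is not valid XML, so only that line ends up in the message. That is why the disclosure is limited to the first line of the file, and why files that keep something valuable on their first line are the interesting targets.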
We also tried to use other functions to access these files later. But having this unexpected behavior in place may allow us to do something with it. You may already know where I'm going with this: which type of files may have an interesting first line that would be valuable for us? There are certain specific files that store the most valuable information of a computer on the very first line. So what if we were able to read, for example, a password file?

The most common answer for any Linux system would be /etc/passwd. The next one, if they are running this as root, could be /etc/shadow. The possibilities are in your imagination here. Depending on what you are trying to read, you may be able to retrieve certain information that may be valuable for you or for someone else. You also have the Apache password file, and you may also have database passwords. There are possibilities out there.

This is what you will be seeing when using, for example, one of the processors to try to read /etc/passwd. You will see an error and also something else: the password. Which is cool. You can now also use XSLT to retrieve this information. Another example, if you are using PHP, could be to try to read the .htpasswd of an Apache server. The password is stored on the very first line. You may see a bunch of errors and, right in the middle, what you are hoping to see: the password for John, in this case. As I was saying before, in case they do not care about what they are doing, you could also have someone leaving /etc/shadow available because they are running this as root. This is what will happen if you are using Ruby to try to retrieve that file. Again, expect all the errors, but also the password for root over there.
This is pretty neat, because I believe this opens the possibility for XSLT to be perhaps as interesting as external entity expansions as a way to retrieve information, if an attacker is able to compromise an XSLT. That can happen because the application allows it: there are applications out there that allow an XSLT to be uploaded, or that rely on XML document input, and any of those could be a way to try to read files. Whether you are able to control an XML that an XSLT processor in the backend is parsing, or you are able to control the XSLT itself, you may compromise the security of an application.

So keep in mind that you may have the confidentiality and the integrity affected, because sometimes, when using random functions or integers, the implementations may be working to our profit without us doing anything on our side. As a very last thing, I would recommend that you check your code, or someone else's in case you just want to see what's going on. If anyone has any questions, I'm happy to answer them. Thank you very much, and thank you to all the people who helped me with the presentation.