I'm Anshul, the co-founder of Vita Glide, and I'll be talking to you today about reverse engineering Android applications. I'll be showing some tools that we've been using at our startup, and how you can use those tools to do a lot more than what we are doing currently.

Let me give you an outline of the talk. I'll speak briefly about myself and my background. Then I'll give the motivation for the talk: why do we need to reverse engineer an Android application, and what are the possible use cases? I'll give an example of where we use it. After that, we'll jump right into the techniques. First you'll need some background on what is inside an APK and what the various ways to reverse engineer Android applications are. And of course, this sort of presentation won't be any good if I don't show you some demos, so I'll be showing a few techniques which I've become familiar with, and hopefully you can use them for fun and profit, like the title says.

So, about me. I am the co-founder and the main tech guy at Vita Glide. Vita Glide is a performance monitoring and analytics tool for Android applications. The way it works is that we integrate our library into your APK. Then you distribute that APK to your beta users, and when your beta users use the application, we collect various data about it and generate reports, which you can see online from your Vita Glide account. I'm currently in my final year, the fifth year, of computer science at IIT Kharagpur, and I've been casually hacking on Android for the last three years. It's been serious work only for the last six months or so.

So, motivation. Like I said, Vita Glide is a performance monitoring and analytics tool for Android applications. What we need to do is measure and monitor interactions between the application and the device.
What do I mean by interactions? I mean things like CPU usage, how much memory is allocated, how much data is transferred over the network, and how many times the garbage collector has been called. And we don't just track this data; we need to correlate it with what is happening inside the application. If I just give you a graph of CPU usage over time, that's pretty much useless unless I also tell you what was happening when your CPU usage spiked. Without that correlation, you can't analyze the data properly. So collection and correlation of performance data is basically what our tool does.

What we need for this is a reliable way to hook into an application and measure the metrics of interest. This problem is similar to what various analytics companies deal with. The way it currently works is that if you want to use Flurry's tool, say, you have to spend some time and effort integrating that tool into your app. Now, we are a really small startup, a team of around three people. We have a new tool that does something other tools do not do, and if we add a barrier to entry, I'm not sure how many people would even try it out. So our logic was that we'll give you a way to automatically integrate our tool with your app so that you can try it out. If you like it, then use our library, spend some time integrating it, and use it properly.

That is one aspect. The other aspect is: how do I actually measure data from the Android application? A concrete example might be that I want granular data about the network calls being made from the application: which URL was called, what the latency was, how long the transfer took, how much data was sent, everything.
And I need to do this without having the developer type lines of code before every network call, because an app might have many network calls and that becomes really tedious for the developer. That is where the reverse engineering aspect comes in. Like I said, we automatically inject the monitoring code into the application.

So how do we do that? This is the basic high-level approach. We are rewriting parts of a binary, which in this case is an Android application. If you do it that way, the solution is completely device agnostic, and it'll work on any application if you do it well enough. The best part is that we get code-level context. If we are rewriting parts of the binary, we know that this function is being called and that it's sending this data, and we can choose what data we want. The possibilities are pretty much endless: we can collect whatever data we want and do whatever we want with it. We don't have to rely on the interfaces Android already exposes for monitoring applications. We don't have to rely on DDMS. We don't have to rely on the instrumentation API and its restrictions.

Let's start the reverse engineering part. If you take any random APK and unzip it, here's what you'll get. You'll get some non-code resources: an important one is the manifest file, and then of course the resources, certificates, images, and all that other stuff. And all the code of the Android application is in a file called classes.dex. What is dex? Dex is Dalvik executable. We are mostly concerned with this part, the classes.dex, and we might also want to make some changes to the manifest, depending on what you're doing. So let's go into the classes.dex. Dex is the Dalvik executable format.
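Since an APK is an ordinary zip archive, the split between code and non-code entries described above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the "APK" built by make_fake_apk is only a stand-in with placeholder bytes, not a real application.

```python
import io
import zipfile


def summarize_apk(apk_bytes):
    """Group the entries of an APK (a zip archive) into code, manifest and the rest."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        names = apk.namelist()
    return {
        "dex": [n for n in names if n.endswith(".dex")],
        "manifest": [n for n in names if n == "AndroidManifest.xml"],
        "other": [
            n for n in names
            if not n.endswith(".dex") and n != "AndroidManifest.xml"
        ],
    }


def make_fake_apk():
    """Build an in-memory zip with APK-like entry names (placeholder contents only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("AndroidManifest.xml", b"placeholder")
        z.writestr("classes.dex", b"placeholder")
        z.writestr("res/drawable/icon.png", b"")
        z.writestr("META-INF/CERT.RSA", b"")
    return buf.getvalue()


if __name__ == "__main__":
    for kind, entries in summarize_apk(make_fake_apk()).items():
        print(kind, entries)
```

To try it on a real file, read the APK from disk with open(path, "rb").read() and pass the bytes to summarize_apk.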
The dex format is a binary format for the Dalvik virtual machine, which Android runs on, and it's a little different from Java. I was quite familiar with Java, and I thought: Java has such good tools for instrumentation, maybe I can use them on Android. But the problem is that Android does not run on a pure Java virtual machine. It runs on a different kind of virtual machine, so I can't use typical Java approaches to solve this problem.

I'll give you a brief overview of the build process. When you compile your Android application, your Java files are compiled into class files. Then there is a custom Android tool that takes all these class files and makes one classes.dex file out of them. That is the binary; that is what actually runs. And if you have any libraries included in your application, all of their compiled Java code also goes into this one classes.dex file.

Now let's talk about some reverse engineering tools. The problem is that we have a classes.dex file, which is a binary, and we want to modify it. We can't edit the raw bytes by hand; that would be too difficult. So there are two options. Either you decompile it to a higher-level language like Java, modify that, and recompile it; or you modify it at the bytecode level. The problem with the first approach is that, because this is Android and not pure Java, a lot goes on in the background that you won't capture when you recompile. Suppose you decompiled your application, changed some Java code, and now you want to recompile it. At the Java level, you don't have any tool that reliably recompiles the decompiled code, because it's very difficult.
People haven't really figured out exactly how Dalvik handles this, so that's where the problem comes in. The solution to that problem is smali. Smali is a human-readable format for Dalvik bytecode; it's basically like assembly for Android. So what I'll do is disassemble my application into that assembly, modify the assembly, and then reassemble it. Smali is the name of the format as well as of the assembler; its counterpart, baksmali, is the disassembler.

And what do we use for all this compiling and decompiling, instead of doing it manually, which can be done but is a little tedious? There's a cool tool called apktool which, in a nutshell, allows you to decompile and recompile Android APKs. Since smali is assembly-level code, if you want to reverse engineer, you'll have to understand it, and that is not trivial. You need some way to correlate the smali with the Java code, and that is where dex2jar comes in. dex2jar converts a normal Android dex file into a jar file. And there's a really cool tool called JD-GUI which decompiles any jar file into Java, so you can see the source code. I think how all these things fit together will be clear when I show you a demo.

Let's see the demo. Let's say I have this simple, plain Android application that doesn't do anything. Now I build it. In this folder, you can see there's a droid contest.apk. This is the APK file. So how does apktool work? I'll just run apktool d, where d stands for decompile. When I decompile it, it creates a new folder with the same name as the APK. There are two folders here, you can see: alongside the manifest file there's a resources folder and a smali folder.
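The decompile and rebuild steps can also be scripted. The following is a minimal sketch, assuming Apktool 2.x is installed and on the PATH; the -o option belongs to that version's command line and may differ from the version used in the demo, and the helper names are invented for illustration.

```python
import subprocess
from pathlib import Path


def decode_cmd(apk_path):
    """Command line for `apktool d`, decoding into a folder named after the APK."""
    out_dir = Path(apk_path).stem
    return ["apktool", "d", apk_path, "-o", out_dir]


def build_cmd(decoded_dir):
    """Command line for `apktool b`, which rebuilds the APK under <decoded_dir>/dist."""
    return ["apktool", "b", decoded_dir]


def run(cmd):
    """Run one of the commands above; raises if apktool exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands, so the sketch can be read without apktool installed.
    print(" ".join(decode_cmd("demo.apk")))
    print(" ".join(build_cmd("demo")))
```

Keeping command construction separate from execution makes it easy to log or dry-run the steps before touching a real APK.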
All the resources are in the resources folder, and all the code, in smali format, is inside the smali folder. You can see it follows the typical Java package folder layout. All the libraries that were included were also decompiled and stored in this folder.

Let's look at the main activity smali. This is the smali corresponding to our plain Android application; it's the smali of the MainActivity.java file. As you can imagine, you get a separate smali file for every class that you have defined. That's the way it works. You can see there are some methods. There is an onCreate method, and this is how it is represented. These locals are your local registers for that method. The way method calling works in smali is that there are a few reserved registers for passing parameters and getting results, and in addition to that, you can use local registers to do your own thing.

This is an example of a method call. What it's doing here is calling the Android Activity onCreate function and passing in a Bundle. The part I've highlighted tells you the signature of the function being called. It's not passing any parameters itself; it's just saying that you have to call Activity's onCreate function and that it takes a Bundle. These p0 and p1 are the parameters actually passed. p0 corresponds to this activity, and p1 corresponds to the Bundle. So in register p1 I have an instance of the Bundle, and that is being passed in this method call. But this method call is also attached to an object instance. Which instance? That instance is stored in p0. So that's a simple example of how a method call works. We'll find this useful a little later on.

Now I'll show you a demo of dex2jar and JD-GUI.
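Before moving on, the method call discussed above can be picked apart mechanically, which is the first step toward automating any edit. This is a minimal sketch: the regular expression covers only simple invoke lines like the onCreate call (not array receivers, for instance), and the function and key names are assumptions made for illustration.

```python
import re

# Matches lines such as:
#   invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V
INVOKE_RE = re.compile(
    r"^\s*invoke-(?P<kind>[a-z]+)(?:/range)?\s*"
    r"\{(?P<regs>[^}]*)\},\s*"
    r"(?P<cls>L[^;]+;)->(?P<name>[^(]+)"
    r"\((?P<params>[^)]*)\)(?P<ret>\S+)"
)


def parse_invoke(line):
    """Split one smali invoke line into its parts, or return None if it is not one."""
    m = INVOKE_RE.match(line)
    if m is None:
        return None
    regs = [r.strip() for r in m.group("regs").split(",") if r.strip()]
    return {
        "kind": m.group("kind"),        # super, virtual, static, ...
        "registers": regs,              # p0 is the receiver for non-static calls
        "class": m.group("cls"),        # class that declares the method
        "method": m.group("name"),
        "params": m.group("params"),    # parameter type descriptors
        "returns": m.group("ret"),      # V means void
    }


if __name__ == "__main__":
    call = "invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V"
    print(parse_invoke(call))
```

The registers list carries the actual arguments (p0 as the instance, p1 as the Bundle), while class, method, params and returns together form the signature that the highlighted part of the line describes.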
In my folder over here, I have the APK of Truecaller. Yeah, this is the APK. Now we'll try to see the code. This is the dex2jar script, and I'll call it on the Truecaller APK. It just creates this jar file; I already have it, so it didn't recreate it. This is the output jar file. If you want to view this jar file, you can just open it in JD-GUI, and here you can see all the code of that Android application. As you can see, it is obfuscated. They're probably using something like ProGuard to obfuscate the code before releasing it. Obfuscated code causes some problems, but you can still do a lot, as I'll show you shortly.

So this is an overview of the approach. As you can see, it's very simple to decompile, and apktool can also recompile smali. You use apktool to decompile the APK into smali, you modify that smali, then you use apktool to rebuild it into an APK. That APK is the modified application you were looking for.

Now, what are the problems with this? If you're doing it by hand, it's pretty simple. But, as in our case, at some point you will probably have to automate the whole code injection process. So you need an automated way to find the smali that you want to replace, depending on what you want to do. I'll give a concrete example of this later. So you need to find the relevant smali, and you need a non-hacky way to modify it. What do I mean by non-hacky? You could just do a simple find and replace: find this function, replace it with that function. But the problem is that you don't know how the developer wrote the method. Did he use three registers? Did he use four registers?
Maybe the code that you want to inject will require one more register, and there's a limit on the number of registers you can have in a method. So if he's already using all the registers, you'll have to think about how to do your thing without disturbing what he is doing. And you need a generalized way to do it. So these are the problems with this approach. You can do cool stuff on a per-application basis, manually decompiling it, reading the smali, modifying it by hand, and then compiling; that's not that difficult. But if you want to do this automatically, there are some issues. You have to really think about how to design your own code so that it doesn't interfere and is easy to inject; for example, it shouldn't need many registers.

Let me give you an example of a real-world use case where we are actually using this. Among other things, our application also makes touch heat maps. What does that mean? Suppose people are using your application; we'll give you a heat map that is a collection of all the touch events of all the users, across all their app sessions. So we need to monitor touch events in all activities, and we need to do this automatically. We don't want the developer to write code inside his application. We just want that, if he gives us an APK, we do something with it and we are able to track the touches.

There's a very easy way to do this. In Android, there's a dispatchTouchEvent function associated with an activity. This function sees all touch events inside that activity, and you have the option to either consume an event or pass it on to the next event handler. So, naively, all you need to do is write a dispatchTouchEvent function inside each activity. But now you need to do this automatically.
I'll show you an easy way to do this. Think about what problems will come up and all that; I'm not going into too much detail about what we are actually doing. So this is the smali corresponding to the dispatchTouchEvent function. It's a standard Android API call. It takes a MotionEvent parameter, and that MotionEvent has the coordinates of where the user touched. What we're doing is just calling our library function, catch touch event, and after that we're calling the next-level handler, super's dispatchTouchEvent, so that whatever default behavior was going to happen will still happen. But we have recorded this touch.

So how do I insert this? Just copy it and paste it over here. We have copied this function. Now, since I've modified the smali, I'll need to build it. It has built an APK file. You'll see here it's created this folder, and now there's another dist folder because it actually built the smali. Inside that dist folder you'll find the APK. When you run this APK, all the touch events inside the activities will be recorded by our library.

Now you must be wondering: that's fine, I called my library from inside the application, but how do I actually include the library? Including the library is really simple. We have a folder that contains our library decompiled into smali. It's this folder, the b-flight-proto folder; it has the smali of our whole library. We just copy and paste it into the relevant package folder. Since our package name is com.example.b-flight-proto, we put all the smali of our library inside the com.example.b-flight-proto folder within the smali folder. And that's it. We don't need to do anything else.
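The copy-and-paste step for the touch hook can be sketched as a small text transformation on one activity's smali file. This is only an illustration under stated assumptions: the recorder class Lcom/example/hypothetical/TouchRecorder; and its static catchTouchEvent signature are hypothetical (the real library's names are not given in the talk), the class is assumed to extend Activity directly or indirectly, and a class that already defines dispatchTouchEvent is skipped rather than patched.

```python
import re

# Hypothetical recorder class; the talk's real library name and signature are not known.
HOOK_CLASS = "Lcom/example/hypothetical/TouchRecorder;"
TOUCH_SIG = "dispatchTouchEvent(Landroid/view/MotionEvent;)Z"

METHOD_TEMPLATE = """
.method public {sig}
    .locals 1

    # record the event first (static call, so only the event register is passed)
    invoke-static {{p1}}, {hook}->catchTouchEvent(Landroid/view/MotionEvent;)V

    # then call the superclass so the default behaviour still happens
    invoke-super {{p0, p1}}, {superclass}->{sig}

    move-result v0

    return v0
.end method
"""


def inject_touch_hook(smali_text):
    """Append a dispatchTouchEvent override; return None if the class cannot be patched this way."""
    already_defined = re.search(
        r"^\.method .*" + re.escape(TOUCH_SIG), smali_text, re.MULTILINE
    )
    if already_defined:
        return None  # needs the harder case: editing the existing method
    m = re.search(r"^\.super\s+(L[^;]+;)", smali_text, re.MULTILINE)
    if m is None:
        return None
    method = METHOD_TEMPLATE.format(
        sig=TOUCH_SIG, hook=HOOK_CLASS, superclass=m.group(1)
    )
    return smali_text.rstrip("\n") + "\n" + method
```

Because the new method declares its own single local register, it does not compete with registers used elsewhere in the class, which sidesteps the register-pressure problem for this simple case.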
Now we can call any of these functions from anywhere inside the application. There's no problem. The only thing you need to do is copy and paste the smali.

So how do you use this? If you want to extend an application to do something else, create a library for it, attach it to an APK, build that APK, and then decompile it with apktool. Pick out your library code and copy and paste it into your target APK. When you build that, your library is automatically included. For calling the library, you can use the technique that I just showed you: take the smali of the relevant method and copy and paste it. If you design it properly, it should work. It does work, actually.

It also depends on your use case. This is a very simple example where I'm just adding a function to some activity. The function might already be defined; the developer might already have written a dispatchTouchEvent function. In that case we do something similar: you need to find that function and add your line somewhere inside it. But again, think about the potential problems this can cause, and how you will do it in a non-hacky way.

Now I'll give another example of a cool use case for this. You must have heard of the caller ID application called Truecaller. For people who don't know it: when you install the application, it reads your contact list and sends it to their server. Then, any time you get a call from a person whose number is not saved in your contacts, it looks the number up in their database and gives you the identity of that person.
It's a crowdsourced kind of caller ID where everybody shares their contacts, so everybody has the contacts of everybody else. The problem with this is that it picks data directly out of the contact list. I don't know about you, but I'm not very careful while adding contacts; I tend to give stupid names. That may cause problems. Say my friend calls some other friend who doesn't have his number, and I have saved his number under a stupid name. That person will see that same stupid name in his caller ID. It's kind of a breach of privacy. So I want the benefit, but I don't want the negative part: I want the caller ID, but I don't want to share my contacts. And all I have is the APK of Truecaller. How will I do this?

It's not that difficult to think about. We've already established how we can decompile an application, change it, recompile it, and then use it. That's all you need to do at a high level. What you need to figure out is where inside the application Truecaller is calling the contacts API and getting the contacts. And over there, instead of sending the actual contact name, you send some other string.

The first part is identifying the place where the application does what you want to change. For that, JD-GUI is really useful. So I have this Truecaller source code. Since there is basically only one way to access the contacts in Android, I search for contacts in the code, and I get this weird class, L.class, inside the Truecaller d package. And over here I have an interesting-looking log message, get all contacts error. That gives me the idea that the app is probably trying to get the contacts from your contact list here.
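A distinctive string such as that log message is also a handle for locating the same code in the decompiled smali, where string literals survive obfuscation. A minimal sketch of such a search, assuming the apktool output layout (a smali folder of .smali files); the function name is made up for illustration.

```python
import os


def find_string_literal(smali_root, needle):
    """Return sorted (relative_path, line_number) pairs for const-string lines containing needle."""
    hits = []
    for dirpath, _dirs, files in os.walk(smali_root):
        for name in files:
            if not name.endswith(".smali"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    # const-string (and const-string/jumbo) load string literals
                    if "const-string" in line and needle in line:
                        hits.append((os.path.relpath(path, smali_root), lineno))
    return sorted(hits)
```

A message that occurs only once gives a single hit, and the surrounding lines of that file are where to start reading.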
If you look at this code over here, it's fetching your display name and assigning it to this variable. If you want to change this, you just set localc.b to something else, instead of whatever was fetched from the contact list. So now we have a broad idea of where to start. Inside the Truecaller smali there is a d package. Inside that there's a class called L, and inside that there is some function called a. These function names are not really useful, but the log message is really useful because it occurs only once. So I can just search the smali for this string and look around to see where my target actually is.

Let's look at the smali of this. This is the L.smali file. Over here, you're defining an array of strings containing the ID and display name strings. And if you come over here, what this is doing is sending a query that returns some cursor object, and it's calling a getLong function on that. But again, the interesting part is not the functions; the interesting part is the strings. Strings and log messages are how you find out where the action is actually happening.

So look at this line. It is calling a function, passing the string display name as a parameter. The result of that function is then passed to another function, and that result is assigned to a variable called b, localc.b. And localc, as you can probably guess from the name, is probably an instance of a class called c. All these observations are really helpful when you're working with smali, because it is an assembly format. Now correlate that with this smali. I'm defining a string, display name. I'm invoking a function, passing in the string.
As you can see, I stored the string in v2, and when I'm calling the function, I'm also passing v2. That means the string is going as a parameter to that function. Then move-result v2: whatever the result of that function was is now stored in v2, and v2 is again passed to the second function, the getString function. Here you can incidentally see the return type and some additional details about what these functions are doing. The result of this function is again moved to v2. And look at this line; this is the crucial line. This is basically localc.b equals the result. So I defined a string, passed it to a function, got the result, passed it to another function, got that result, and that is what I'm assigning to this variable over here.

So if you don't want the contacts to go out, just change this. Remove all this stuff in between, and then v2 will still contain the original string, and that string is assigned directly to your target variable. Any time Truecaller fetches the contacts, instead of getting the actual contact name, it will just get display name. You should probably modify this a bit: instead of sending one string for all contacts, you should probably give a different name to each contact. That will require a little further tinkering, because Truecaller will probably be checking these things on its server side, so you need to fool it as much as possible. This is a very simple way to do it, but if you want a reliable way, you have to think a lot more.

So this is the approach. The details are left for you guys. I've already figured out how to do this, and I've already told you the important parts, the background.
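The edit just described (drop the lookup calls so the register keeps the constant string) can be sketched as a line filter over a method body. Everything here is illustrative: the smali in the usage example is synthetic rather than taken from any real application, the field owner Lcom/example/hypothetical/c; is made up, and the literal "display_name" is assumed to be the column name behind the spoken "display name". It also assumes the register is not reused for something else in between.

```python
def keep_constant_instead_of_lookup(lines, marker):
    """Drop the invoke/move-result chain that directly follows `const-string vX, "<marker>"`.

    The register loaded by const-string then still holds the literal when the
    next real instruction (for example an iput-object) reads it.
    """
    out = []
    skipping = False
    for line in lines:
        stripped = line.strip()
        op = stripped.split(" ", 1)[0] if stripped else ""
        if skipping:
            if op == "" or op.startswith("invoke-") or op.startswith("move-result"):
                continue  # part of the lookup chain: remove it
            skipping = False  # first unrelated instruction ends the chain
        out.append(line)
        if op.startswith("const-string") and f'"{marker}"' in line:
            skipping = True
    return out


if __name__ == "__main__":
    body = [
        '    const-string v2, "display_name"',
        "    invoke-interface {v1, v2}, Landroid/database/Cursor;->getColumnIndex(Ljava/lang/String;)I",
        "    move-result v2",
        "    invoke-interface {v1, v2}, Landroid/database/Cursor;->getString(I)Ljava/lang/String;",
        "    move-result-object v2",
        "    iput-object v2, v0, Lcom/example/hypothetical/c;->b:Ljava/lang/String;",
    ]
    print("\n".join(keep_constant_instead_of_lookup(body, "display_name")))
```

A filter like this is exactly the kind of find-and-replace the talk calls hacky: it works on the pattern shown, but a robust tool would need to track which registers are live instead of relying on instruction order.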
I would really like it if you guys actually go and try this, because unless you try it, you won't understand how it works or what constraints there are. So go and try this. And if you find any other interesting use cases for an approach like this, do contact me. I'm really interested in all this stuff. That's my talk. Do you guys have any questions?

So just to get context, I've made some similar stuff together. Not together, definitely, but so needing an automated way to find elements for me and to modify them. So I actually worked on this and the guide wrote this morning, he's the hacker who first, he's the first guy to hack and trade in customer ops and then Google actually ended up hiring him two years later. And so I was trying to do the same thing and reached out to the PLC quite a bit to get the power of open source. I asked him questions about his code and he gave me some more. So I've actually open sourced how to do AAMD. And if you search for your GitHub like what it was, there's a small library called computer. So smali means assembler in Icelandic, baksmali means disassembler, and one parameter means transform. There's a library that actually goes through the smali code, finds a particular method, for example searching for an HTTP method, and then modifies it. If you want to take that code, you can use it as an example for your own. Any other questions?

Hi, what measures would you advise app developers to take to protect their code from reverse engineering, apart from ProGuard and making sure there's no logging?

Yeah, one thing you could do is remove all logs before you deploy your APK. But it's a double-edged sword: if you remove all logs and something goes wrong, you won't know. Beyond that, there's probably not much you can do.

I haven't tried it, but it should work, because ultimately things are running on the Dalvik VM.
And I guess that ultimately, even with the NDK, what you will get is a classes.dex file which has everything. But I haven't really tried it, so I can't comment; you should try it. This is a really generic approach. There's nothing wrong with Java specifically that allows me to do this. I can do this with anything, as long as I have an assembly format for it.

Right, that is an important point that I forgot to mention. You can make this APK, but to run it on any phone, you'll have to sign it with some certificate. If you're just testing it on your own phone, use a dev certificate or something. Don't distribute apps with... yeah, that's a good point. Any other questions?

Are there any similar tools to reverse engineer iOS apps?

I have no idea about that, but I'm sure there must be. Like I said, this should work with almost any language. Smali and apktool are tools written for the Android format; there must be similar tools for iOS. There were probably more iOS hackers than Android hackers, maybe not now, but three or four years back, so they must have cracked it by now.

So guys, that's it. I'll be around, so if you want to ask any other questions, you're most welcome. Thanks.