Wednesday, March 04, 2009

Diffing x86 vs ARM code

I posted a while ago about the new DiffDeluxe comparison engine, and that we'd release it in Q1 2009. Well, we're almost there: the engine is now in beta. If you are a BinDiff user and wish to give the new engine a try, send mail to :-)

I mentioned in my last post on the topic that DiffDeluxe was designed to facilitate symbol porting, and to allow comparisons between executables that are "far away" from each other.

In the last post I wrote about Mozilla JS engine vs. Acrobat EScript.dll. Today I am going to try something slightly crazier: In order to evaluate how well these matching algorithms work, we will be diffing an executable that was compiled for ARM against a very similar executable compiled for x86.

My coworker Vincenzo is a big fan of all things OSX, and he brought up the idea of comparing x86 and ARM versions of the OSX dynamic loader -- namely the disassembly of dyld on the iPhone against the disassembly of dyld on OSX.

Now, the first voices are going to yell: "You have names for all functions, BinDiffing is easy then!". Well, true, but we will run DiffDeluxe without taking the names into account, and then use the names only to validate the results.

The two executables have 704 (x86) and 618 (ARM) functions respectively. Without name matching, we match 345 functions. Inspecting the symbols, we see that we have matched 160 of these functions in full accordance with the symbols. Let's have a look at some of the details:

Cute, eh? Let's look at some more...

It is almost surprising how far one can get without actually looking at the instruction semantics.
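
To give a feel for what "not looking at the instruction semantics" can mean, here is a minimal sketch in Python. It is not DiffDeluxe's actual algorithm. It assumes each function is summarized by a purely structural signature (number of basic blocks, number of flow graph edges, number of calls) and pairs up functions whose signature is unique in both executables. The function names and counts are made up for illustration, and the names are used only to check the result afterwards.

```python
from collections import defaultdict


def structural_signature(func):
    """Summarize a function by structure only: no opcodes, no operands."""
    return (func["blocks"], func["edges"], func["calls"])


def match_by_structure(funcs_a, funcs_b):
    """Pair functions whose structural signature is unique on both sides."""
    def index(funcs):
        buckets = defaultdict(list)
        for name, func in funcs.items():
            buckets[structural_signature(func)].append(name)
        return buckets

    index_a, index_b = index(funcs_a), index(funcs_b)
    matches = {}
    for signature, names_a in index_a.items():
        names_b = index_b.get(signature, [])
        # Ambiguous signatures (several candidates on either side) are skipped.
        if len(names_a) == 1 and len(names_b) == 1:
            matches[names_a[0]] = names_b[0]
    return matches


def agreement_with_symbols(matches):
    """Count matches where both sides carry the same symbol name."""
    return sum(1 for a, b in matches.items() if a == b)


# Hypothetical toy data: names and counts are invented for this example.
x86_funcs = {
    "_load":     {"blocks": 12, "edges": 17, "calls": 4},
    "_bind":     {"blocks": 5,  "edges": 6,  "calls": 2},
    "_foo":      {"blocks": 7,  "edges": 9,  "calls": 0},
    "_dup1":     {"blocks": 3,  "edges": 3,  "calls": 1},
    "_dup2":     {"blocks": 3,  "edges": 3,  "calls": 1},
    "_x86_only": {"blocks": 20, "edges": 31, "calls": 6},
}
arm_funcs = {
    "_load": {"blocks": 12, "edges": 17, "calls": 4},
    "_bind": {"blocks": 5,  "edges": 6,  "calls": 2},
    "_bar":  {"blocks": 7,  "edges": 9,  "calls": 0},
    "_dup1": {"blocks": 3,  "edges": 3,  "calls": 1},
    "_dup2": {"blocks": 3,  "edges": 3,  "calls": 1},
}

matches = match_by_structure(x86_funcs, arm_funcs)
confirmed = agreement_with_symbols(matches)
print(f"{len(matches)} matched, {confirmed} confirmed by symbols")
```

In this toy run `_foo` is paired with `_bar` because they happen to share a signature, which is exactly the kind of match that the symbol check flags as unconfirmed. The two `_dup` functions stay unmatched because their signature is ambiguous.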

If we take the names into account, matching functions becomes easy, but matching basic blocks properly ends up being the difficulty. With name matching enabled, DiffDeluxe matches 3809 basic blocks, out of 7904 and 5196 respectively.
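
To put the counts quoted in this post in proportion, the small Python snippet below turns them into percentages. The numbers are the ones given in the text; the helper name `coverage` is just something chosen for this example.

```python
def coverage(matched, total):
    """Return matched/total as a percentage, rounded to one decimal place."""
    return round(100.0 * matched / total, 1)


# Function matching without names: 345 matches out of 704 (x86) and 618 (ARM).
functions_x86 = coverage(345, 704)
functions_arm = coverage(345, 618)

# Share of those 345 matches that agree with the symbols.
confirmed_by_symbols = coverage(160, 345)

# Basic block matching with names enabled: 3809 out of 7904 and 5196.
blocks_first = coverage(3809, 7904)
blocks_second = coverage(3809, 5196)

print(functions_x86, functions_arm, confirmed_by_symbols)
print(blocks_first, blocks_second)
```

So roughly half of the functions get matched without names, a bit under half of those matches are confirmed by the symbols, and with names enabled the matched basic blocks cover about 48% of the larger executable and about 73% of the smaller one.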

So to summarize: The structural comparison is sufficiently strong to yield some useful results even across two different CPUs. While there is still a good amount of room for improvement, I am quite happy with these results so far :-)

So, if you want to try the beta, and you already use BinDiff, drop us a line!


Nico Waisman said...

Quite impressive!
Nice work.

halvar.flake said...

Explanation of what I meant with symbol porting:

You have an executable without symbols. You know it contains statically linked code that is also present in another executable for which you do have symbols. It would be great if you could "port" this information into the executable for which you don't have symbols.

Example use case: You take apart a piece of software and you see that it uses OpenSSL's crypto functions. You compile OpenSSL with symbols, diff it against the executable that you have, and "pull" the symbols from OpenSSL into your current disassembly.
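
A rough sketch of the "pull" step in Python, to make the idea concrete. This is not how BinDiff does it internally. It assumes you already have a function matching (reference address to target address) from a diff, and that unnamed functions in the target carry a placeholder name starting with `sub_`. All addresses here are invented for the example.

```python
def port_symbols(matches, reference_symbols, target_names, placeholder="sub_"):
    """Copy symbol names from a reference binary onto matched target functions.

    matches:           reference address -> target address (output of a diff)
    reference_symbols: reference address -> symbol name
    target_names:      target address -> current name in the disassembly
    Only placeholder names are overwritten; names a human already assigned
    are left alone. Returns a new dict and does not modify target_names.
    """
    ported = dict(target_names)
    for ref_addr, target_addr in matches.items():
        symbol = reference_symbols.get(ref_addr)
        if symbol is None:
            continue
        if ported.get(target_addr, placeholder).startswith(placeholder):
            ported[target_addr] = symbol
    return ported


# Hypothetical example: OpenSSL built with symbols vs. a stripped executable.
openssl_symbols = {0x1000: "SHA1_Update", 0x1200: "AES_encrypt"}
stripped_names = {
    0x401000: "sub_401000",
    0x401800: "my_decrypt_wrapper",  # already named by hand
    0x402000: "sub_402000",
}
diff_matches = {0x1000: 0x401000, 0x1200: 0x401800}

renamed = port_symbols(diff_matches, openssl_symbols, stripped_names)
for addr, name in sorted(renamed.items()):
    print(hex(addr), name)
```

Here `sub_401000` becomes `SHA1_Update`, the hand-named function is kept as it was, and the unmatched `sub_402000` stays untouched.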

Nate McFeters said...


This is unbelievably cool. Kudos on the work, and now I'm saving up to buy BinDiff.


RPW said...

I have to attest that it works extremely well! I just BinDiffed iPhone binaries against OSX 10.5.6 binaries with extremely pleasant results ;)