I mentioned in my last post on the topic that DiffDeluxe was designed to facilitate symbol porting, and to allow comparisons between executables that are "far away" from each other.
In that post I compared the Mozilla JS engine against Acrobat's EScript.dll. Today I am going to try something slightly crazier: to evaluate how well these matching algorithms work, we will diff an executable compiled for ARM against a very similar executable compiled for x86.
My coworker Vincenzo is a big fan of all things OSX, and he suggested comparing the x86 and ARM versions of the OSX dynamic loader: namely, the disassembly of dyld on the iPhone against the disassembly of dyld on OSX.
Now, some people will immediately yell: "You have names for all functions, BinDiffing is easy then!" True, but we will run DiffDeluxe without taking the names into account, and then use the names only to validate the results.
The two executables have 704 (x86) and 618 (ARM) functions respectively. Without name matching, we match 345 functions. Checking against the symbols, 160 of these matches agree fully with the symbol names. Let's have a look at some of the details:
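The validation step described above (match without names, then check the matches against the names) can be sketched as follows. This is a minimal illustration, not DiffDeluxe's actual code; the function name `validate_matches` and the address-to-name dictionaries are hypothetical, assuming the matcher emits plain (address_a, address_b) pairs.

```python
def validate_matches(matches, names_a, names_b):
    """Check name-blind function matches against symbol names.

    matches: list of (addr_a, addr_b) pairs from a matcher that ignored names.
    names_a, names_b: hypothetical dicts mapping function address -> symbol name.
    Returns (number of agreeing pairs, fraction of all pairs that agree).
    """
    # A pair "agrees" when side A has a name and side B carries the same name.
    agree = sum(
        1 for a, b in matches
        if names_a.get(a) is not None and names_a.get(a) == names_b.get(b)
    )
    ratio = agree / len(matches) if matches else 0.0
    return agree, ratio


# Toy data (made up for illustration): the second pair is a mismatch.
matches = [(0x1000, 0x2000), (0x1100, 0x2100), (0x1200, 0x2200)]
names_a = {0x1000: "func_a", 0x1100: "func_b", 0x1200: "func_c"}
names_b = {0x2000: "func_a", 0x2100: "func_c", 0x2200: "func_c"}
print(validate_matches(matches, names_a, names_b))
```

Plugging in the numbers from this post, 160 agreeing pairs out of 345 name-blind matches give a ratio of 160 / 345 ≈ 0.464, i.e. a bit under half of the matches are confirmed by the symbols.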
Cute, eh? Let's look at some more... It is almost surprising how far one can get without actually looking at the instruction semantics.
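To illustrate why matching can work without instruction semantics, here is a minimal sketch of purely structural function matching: each function is reduced to a few graph counts, and functions whose signature is unique in both binaries are paired. This is an assumed, simplified technique for illustration only, not DiffDeluxe's algorithm; the keys "blocks", "edges" and "calls" are hypothetical, and in practice exact block counts often differ between x86 and ARM builds, so real tools need more tolerant properties.

```python
from collections import Counter


def signature(func):
    # Hypothetical per-function summary: CFG block count, CFG edge count,
    # and number of outgoing calls. None of these depend on the instruction set.
    return (func["blocks"], func["edges"], func["calls"])


def match_unique_signatures(funcs_a, funcs_b):
    """Pair functions whose structural signature occurs exactly once on each side.

    funcs_a, funcs_b: dicts mapping function address -> summary dict.
    Returns a sorted list of (addr_a, addr_b) pairs.
    """
    sig_a = {addr: signature(f) for addr, f in funcs_a.items()}
    sig_b = {addr: signature(f) for addr, f in funcs_b.items()}
    count_a = Counter(sig_a.values())
    count_b = Counter(sig_b.values())
    # Index side B by signature, keeping only unambiguous signatures.
    unique_b = {s: addr for addr, s in sig_b.items() if count_b[s] == 1}
    return sorted(
        (addr, unique_b[s])
        for addr, s in sig_a.items()
        if count_a[s] == 1 and s in unique_b
    )
```

Ambiguous signatures (for example, many tiny single-block functions) are deliberately left unmatched; a fuller matcher would revisit them later using already-matched neighbours in the call graph.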
If we take the names into account, matching functions becomes easy, but matching basic blocks properly becomes the hard part. With name matching enabled, DiffDeluxe matches 3809 basic blocks, out of 7904 and 5196 basic blocks respectively.
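The 3809 matched blocks correspond to roughly 48% (3809 / 7904) and 73% (3809 / 5196) of the blocks on each side. As a rough illustration of why block matching is harder, here is a sketch of matching basic blocks inside an already-matched function pair using only CFG shape: each block gets an (in-degree, out-degree, BFS depth from entry) signature, and only signatures unique on both sides are paired. This is an assumed, simplified approach, not how DiffDeluxe does it; the CFG representation is hypothetical.

```python
from collections import Counter, deque


def block_signatures(cfg, entry):
    """cfg: dict block_id -> list of successor ids (every block must be a key,
    exit blocks map to []). Returns block_id -> (in_degree, out_degree, depth)."""
    indeg = Counter()
    for succs in cfg.values():
        for s in succs:
            indeg[s] += 1
    # Breadth-first search from the entry block gives each block a depth.
    depth = {entry: 0}
    queue = deque([entry])
    while queue:
        b = queue.popleft()
        for s in cfg.get(b, []):
            if s not in depth:
                depth[s] = depth[b] + 1
                queue.append(s)
    # Unreachable blocks get depth -1.
    return {b: (indeg[b], len(cfg[b]), depth.get(b, -1)) for b in cfg}


def match_blocks(cfg_a, entry_a, cfg_b, entry_b):
    """Pair blocks whose structural signature is unique in both functions."""
    sig_a = block_signatures(cfg_a, entry_a)
    sig_b = block_signatures(cfg_b, entry_b)
    count_a, count_b = Counter(sig_a.values()), Counter(sig_b.values())
    unique_b = {s: b for b, s in sig_b.items() if count_b[s] == 1}
    return sorted(
        (a, unique_b[s]) for a, s in sig_a.items()
        if count_a[s] == 1 and s in unique_b
    )
```

On a simple if/else "diamond", only the entry and exit blocks are matched; the two branch blocks look identical structurally, which hints at why a purely structural approach leaves many blocks unmatched.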
So to summarize: the structural comparison is strong enough to yield useful results even across two different CPUs. While there is still a good amount of room for improvement, I am quite happy with these results so far :-)
So, if you want to beta test, and you already use BinDiff, drop us a line!