Thursday, January 18, 2007

One of the most amusing new features of BinNavi in the v1.2 release is the GDB agent. FX (of SABRE Labs fame) worked hard to create a proxy that sits between the BinNavi GUI and anything speaking the GDB serial protocol, either over a serial line or over TCP.

Now, what is this good for?

First of all, it allows one to use BinNavi's debugging capabilities on platforms that we do not explicitly support (if a recent GDB version works on it). This means most *NIX variants. Let's say, for some reason, you have a FreeBSD system on which you'd like to debug some piece of software, and BinNavi does not come with a FreeBSD debugger. But GDB runs on FreeBSD - so you just run your target under gdbserver and use the BinNavi GDB agent via TCP to transparently debug the target.
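For readers who have never looked at what travels over that TCP connection or serial line, here is a minimal sketch of the packet framing used by the standard GDB remote serial protocol: a payload is wrapped as `$payload#cs`, where `cs` is the sum of the payload bytes modulo 256, written as two hex digits. The function names are made up for illustration and are not taken from the BinNavi agent, and the sketch ignores escaping of special characters as well as the `+`/`-` acknowledgements.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: str) -> bytes:
    """Wrap a payload as $payload#cs (no escaping; illustration only)."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{rsp_checksum(data):02x}".encode("ascii")


def rsp_unframe(packet: bytes) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    data = packet[1:-3]
    if rsp_checksum(data) != int(packet[-2:], 16):
        raise ValueError("checksum mismatch")
    return data.decode("ascii")


if __name__ == "__main__":
    # "g" is the standard request to read all general registers.
    print(rsp_frame("g"))
```

A proxy like the one described here has to produce and parse packets of this shape on the target-facing side; the vendor variants mentioned in this post may differ in details this sketch does not cover.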

Now, using BinNavi on more-or-less arbitrary *NIX systems is nice, but the real joy lies elsewhere: FX made sure that the debugging proxy does not only speak the GDB protocol as spoken by GDB itself, but also the variants spoken by Cisco IOS and ScreenOS.

This makes reverse engineering embedded systems that speak either regular GDB protocol or one of the supported variants a blast: In the past, we had to proceed as follows:
  1. Get a ROM image from somewhere
  2. Stare at the image to figure out methods to decompress it properly
  3. Once this was achieved, load the image into IDA and use switch()-constructs to determine the proper loading address of the image
  4. Load the image into IDA again, this time at the correct address
Of course, live-debugging was usually out of the question.
With the BinNavi GDB Agent, we can now do the following:
  1. Attach the device to a serial port and set it into GDB mode
  2. Read & dump the memory from the current instruction pointer backwards until the device freezes
  3. Read & dump the memory forwards from the current instruction pointer until the device freezes
  4. Load the result into IDA and export the disassembly into BinNavi
  5. Do live-debugging on the device in question :-)
So, as an exercise, we took a Netscreen-VPN5 we had acquired via eBay. Unfortunately, it did not come with a support contract, so we could not get software images to disassemble. So we set the device into GDB mode by typing "set gdb enable" in the console, and connected:

C:\BinNavi.v1.2\gdbagent>gdbcmd COM1,9600 NS5XT
Connected via \\.\COM1 (baud=9600 parity=N data=8 stop=1) to Netscreen 5XT Agent
/ PowerPC

[q] quit | [r] Registers | [c] Continue | [R] Reset | [b] Breakpoint
[s] step | [m] Read Memory | [D] Detach | [d] Dump Memory Range

Reading Registers ... done

GPR0 = 1
GPR1 = 350f958
GPR2 = aecce8
GPR3 = ffffffffffffffff
GPR4 = 2e
GPR5 = 0
GPR6 = 0
GPR7 = 0
GPR8 = d55e70
GPR9 = ae0000
GPR10 = d50000
GPR11 = d50000
GPR12 = 40000024
GPR13 = 0
GPR14 = 0
GPR15 = 0
GPR16 = 0
GPR17 = 40140130
GPR18 = 0
GPR19 = 186ac40
GPR20 = 0
GPR21 = 350ff78
GPR22 = 186ac4e
GPR23 = ffffffffffffffff
GPR24 = 0
GPR25 = 0
GPR26 = 0
GPR27 = 0
GPR28 = 186ac40
GPR29 = 0
GPR30 = 186a910
GPR31 = ae5684
PC = 6826c
MSR = 29230
CR = 40000028
LR = 67c10
CTR = 249b30
XER = 20000002

The program counter is set to 0x6826c, and thus we know: some code is mapped at 0x6826c. It is a pretty safe bet that all code will be consecutive in memory, so we will now dump the memory forwards and backwards from this address. We type "d" in the command line and enter the base address and the number of bytes (in hex) we want to dump:

Memory at: 68000
Size: 400000
Filename: 0x68000.0x400000.dmp

The agent now begins to read the memory off the device in chunks of 1024 bytes over a 9600 baud serial port - so it is a good idea to go to lunch in the meantime. Once we're back from lunch, we reboot the NS5XT - it will have hung when it ran out of memory to dump. We set it back into debugging mode and dump the memory before offset 0x68000:
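To put a number on "go to lunch", here is a rough back-of-the-envelope sketch. It assumes the standard GDB memory read request `m addr,length` (both values in hex), whose reply carries two hex characters per byte of memory, and a serial line at 8N1, which costs 10 bits on the wire per character. The helper names are hypothetical, and the estimate ignores packet framing, acknowledgements and the requests themselves, so it is a lower bound; whether the agent issues exactly these requests to a ScreenOS device is an assumption.

```python
def read_requests(base: int, size: int, chunk: int = 1024):
    """Yield (address, length) pairs that cover [base, base + size)."""
    offset = 0
    while offset < size:
        length = min(chunk, size - offset)
        yield base + offset, length
        offset += length


def read_payload(addr: int, length: int) -> str:
    """Payload of a standard GDB 'read memory' request (hex, no 0x prefix)."""
    return f"m{addr:x},{length:x}"


def estimate_seconds(size: int, baud: int = 9600,
                     bits_per_char: int = 10, chars_per_byte: int = 2) -> float:
    """Lower bound on transfer time for hex-encoded memory over a serial line."""
    chars_per_second = baud / bits_per_char
    return size * chars_per_byte / chars_per_second


if __name__ == "__main__":
    requests = list(read_requests(0x68000, 0x400000))
    print(len(requests), "requests, first:", read_payload(*requests[0]))
    print(f"at least {estimate_seconds(0x400000) / 3600:.1f} hours")
```

Under these assumptions the 0x400000-byte dump needs 4096 requests and well over two hours of pure transfer time, so make it a long lunch.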

Memory at: 40000
Size: 28000
Filename: 0x40000.0x28000.dmp

We stitch the two files together end-to-end, load them into IDA and run a few small scripts to identify function entry points and do some minor fixing of the disassembly (principally switch statements, and some function naming), and export everything into the BinNavi database. We then open it as usual in BinNavi, open the callgraph and start browsing around.
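The stitching step is trivial, but it is worth checking that the pieces really are adjacent (0x40000 + 0x28000 = 0x68000 here) before trusting the addresses in the disassembly. A minimal sketch, assuming each dump file holds raw bytes starting at the base address encoded in its name; the `stitch` helper and the output file name are made up for illustration:

```python
from pathlib import Path


def stitch(dumps, out_path):
    """Concatenate (base_address, path) dumps into one flat image.

    Returns the address at which the combined image should be loaded.
    Raises ValueError if the dumps are not contiguous in memory.
    """
    ordered = sorted(dumps, key=lambda d: d[0])
    load_address = ordered[0][0]
    next_address = load_address
    image = bytearray()
    for base, path in ordered:
        if base != next_address:
            raise ValueError(
                f"gap or overlap: expected 0x{next_address:x}, got 0x{base:x}")
        data = Path(path).read_bytes()
        image += data
        # Use the real file size: a dump may be shorter than requested
        # if the device froze before the end of the range.
        next_address = base + len(data)
    Path(out_path).write_bytes(bytes(image))
    return load_address


if __name__ == "__main__":
    base = stitch([(0x40000, "0x40000.0x28000.dmp"),
                   (0x68000, "0x68000.0x400000.dmp")], "ns5xt.bin")
    print(f"load the flat binary at 0x{base:x}")
```

The returned address is what you give the disassembler as the loading address of the flat binary.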

On the left, we see a callgraph view of the device's IKE packet handlers (which we inferred from string references in the disassembly), plus the functions that are directly called by them.

Now, which of these functions would be executed when we run a round of ike-scan against the device?

Clicking on the red button makes BinNavi talk to the BinNavi GDB agent to set one-time breakpoints on all functions in the graph on the left - due to the serial link, this is not blazingly fast, but after seconds, not minutes, we have breakpoints on all these functions. We then run ike-scan against the device, and click on "stop recording" again. The result is the list of functions from our graph that were executed - highlighted in the following pictures:

Clearly we can do the same on the function flowgraph level in, for example, the function labeled IKE_SA_Handler above. Generally, everything you can do with BinNavi on Win32 executables you can now also do with BinNavi on the embedded device: record traces, set breakpoints, set Python callbacks on breakpoints, read memory, read registers and so on.

The following three screenshots show the function in question being debugged. The first screen shows the path that is executed on running an ike-scan against the device highlighted in red. The second screen shows BinNavi having suspended the execution on the basic block with the red/blue border (the blue border indicates a persistent breakpoint on the basic block, the red border indicates that execution is currently suspended on that block). The third screen just shows the registers and some memory of the device at this point in time.

So to sum things up: With the BinNavi GDB Agent, you can debug anything that speaks the GDB protocol more or less just as if it were a regular Windows app (small caveat: you are speaking with most embedded devices via a serial port, oftentimes at 9600 baud. You probably do not want to set 60,000 breakpoints at once - aside from the bandwidth consumption, it is common for the GDB server to handle only a limited number of breakpoints. In our tests, setting several hundred was no problem). Extracting ROM images in a format that is easily disassembled is easy, and full on-device debugging helps a lot with all our favourite tasks:
  • understanding the code at hand
  • identifying which functions are responsible for which features
  • hunting for security vulnerabilities
  • constructing input to reach vulnerable locations
Have a good week, I have some more reversing to do :)

Oh, and be sure to check out Ero Carrera's Blog - he will post about the SQL database format used by BinNavi at the end of next week, and show why it's useful and flexible.


dre said...

We stitch the two files together end-to-end, load them into IDA and run a few small scripts to identify function entry points and do some minor fixing of the disassembly (principally switch statements, and some function naming), and export everything into the BinNavi database

That's not very much function naming... although you do have a few in that first jpeg. I don't know anything about ScreenOS images, and my interest is mostly in IOS.

How does one get function names for IOS when the images never contain symbol tables (and unlike Microsoft, Cisco/Juniper don't make these available)? Michael Lynn had many names in his pulled presentation. Is this just something that FX and others leak occasionally and most reverse engineers just guess the rest? Is there a loader module for PowerPC or MIPS IOS images that I'm just not aware of?

halvar.flake said...

Many names are trivially retrieved from debug messages:

"process_sa_payload(): Entering" does give you a direct function name.

Also helpful are tables of char *, func * - a lot more common than one thinks.

The functions in the screens come from a 10-minute-round of identifying debug printing routines and a char*/func* table, and running a quick script to use those names.
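A rough sketch of the first half of that idea, pulling candidate names out of debug strings of the form "name(): ...". This is not the script we used; the regular expressions, the minimum lengths and the `candidate_names` helper are assumptions made for illustration. It only finds the strings: mapping each one to the function that references it still has to be done in the disassembler.

```python
import re

# A printable ASCII run of at least 6 characters counts as a string.
STRING_RE = re.compile(rb"[\x20-\x7e]{6,}")
# A C-like identifier followed by "()" at the very start of the string.
NAME_RE = re.compile(rb"([A-Za-z_][A-Za-z0-9_]{2,})\(\)")


def candidate_names(image: bytes, base: int) -> dict:
    """Map the address of each debug string to the name it starts with."""
    found = {}
    for string in STRING_RE.finditer(image):
        hit = NAME_RE.match(string.group())
        if hit:
            found[base + string.start()] = hit.group(1).decode("ascii")
    return found


if __name__ == "__main__":
    sample = b"\x00\x00process_sa_payload(): Entering\x00junk\x00"
    for address, name in candidate_names(sample, 0x40000).items():
        print(f"0x{address:x}: {name}")
```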

As a loader, we used flat binary - we had a memory dump, not a real image.

red said...

Thank you very much for this information.