
Just a quick note about the disassembly challenge he faced (indirect references), having gone through this before: you can get amazingly good results by cheating a bit. That is to say, rather than assuming you actually have to properly execute through the code path, you can get very close by roughly tracking register assignments when making your initial pass through a block of code. (Even better if you can track potential ranges of values across later calls into a given block. Some of this depends on how you've implemented your disassembler, though.)
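A minimal sketch of that register-tracking idea, assuming a made-up toy instruction set rather than SH2 or any real ISA. The tuple format, the mnemonics, and the name `resolve_indirect_jumps` are all hypothetical; the point is only that one linear pass over a block, remembering which registers hold known constants, is often enough to resolve an indirect jump.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy instructions (not a real ISA):
#   ("mov_imm", reg, value)    reg <- constant
#   ("add_imm", reg, value)    reg <- reg + constant
#   ("load",    reg, addr_reg) reg <- memory[addr_reg]; result treated as unknown
#   ("jmp_reg", reg)           indirect jump through reg
Instr = Tuple


def resolve_indirect_jumps(block: List[Instr]) -> List[Optional[int]]:
    """Return one entry per jmp_reg: the target if known, else None."""
    known: Dict[str, int] = {}  # registers whose value we are tracking
    targets: List[Optional[int]] = []
    for ins in block:
        op = ins[0]
        if op == "mov_imm":
            known[ins[1]] = ins[2]
        elif op == "add_imm":
            # Only fold the add if we already know the register's value.
            if ins[1] in known:
                known[ins[1]] = (known[ins[1]] + ins[2]) & 0xFFFFFFFF
        elif op == "load":
            # We do not model memory, so the destination becomes unknown.
            known.pop(ins[1], None)
        elif op == "jmp_reg":
            targets.append(known.get(ins[1]))
    return targets


block = [
    ("mov_imm", "r0", 0x1000),
    ("add_imm", "r0", 0x20),
    ("jmp_reg", "r0"),       # resolvable: r0 is known here
    ("load", "r0", "r1"),
    ("jmp_reg", "r0"),       # not resolvable: r0 came from memory
]
targets = resolve_indirect_jumps(block)
```

Here the first jump resolves to 0x1020 and the second stays unresolved, which is where heuristics (or value-range tracking) would have to take over.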

I ended up doing this with a SuperH disassembler (with SH2, due to its two-byte opcode layout, indirect addressing is the order of the day), and by doing basic register assignment tracking and adding a few crude heuristics, I was able to get very usable results. No, the end result won't be "pretty"; you'll be moderately embarrassed to show it off, but it will work. :)

(Heuristics: one structure that I had to handle manually was compiler-generated jump tables; thankfully, for my project, I'd had a bit of help from the compiler that was used, and there were distinct signatures I could key off of.)

If you're even remotely interested in the disassembly aspect of this, I'd recommend learning a bit about a piece of software called IDA Pro: https://www.hex-rays.com/products/ida/ As horrible as the UI of it is, there is simply nothing better on the market for reverse engineering analysis.



Second this. There are a lot of "signatures" in most asm. Programmers for 6502 and derivatives might be a nasty bunch of sadists that love to do weird stuff to save cycles, but even there there are lots and lots of common patterns that often "happened" just because people learned from the same sources, or because it made sense, or because conventions appeared.

I never had a NES, but I had a C64, and the 6502 code written there seemed nasty to translate on the surface, with lots and lots of self-modification, for example. But in the end most of the self-modification was specific looping patterns: because the 6502's index registers can only index 256 values, many loops wrote addresses into the looping code, iterated 256 times, incremented the most significant byte directly in the code, checked whether they'd reached the end, and jumped back to iterate another 256 times.
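A crude sketch of how that particular pattern could be detected, assuming you have the raw code bytes and their load address. The opcode values are the standard 6502 encodings (`INC abs` is `$EE`, `LDA abs,X` is `$BD`, and so on); the function name and the sample routine are made up for illustration. It scans every byte offset rather than decoding instructions properly, so a real tool would have to filter out false positives.

```python
from typing import List, Tuple

INC_ABS = 0xEE  # INC absolute: opcode, address low byte, address high byte
INDEXED_ABS = {
    0xBD: "LDA abs,X",
    0xB9: "LDA abs,Y",
    0x9D: "STA abs,X",
    0x99: "STA abs,Y",
}


def find_patched_high_bytes(code: bytes, base: int) -> List[Tuple[int, int, str]]:
    """Find INC abs instructions that bump the operand high byte of an
    indexed load/store inside the same code region.

    Returns (address of INC, address of patched instruction, its mnemonic).
    """
    hits = []
    for i in range(len(code) - 2):
        if code[i] != INC_ABS:
            continue
        target = code[i + 1] | (code[i + 2] << 8)  # little-endian operand
        # The high byte of a 3-byte instruction sits 2 bytes after its opcode.
        opcode_off = target - 2 - base
        if 0 <= opcode_off < len(code) and code[opcode_off] in INDEXED_ABS:
            hits.append((base + i, base + opcode_off, INDEXED_ABS[code[opcode_off]]))
    return hits


# Hypothetical page-copy loop assembled at $C000:
#   C000 LDX #$00
#   C002 LDA $2000,X
#   C005 STA $0400,X
#   C008 INX
#   C009 BNE $C002
#   C00B INC $C004   ; high byte of the LDA operand
#   C00E INC $C007   ; high byte of the STA operand
#   C011 RTS
sample = bytes([
    0xA2, 0x00,
    0xBD, 0x00, 0x20,
    0x9D, 0x00, 0x04,
    0xE8,
    0xD0, 0xF7,
    0xEE, 0x04, 0xC0,
    0xEE, 0x07, 0xC0,
    0x60,
])
hits = find_patched_high_bytes(sample, 0xC000)
```

On this sample both `INC` instructions are flagged as patching the copy loop's load and store, which is the kind of signature a translator could key off of instead of treating the code as arbitrarily self-modifying.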

Most of this "nasty" stuff is relatively well known by now and much of it is relatively regular and easy to detect.


Constant propagation is not really cheating though: it's a completely safe and accurate optimization. We use that in the PPC->X86 Jit of the Dolphin Emulator to reduce register pressure and use the fact that X86 instructions can have 32 bit constants (while PPC is usually limited to 16 bit consts, and 32 bit values are loaded with 2 instructions: lis/ori). If you implement it properly, you can actually brag about it :) (we have an abstract object that can be either an X86 register or a constant value, and instruction handlers handle these two cases differently - when they can't, the constant is loaded to a register).
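A rough sketch of that "register or constant" operand idea, in Python for brevity. This is not Dolphin's code: the class names, the handler set, the tiny register allocator, and the emitted text are all hypothetical. It only shows how a `lis`/`ori` pair can fold into a single 32-bit constant without emitting any host instructions until a value is actually needed in a register.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class HostReg:
    name: str


Operand = Union[Const, HostReg]


class Translator:
    """Toy guest->host translator; no spilling or register reuse."""

    def __init__(self) -> None:
        self.guest: Dict[int, Operand] = {}  # guest register -> operand
        self.out: List[str] = []             # emitted host instructions
        self._free = ["eax", "ecx", "edx", "ebx"]

    def _alloc(self) -> HostReg:
        return HostReg(self._free.pop(0))

    def unknown(self, rd: int) -> None:
        """Bind rd to a host register holding a value we cannot know."""
        reg = self._alloc()
        self.out.append(f"mov {reg.name}, [guest_r{rd}]")
        self.guest[rd] = reg

    def lis(self, rd: int, imm: int) -> None:
        # lis always produces a constant: imm shifted into the high half.
        self.guest[rd] = Const((imm << 16) & 0xFFFFFFFF)

    def ori(self, ra: int, rs: int, imm: int) -> None:
        src = self.guest[rs]
        if isinstance(src, Const):
            self.guest[ra] = Const(src.value | imm)  # fold, emit nothing
            return
        dst = src if ra == rs else self._alloc()
        if dst != src:
            self.out.append(f"mov {dst.name}, {src.name}")
        self.out.append(f"or {dst.name}, {imm:#x}")
        self.guest[ra] = dst

    def materialize(self, r: int) -> HostReg:
        """Force r into a host register, for handlers that need one."""
        op = self.guest[r]
        if isinstance(op, Const):
            reg = self._alloc()
            self.out.append(f"mov {reg.name}, {op.value:#x}")
            self.guest[r] = reg
            return reg
        return op


t = Translator()
t.lis(3, 0x8000)
t.ori(3, 3, 0x1234)      # still nothing emitted; r3 is Const(0x80001234)
folded = t.guest[3]
reg = t.materialize(3)   # one 32-bit immediate move on the host side
```

The two guest instructions collapse into one host `mov` with a 32-bit immediate, and only because something asked for the value in a register.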

+1 for IDA Pro. It's a shame this software is so expensive. The UI is actually pretty decent when you get used to it, and there are a ton of good plugins.



