Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Sure, but running file-by-file is how bsdiff was designed to operate. (I wrote it for FreeBSD Update, which operates on a file-by-file basis, because I was paranoid about the problem I'd seen on Windows systems where an update was registered as "installed" but the system had not in fact been properly updated.)


Ah, that's where it got its name :)

Really awesome utility.


I believe it stands for Byte Subtraction Diff, or something like that. As I remember, something like Bentley McIlroy diff is used to identify identical runs, and where there are short runs of bytes separating two runs that are identical between the two files, the bytewise subtraction of the short run is stored.

The key observation is that when performing updates, large runs of machine code will be identical, with only relative offsets and absolute addresses changing. Code changes tend to shift huge pieces of binary code in parallel (by the same offset), so the bytewise subtraction expresses these shifts as many repeats of the same bytewise difference, which is then more easily compressed by zlib DEFLATE (or maybe another compressor... I forget the details).


These programs were originally named bdiff and bpatch, but the large number of other programs using those names lead to confusion; I'm not sure if the "bs" in refers to "binary software" (because bsdiff produces exceptionally small patches for executable files) or "bytewise subtraction" (which is the key to how well it performs). Feel free to offer other suggestions.


Ah, thanks for the information. I must admit I have never used it and never looked at what makes it different from other diff algorithms. Will definitely have a look at it !




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: