
I agree completely. In fact, it seems odd that the article complains about the format being more complicated than bzip2 or gzip when the comparison is not really apples-to-apples. Comparing it with zip would seem more sensible to me.

A compressed container format that allows quick access to individual files is very useful and actually improves data security - corruption in one part of the data will be far less likely to ruin the entire collection of files, whereas corruption in a compressed tar file may lead to the loss of everything (I know recovery tools exist, but they cannot prevent one file in the container being dependent upon data from another file.)



You are confused. xz does not provide the ability to find individual files inside of the format: it is not at all comparable to zip files, and like the other formats mentioned it compresses one file. The "seekable" property is that it lets you somewhat efficiently decompress arbitrary byte ranges inside of the file, which is why a compressed disk image (as used by your parent commenter) benefits from this property but the industry-standard usage of compressing a tar file (which is a file format which inherently makes it impossible to find individual files without reading the whole thing) totally throws away this benefit.


A tar file doesn't have a central directory like a zip, so you do have to search all the file headers. However, each file header contains the length of the file it describes, which lets you seek past the content of any files you don't care about if the tar isn't stored in a way that would prevent seeking.
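To make the header-walking idea concrete, here is a minimal sketch in Python, assuming a plain uncompressed ustar archive on a seekable file object. The helper names (list_members, build_demo_tar) are made up for illustration, and it ignores GNU/pax extensions such as long names and base-256 sizes.

```python
import io
import tarfile

BLOCK = 512  # tar headers and content padding are both 512-byte aligned


def list_members(f):
    """Yield (name, size, data_offset), reading only the 512-byte headers."""
    while True:
        header = f.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return  # all-zero block marks the end of the archive
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        # The size field is an octal string at bytes 124..135 of the header.
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        data_offset = f.tell()
        yield name, size, data_offset
        # Seek past the content (padded up to a multiple of 512 bytes).
        f.seek(data_offset + (size + BLOCK - 1) // BLOCK * BLOCK)


def build_demo_tar():
    """Build a small in-memory archive so the sketch is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in [("a.txt", b"hello"), ("b.bin", b"x" * 1000)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, size, offset in list_members(build_demo_tar()):
        print(name, size, offset)
```

Every header still has to be visited, but the content of b.bin is never read. That only works because seek() is cheap on the underlying file.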


That would work if your tar was uncompressed. The lengths in the tar are meaningless in a compressed file.


That's not true: xz allows efficient decompression of arbitrary byte ranges inside the file.


Does it let you seek by a decompressed byte count, or does it let you decode arbitrary compressed byte ranges?
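For what seeking by decompressed offset can look like in general, here is a rough sketch: compress fixed-size chunks independently and keep a table mapping uncompressed offsets to compressed positions. This is an illustration of the idea only, not xz's actual on-disk index. The CHUNK size, the in-memory index tuple layout, and the function names are all assumptions, and each chunk is written as a separate xz stream purely to keep the code short.

```python
import lzma

CHUNK = 4096  # hypothetical uncompressed block size


def compress_blocks(data, chunk=CHUNK):
    """Compress each chunk on its own; return (blob, index).

    index entries are (uncompressed_start, compressed_offset, compressed_len).
    """
    parts, index, comp_off = [], [], 0
    for start in range(0, len(data), chunk):
        comp = lzma.compress(data[start:start + chunk])
        index.append((start, comp_off, len(comp)))
        parts.append(comp)
        comp_off += len(comp)
    return b"".join(parts), index


def read_range(blob, index, offset, length, chunk=CHUNK):
    """Return data[offset:offset+length], decompressing only the chunks it touches."""
    out = bytearray()
    end = offset + length
    for start, comp_off, comp_len in index:
        if start + chunk <= offset or start >= end:
            continue  # this chunk does not overlap the requested range
        plain = lzma.decompress(blob[comp_off:comp_off + comp_len])
        out += plain[max(offset - start, 0):end - start]
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 100
    blob, index = compress_blocks(data)
    print(read_range(blob, index, 5000, 16) == data[5000:5016])
```

The tradeoff is the usual one: smaller chunks mean cheaper random reads but a worse compression ratio, since each chunk starts with an empty dictionary.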


Really? I thought that's what the page was telling me when it said .xz was a container format. In that case, its file format is extra odd!


Yeah: what they mean by "container format" here is similar to the usage in video compression file formats, where ".avi" doesn't imply any particular compression algorithm. An xz file is a container format which is designed to store another container format.


The article's point is, in part, that xz is more like zip or rar than gzip or bzip2, and use of it like those is incongruous (e.g. .tar.xz files). (Its other points are that xz has a number of systematic flaws, in its error detection and some "misfeatures".)


Ah - I presumed that the debian xz-based packages were using it as an archive, instead of the old 'ar'. But I just looked and you are right, the xz support is for the data.tar.xz that itself is still inside the 'ar' archive format. That is odd. I guess they chose this way for simplicity - far less code needs changing in all the .deb file processing utilities.
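For anyone who wants to check the layering themselves, here is a small sketch that lists the members of an 'ar' archive by reading its fixed 60-byte headers. It assumes the common ar layout (8-byte magic, then a 60-byte header per member, content padded to an even length) and short member names. The demo archive is built in memory with placeholder payloads rather than real tarballs, and the helper names are invented for this example.

```python
import io

AR_MAGIC = b"!<arch>\n"


def ar_member(name, payload):
    """Build one ar member: 60-byte header, content, pad byte if the size is odd."""
    header = b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (
        name, 0, 0, 0, b"100644", len(payload))
    return header + payload + (b"\n" if len(payload) % 2 else b"")


def list_ar(f):
    """Return [(name, size)] for each member, seeking past the content."""
    if f.read(8) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = f.read(60)
        if len(header) < 60:
            return members
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        f.seek(size + size % 2, io.SEEK_CUR)  # content is 2-byte aligned


if __name__ == "__main__":
    # Placeholder payloads standing in for the real members of a .deb.
    demo = io.BytesIO(
        AR_MAGIC
        + ar_member(b"debian-binary", b"2.0\n")
        + ar_member(b"control.tar.xz", b"c" * 5)
        + ar_member(b"data.tar.xz", b"d" * 6)
    )
    print(list_ar(demo))
```

Pointing list_ar at a real .deb opened in binary mode should show the same three-member shape, though the compression suffix on the tarballs varies by package.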


I can't speak for Debian, but I personally did not realize that xz was as I stated above before reading this article, so I wouldn't be surprised if some people simply didn't realize it.



