
Regarding #4: Padding out a file is popular because of tape drives. You'll typically write the archive raw to an appropriately sized tape, without a filesystem (which is just another format you have to maintain a parser for). Because the file will be smaller than the tape, there will be garbage at the end of the tape. And because we wrote the file to the tape directly, there is no way of knowing where it ends, so a tool reading the file just has to deal with the garbage data.


Doesn't one typically write to tape drives with tar, rather than writing a raw file?


Indeed. This is why the tar format explicitly allows garbage data at the end. So then people started pondering all of the nifty or clever things they could do with tar files. And they didn't want to give it up when they started compressing the tar files.
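To make the "garbage after the archive" point concrete, here is a minimal sketch using Python's standard `tarfile` module. It only shows how that one reader behaves on an in-memory buffer; it is not a claim about real tape devices or about every tar implementation. The helper names `build_tar` and `list_members` and the filler bytes are made up for illustration.

```python
import io
import tarfile


def build_tar(files):
    """Return the bytes of an uncompressed tar archive holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(raw):
    """List member names; the reader stops at the end-of-archive marker (zero blocks)."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        return tar.getnames()


archive = build_tar({"a.txt": b"hello", "b.txt": b"world"})

# Simulate whatever was left on the tape after the archive (hypothetical filler).
on_tape = archive + b"\xde\xad\xbe\xef" * 1024

print(list_members(archive))
print(list_members(on_tape))
```

Both calls list the same members, because the reader stops at the zero-filled end-of-archive blocks and never looks at what follows.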


Couldn't you just reverse the order then, and create an xz.tar? Maybe I don't understand the benefit of tarring the data first.


Tar is just bundling into a single file, AFAIK. There is a slight benefit, depending on your compression tool, to tarring first and then compressing, because (AFAIK again) some tools compress files individually and then write them into a hierarchical file (I guess this is what xz does as well, since it's searchable?). If you tar first, these tools will work better, since they encode patterns found across all files instead of doing it per file (which means, e.g., if there is a header once per file, that redundancy gets compressed away in the tar-then-compress case, but not in the compress-per-file case).
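A rough way to check the "shared header" argument is to compare the two approaches on synthetic data. The sketch below assumes 20 made-up files that all start with the same 2 KiB of incompressible bytes, and uses Python's `lzma` module as a stand-in compressor. Per-file compression here mimics what zip-style archivers do; it is not meant to describe how any particular tool works internally. The names `per_file_size` and `solid_size` are hypothetical.

```python
import io
import lzma
import random
import tarfile

# Synthetic input: every file shares one pseudo-random header (assumption for the demo).
rng = random.Random(0)
header = bytes(rng.getrandbits(8) for _ in range(2048))
files = {f"file{i:02d}.dat": header + f"record {i}\n".encode() for i in range(20)}


def per_file_size(files):
    """Total size when each file is compressed on its own."""
    return sum(len(lzma.compress(data)) for data in files.values())


def solid_size(files):
    """Size when the files are bundled with tar first and compressed as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(lzma.compress(buf.getvalue()))


print("compressed per file:", per_file_size(files))
print("tar, then compress: ", solid_size(files))
```

With this input the bundled stream comes out much smaller, since the compressor sees the repeated header only once. On real data with little redundancy across files, the gap can be small.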


I've created sortedtar because of this assumption, and while it is correct, the benefit is mostly negligible except for some edge cases.


Old man moment!

Kids these days don't appreciate having random addressable storage for archive/backup data!


Garbage data is not a problem since the length is known.


I do not understand the xz format enough to evaluate that claim myself, but TFA explicitly claims that garbage data is a problem.


Hum, no. When you tar a set of files directly into a tape, you don't know the resulting tar size beforehand. Even less if you compress the result.


I would think you could just write the size at the end of the tape?


When you're reading the data later, how would you know where to find "the end" if you don't already know the length?


You don't know the end of the data, but presumably(?) you know the end of the tape.


We are talking past each other here...

Making a backup with tar is done by typing something like this in bash:

> tar -cf - dir1 dir2 dir3 > /dev/tape

That will (hopefully, I doubt I got the tar switches right) back up those dirs onto the tape (which will actually have a weird name, not '/dev/tape').

Now, in practice Linux doesn't always know the size of the tape you inserted. But that is not the issue: if you accept the seeks needed for that, you'd be better off writing the size at the beginning anyway.



