
Beta4 can not unpack *.gz

Posted: 16 Jan 2014, 16:49
by lexa
Hi Salamander Team,

Salamander 3 is unable to unzip *.gz files (created on Linux with "gzip file.txt"), but Total Commander can; see the screenshot.
But Salamander 2.5 on the same machine can.

Machine: "Salamander 3 beta4 x86" on "Windows 8.1 x64"

Re: Beta4 can not unpack *.gz

Posted: 17 Jan 2014, 10:53
by Tomas Kopal
You really can't extract the archive, or just the displayed size is wrong?

Can you, please, share with us the archive which fails to open? Or create one for us, which would be also failing, but without any sensitive data, if that's the case with the original archive.

Re: Beta4 can not unpack *.gz

Posted: 17 Jan 2014, 20:06
by lexa
> You really can't extract the archive, or just the displayed size is wrong?

Only size is wrong, my fault, sorry.

Do you still need an example file?

Re: Beta4 can not unpack *.gz

Posted: 20 Jan 2014, 12:18
by Tomas Kopal
lexa wrote:
> > You really can't extract the archive, or just the displayed size is wrong?
> Only size is wrong, my fault, sorry.

No problem, thanks for getting back to us.

lexa wrote:
> Do you still need an example file?

Well, according to your screenshots, it seems that the older version did show the size. If it is the same archive, it's a regression, and then yes, I would still like to have a sample archive to debug.
However, if it is a different archive, then it's most probably a "feature" of the gzip format, where size information is optional and gzip archivers omit it very often, so we don't have anything to show in such a case.

Re: Beta4 can not unpack *.gz

Posted: 21 Jan 2014, 18:46
by lexa
Hi Tomas,
> If it is the same archive, it's a regression, and then yes, I would still like to have a sample archive to debug

Yes, it is of course the same archive in all those screenshots, but for your tests I created a new (and anonymous) one with the same behaviour (TC 8 and Sal 2.5 show the size, Sal 3b4 does not).

Code:

# dd if=/dev/zero of=testfile.log bs=512 count=20480
# gzip testfile.log
Systems: GNU gzip 1.3.12 (x86 and x64) on older Linux OpenSuSE 11.2 + Debian 6
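
For readers without a Linux machine at hand, here is a rough Python equivalent of the two commands above. It is only a sketch: Python's gzip module stands in for GNU gzip 1.3.12 (an assumption, so the compressed bytes will differ), and the archive is built in memory rather than written to testfile.log.gz. It also reads the size field that the file managers display, which sits in the last four bytes of the archive.

```python
import gzip
import struct

# Same payload as `dd if=/dev/zero bs=512 count=20480`: 10 MiB of zero bytes.
payload = bytes(512 * 20480)

# Python's gzip module stands in for GNU gzip here (an assumption of this sketch).
archive = gzip.compress(payload)

# The last four bytes of a gzip member are ISIZE: the uncompressed size
# modulo 2^32, stored little-endian (RFC 1952).
(isize,) = struct.unpack("<I", archive[-4:])

print(len(payload), len(archive), isize)
```

An all-zero file like this compresses extremely well, which is why it makes a small, anonymous test case.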

Re: Beta4 can not unpack *.gz

Posted: 27 Jan 2014, 21:52
by Tomas Kopal
Thanks a lot for the samples. Unfortunately, after tracing the code execution to the gzip listing, I found a comment in the code explaining why the size was intentionally removed.
The reason is that not all archives contain the size, and there is no reliable way to find out whether the size is there or not. As if that were not enough, the second problem is that the size field is only 32 bits wide, so it wraps around for files of 4 GiB or more. Again, there is no way to tell whether the number is correct or wrapped.

So, thinking that it's better to show nothing than to show nonsense, we removed it. Sorry for the confusion...
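
The wrapping problem can be illustrated with plain arithmetic. This is a small sketch, not Salamander's code: `isize_field` is a hypothetical helper that computes what a compliant compressor would store in the 32-bit size field, and the two file sizes are made up for the example.

```python
MOD = 2 ** 32  # ISIZE is a 32-bit field


def isize_field(uncompressed_size: int) -> int:
    """Value a compliant compressor stores in ISIZE for a given input size."""
    return uncompressed_size % MOD


small = 100 * 1024 * 1024       # 100 MiB
large = small + 4 * 1024 ** 3   # 100 MiB + 4 GiB

# Both inputs produce the same trailer value, so a reader that sees only
# the trailer cannot tell them apart.
print(isize_field(small), isize_field(large))
```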

Re: Beta4 can not unpack *.gz

Posted: 28 Jan 2014, 10:59
by SvA
Tomas Kopal wrote:
> So, thinking that it's better to show nothing than to show nonsense, we removed it. Sorry for the confusion...

Well, 0 is not "nothing", and as you have seen, it causes confusion. Couldn't you display nothing (a blank space) or something to denote that the size is unknown (a question mark, for instance)?

Tomas Kopal wrote:
> The reason is that not all archives contain the size, and there is no reliable way to find out if the size is there or not. And if this is not enough, second problem is that the size is 32-bit only, and it wraps for bigger gzipped files. Again, no way to tell if the number is correct or wrapped.

RFC 1952 wrote:
> A compliant compressor must produce files with correct ID1, ID2, CM, CRC32, and ISIZE, but may set all the other fields in the fixed-length part of the header to default values (255 for OS, 0 for all others).

The standard requires the size as a mandatory component. So I feel you shouldn't cater for gzip files with a missing size in such a way.

Except for pathological cases (such as lexa's test case) and huge gz files, you should be able to estimate quite reliably the number of 4 GB chunks you need to add to the 32-bit ISIZE value.
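
One way such an estimate might look, as a hedged sketch rather than anything Salamander or gzip actually does: enumerate every size that matches ISIZE modulo 2^32 and that deflate could plausibly have produced from an archive of the given length. The bound of 1032:1 is the maximum deflate compression factor given in zlib's documentation; `candidate_sizes` and the sample numbers are hypothetical, and multi-member archives are ignored.

```python
MOD = 2 ** 32
MAX_DEFLATE_RATIO = 1032  # maximum deflate compression factor per zlib's docs


def candidate_sizes(compressed_size: int, isize: int) -> list[int]:
    """Sizes congruent to isize mod 2^32 that fit under the ratio bound."""
    limit = compressed_size * MAX_DEFLATE_RATIO
    sizes = []
    n = isize
    while n <= limit:
        sizes.append(n)
        n += MOD
    return sizes


# A 1 MB archive cannot expand past about 1 GB, so only one size fits.
print(candidate_sizes(1_000_000, 3_000_000))

# A 200 MB archive could in principle hold far more than 4 GiB,
# so the bound alone leaves many candidates.
print(len(candidate_sizes(200_000_000, 3_000_000)))
```

For small archives the answer is unique; for large ones the hard bound leaves several candidates, and picking one means assuming a typical compression ratio, which is exactly where the pathological cases come in.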

Re: Beta4 can not unpack *.gz

Posted: 28 Jan 2014, 14:36
by Tomas Kopal
SvA wrote:
> Tomas Kopal wrote:
> > So, thinking that it's better to show nothing than to show nonsense, we removed it. Sorry for the confusion...
>
> Well, 0 is not "nothing", and as you have seen, it causes confusion. Couldn't you display nothing (a blank space) or something to denote that the size is unknown (a question mark, for instance)?

Yes, that would be the best way, I agree. Unfortunately, the plugin API is a bit limited in this regard at the moment, and enhancing it to display anything other than numbers in this column is non-trivial and low on the to-do list. So for now, it's zero. Considering that this is the first (or maybe second?) such request in all the years gzip has been supported in Salamander, it's probably not the most important thing to do, sorry.

SvA wrote:
> Tomas Kopal wrote:
> > The reason is that not all archives contain the size, and there is no reliable way to find out if the size is there or not. And if this is not enough, second problem is that the size is 32-bit only, and it wraps for bigger gzipped files. Again, no way to tell if the number is correct or wrapped.
>
> RFC 1952 wrote:
> > A compliant compressor must produce files with correct ID1, ID2, CM, CRC32, and ISIZE, but may set all the other fields in the fixed-length part of the header to default values (255 for OS, 0 for all others).
>
> The standard requires the size as a mandatory component. So I feel you shouldn't cater for gzip files with a missing size in such a way.
>
> Except for pathological cases (such as lexa's test case) and huge gz files, you should be able to estimate quite reliably the number of 4 GB chunks you need to add to the 32-bit ISIZE value.

We have been there: we displayed the size in prior versions. We removed it because people were reporting wrong values. It's nice to have a standard, but it's useless if other programs do not follow it...
Unless someone comes up with a reliable way of telling whether the size is correct, we won't be changing the implementation back and forth, sorry.

Re: Beta4 can not unpack *.gz

Posted: 29 Jan 2014, 00:54
by lexa
Hi Tomas,

I agree: this is not so high on the priority list, but it looks ... mmh ... strange if such a nice and powerful tool as Salamander is not (or no longer) able to determine the size of gzipped files. It's an open (not proprietary) and well documented format for (de-)compressing files. And thinking ahead: when Salamander is going to unpack, how will it calculate the available disk space for warnings...?

In Salamander's default configuration, the following file types are associated with the one-for-all "TAR" plugin: "tgz;tbz;taz;tar;gz;bz;bz2;z;rpm;cpio;deb"
So "tar.gz" is missing from the list. How is it handled? It looks like it is handled the same way as ".gz". But why the different results? ...see 2.)

Maybe a separate g[un]zip plugin for gzipped files is the solution?
> Unless someone comes with a reliable way of telling whether the size is correct...

Why not? I have two of them ;)

1.) Study of the gzip sources may help you:

Code:

me:/srv # gzip --list bigfile.txt.gz
         compressed        uncompressed  ratio uncompressed_name
          200774480          3217555456  93.8% bigfile.txt
2.) See my new screenshot below for how Salamander handles and shows *.tar.gz files differently from *.gz. I think there must be a way in your own code to do this.
> Considering the fact that this is the first (or maybe second?) such request in all those years ...

There was no real reason to grumble in all those years. The gzip functionality was working just fine for all but a few users :wink:

So Tomas, I'll do my best to make this my last attempt to confuse you with my .gz problem :mrgreen:
.

Re: Beta4 can not unpack *.gz

Posted: 30 Jan 2014, 17:02
by Tomas Kopal
lexa wrote:
> Hi Tomas,
>
> I agree: this is not so high on the priority list, but it looks ... mmh ... strange if such a nice and powerful tool as Salamander is not (or no longer) able to determine the size of gzipped files. It's an open (not proprietary) and well documented format for (de-)compressing files. And thinking ahead: when Salamander is going to unpack, how will it calculate the available disk space for warnings...?

Yes, gzip is an open and documented format (I would probably leave out the word "well" here; I studied the documentation quite hard, but it could be worse :D). The trouble is that it's not one definitive format but several versions of one. And there are a lot of programs creating gzip archives, and not all of them adhere to the original format specification. So it's not that easy to support.

lexa wrote:
> In Salamander's default configuration, the following file types are associated with the one-for-all "TAR" plugin: "tgz;tbz;taz;tar;gz;bz;bz2;z;rpm;cpio;deb"
> So "tar.gz" is missing from the list. How is it handled? It looks like it is handled the same way as ".gz". But why the different results? ...see 2.)
>
> Maybe a separate g[un]zip plugin for gzipped files is the solution?

No need to separate it.
tar.gz is a completely different case. That's a tar archive compressed with gzip. The information Salamander displays is taken from the tar archive, not from gzip. The same holds for tar.bz, tar.Z etc.
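
The difference can be seen with a few lines of Python. This is an illustrative sketch using the standard tarfile module, not Salamander's TAR plugin; the member names and contents are made up. The per-file sizes in the listing come from the tar headers inside the archive, and reading them means decompressing the gzip layer first.

```python
import io
import tarfile

# Build a small tar.gz in memory with two made-up members.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in [("a.txt", b"x" * 1000), ("b.log", b"\0" * 50000)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Listing decompresses the gzip layer; the sizes are read from tar headers,
# not from the gzip trailer.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    listing = {member.name: member.size for member in tar.getmembers()}

print(listing)
```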
lexa wrote:
> 1.) Study of the gzip sources may help you:
>
> Code:
>
> me:/srv # gzip --list bigfile.txt.gz
>          compressed        uncompressed  ratio uncompressed_name
>           200774480          3217555456  93.8% bigfile.txt

Not sure how this is handled; maybe gzip got another format variant :?. The format description included in the gzip sources clearly states that at the end of the file are:
4 bytes uncompressed input size modulo 2^32
That's the only place mentioning the size. And from our experience, even this information is sometimes not present. And I haven't even started on gzip files with multiple streams. To get a slightly better idea of the problem, try googling a bit; you may find e.g. http://stackoverflow.com/questions/9715 ... -gzip-file. In short, the only reliable way to find out the size of the uncompressed file is to decompress it. We considered this option, but decided that the information is not worth the time it takes to decompress the file (although it is done for e.g. the tar.gz format you mentioned, as there is no other way to list the contents).
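
The multiple-stream case is easy to reproduce. The following is a minimal sketch with Python's standard gzip and zlib modules, not Salamander's implementation; `decompressed_size` is a hypothetical helper and the payloads are made up. It shows that the trailer describes only the last stream, while decompressing every stream gives the true total.

```python
import gzip
import struct
import zlib

# Two gzip members back to back, as `cat a.gz b.gz > both.gz` would produce.
first = gzip.compress(b"a" * 70000)
second = gzip.compress(b"b" * 300)
both = first + second

# The last four bytes only describe the last member.
(isize,) = struct.unpack("<I", both[-4:])


def decompressed_size(data: bytes) -> int:
    """Count output bytes by decompressing every member in turn."""
    total = 0
    while data:
        d = zlib.decompressobj(31)  # 31 = expect a gzip header and trailer
        total += len(d.decompress(data)) + len(d.flush())
        data = d.unused_data        # bytes following the finished member
    return total


print(isize, decompressed_size(both))
```

The helper assumes the input consists of well-formed gzip members only; trailing padding or garbage after the last member would raise an error.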

So, although I agree that the current behavior is not optimal, we think it is still the best option. The only improvement we may make in the future is to remove the zero or remove the size column altogether. Sorry...

lexa wrote:
> 2.) See my new screenshot below for how Salamander handles and shows *.tar.gz files differently from *.gz. I think there must be a way in your own code to do this.

As I mentioned above, tar.gz takes the information from a different source; it's basically a completely different archive.

lexa wrote:
> So Tomas, I'll do my best to make this my last attempt to confuse you with my .gz problem :mrgreen:

Well, if you find any usable information on how to get the size reliably (by reliably, I mean it works in 100% of cases; we are not interested in heuristics that only mostly work), then feel free to come back and I will do my best to fix the problem. But at this time, it would be a change from one bad state to another, and that's not worth my time, sorry.

Re: Beta4 can not unpack *.gz

Posted: 01 Feb 2014, 17:51
by therube
I'll chime in :-).

Just had the occasion to come across some .gz files.

> Mozilla (SeaMonkey, Firefox,...), about:memory, Measure & save..., saves to .gz files

Next step was to do that again in a different version of FF.
Then to load the two reports with, Load and diff...

Doing so, I got, well essentially nothing.
(Two lines of text, I took to be headers, & nothing more, where I would have expected much more.)

Next step was to actually "look" at these .gz files.

So I "open" them & see "0" bytes, & I'm like, oh, no wonder my diff wasn't reporting anything! The generated file, that created the .gz, must have farted, so no actual content made it in there.

And then it dawned on me...

That thread in the Salamander forums. "0" bytes might be "expected".

F3, ah, there is data there. Guess what I just ran into!
F5, ah, there is some 20 MB of data there.

So yes, for a moment, seeing a "0" did throw me off.


Now knowing this, & having run into it this first time, I'll know for the next.
But seeing 0 did throw me off.

And if we can't get the "correct" number, perhaps an "N/A" (Not Applicable [& translation issues to go along with that]) or "-" or "?" or ...