Comparing emacs\etc\grep.txt

mdruiter
Posts: 263
Joined: 22 Feb 2006, 15:33
Location: Amsterdam, The Netherlands

Comparing emacs\etc\grep.txt

Post by mdruiter »

I found a file which the File Comparator incorrectly detects as being binary instead of text. It's a file from the standard Emacs distribution, see the attachment.
The detection algorithm gets 'confused' because of Esc characters. :|
The internal viewer does detect the file as text. Maybe the File Comparator can use the Viewer's algorithm? Or at least consider Esc characters as text.
Esc characters occur in ANSI escape codes, in UNIX-like environments as well as DOS and Windows 9x.

None of this is a big deal, of course: redoing the comparison with Text selected instead of my default Automatic solves it.
Attachments
grep.txt
(3.96 KiB)

Re: Comparing emacs\etc\grep.txt

Post by mdruiter »

Attached is another example that confuses the File Comparator, and this time the internal viewer as well.
It also comes from the Emacs /etc folder (extension added). The cause is Esc characters again.
Attachments
HELLO.txt
(4.59 KiB)
SvA
Posts: 487
Joined: 29 Mar 2006, 02:41
Location: DE

Re: Comparing emacs\etc\grep.txt

Post by SvA »

For the sake of autodetection, I'd consider any file containing nonprintable characters to be binary, with the exceptions listed below.
In a text file I'd allow the following control characters: CR, LF, HT, VT, FF, and ^Z near the end of the file, possibly followed by up to 127 garbage bytes. (^Z is the text-mode EOF marker in DOS, and the garbage bytes are a concession to a rather common defect in text files from the early DOS days.)

If you allow ESC, why not the other control characters also? But then, any file is a text file...
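
To make the rule above concrete, here is a minimal Python sketch of it. This is not Salamander's actual algorithm; the function name, the `allow_esc` switch, and the decision to treat bytes 0x80 and above as printable are all assumptions made for illustration.

```python
# Control characters the rule tolerates in text: HT, LF, VT, FF, CR.
ALLOWED_CONTROLS = frozenset(b"\t\n\x0b\x0c\r")
ESC = 0x1B
CTRL_Z = b"\x1a"   # DOS text-mode EOF marker
MAX_TAIL = 127     # garbage bytes tolerated after ^Z


def looks_like_text(data: bytes, allow_esc: bool = False) -> bool:
    """Hypothetical detector following the rule described in the post."""
    # If the first ^Z is close enough to the end, ignore it and what follows.
    eof = data.find(CTRL_Z)
    if eof != -1 and len(data) - eof - 1 <= MAX_TAIL:
        data = data[:eof]

    allowed = (ALLOWED_CONTROLS | {ESC}) if allow_esc else ALLOWED_CONTROLS

    # Assumption: only 0x00-0x1F and 0x7F count as nonprintable;
    # bytes >= 0x80 are accepted as text in some 8-bit encoding.
    return all(b in allowed or not (b < 0x20 or b == 0x7F) for b in data)
```

With `allow_esc=False` a file containing ANSI color codes is classified as binary; passing `allow_esc=True` accepts ESC while still rejecting NUL, the Unit Separator, and the other control characters.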

Re: Comparing emacs\etc\grep.txt

Post by mdruiter »

"If you allow ESC, why not the other control characters also?"
As I said: Esc characters occur in ANSI escape codes, which are used in text.

NUL characters (ASCII zero) in particular occur only in binary files.
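
One possible way to accept ESC without accepting every control character is to tolerate it only where it starts an ANSI CSI sequence (ESC, `[`, parameter bytes, a final byte) and to keep treating NUL as a sure sign of binary data. The sketch below is only an illustration of that idea; the function names are made up, and it deliberately ignores all other control characters.

```python
import re

# ANSI CSI sequence: ESC [ , parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F, one final byte 0x40-0x7E.
CSI_SEQUENCE = re.compile(rb"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(data: bytes) -> bytes:
    """Remove every complete CSI sequence from the data."""
    return CSI_SEQUENCE.sub(b"", data)


def esc_usage_looks_textual(data: bytes) -> bool:
    """Hypothetical check: no NUL bytes, and every ESC starts a CSI sequence."""
    if b"\x00" in data:
        return False
    # Any ESC still present after stripping is not part of a CSI sequence.
    return b"\x1b" not in strip_csi(data)
```

Escape sequences of other kinds, such as ESC followed by `$`, would still be rejected by this check, so it is stricter than simply allowing ESC everywhere.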
therube
Posts: 681
Joined: 14 Dec 2006, 06:22

Re: Comparing emacs\etc\grep.txt

Post by therube »

Any file is only as good as the utility used to interpret it :-).

What utility are you using to "properly" view these ".txt/ansi" files?

The UNIX file command lists both files as "data", which is a different category from, say, "ASCII text".

C:\TMP\X>FILE  -m magic  *.TXT
7514.TXT:     ASCII text, with CRLF, CR line terminators, with escape sequences
HELLO.txt:    data
doc_data.txt: ASCII text, with CRLF line terminators
grep.txt:     data

HELLO.txt and grep.txt are your two "ANSI" files.
doc_data.txt is a pure "text" file.
7514.TXT is a pure "text" file, except that the very first byte is an ESC character.


Unfortunately (because I have a lot of files like 7514.TXT) the ESC character fools Universal Viewer. UV says the file is of "Unknown format" & wants to display it as binary.
WinXP Pro SP3 or Win7 x86 | SS 2.54

Re: Comparing emacs\etc\grep.txt

Post by mdruiter »

"What utility are you using to "properly" view these ".txt/ansi" files?"
Emacs.
I agree there is no such thing as a universal definition of what is text and what's not.
See also http://forum.altap.cz/viewtopic.php?f=4&t=3942.

Aside: 'file' considers grep.txt data because of a single Unit Separator (Ctrl+_) in the file. The separator is there because grep.txt contains part of a Unix-style *.info file.
And HELLO.txt is data because 'file' thinks one letter of the Hindi word for Hindi is non-text.
Otherwise, both would be ASCII English text, with escape sequences.
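
For anyone who wants to track down a stray byte like that Unit Separator, a small Python sketch along these lines lists the control characters in a file together with their offsets. It says nothing about how 'file' works internally; the function name and the default set of ignored bytes (HT, LF, VT, FF, CR, and ESC) are assumptions for illustration.

```python
# Bytes that are not reported: HT, LF, VT, FF, CR and ESC.
DEFAULT_IGNORED = frozenset(b"\t\n\x0b\x0c\r\x1b")


def find_control_bytes(data: bytes, ignored=DEFAULT_IGNORED) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every control character not in `ignored`."""
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if (byte < 0x20 or byte == 0x7F) and byte not in ignored
    ]


# Usage sketch: read the file in binary mode and print what was found.
# with open("grep.txt", "rb") as handle:
#     for offset, byte in find_control_bytes(handle.read()):
#         print(f"offset {offset}: 0x{byte:02X}")
```

This only covers control characters in the ASCII range, so it would not explain a case like HELLO.txt, where the problem is a non-ASCII letter.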