Unicode Support in Salamander 2.5 Final

Discussion of bugs and problems found in Altap Salamander. In your reports, please be as descriptive as possible, and report one incident per report. Do not post crash reports here, send us the generated bug report by email instead, please.
troubled


Post by troubled »

I am running Altap Salamander 2.5 Final on Windows Server 2003 DataCenter Edition. This version of Windows supports Unicode filenames on NTFS partitions. The support is almost seamless in Windows Explorer.

Recently, I needed to copy a number of files with non-English, non-European names from one hard disk partition to another (both NTFS, so I knew the names were stored in Unicode and not in some obscure code page), and I ran into a problem with Salamander.

It would not show the non-English characters. Worse, it would not even copy them: it complained that "the file ????.??? does not exist." All non-English characters were replaced by question marks. Windows Explorer, on the other hand, had no problem displaying these filenames and operating on them.

Later, I tried creating new files with non-English names. I pressed [Shift]+[F4] to bring up the new file dialog, pressed [Alt]+[Shift] to change the input language, typed the name and pressed the OK button. Salamander displayed this error: "(123) The filename, directory name, or volume label syntax is incorrect," and went back to the new file dialog. The typed name was displayed correctly before I pressed the OK button, but after the error it turned into a string of question marks.

I have also experienced various problems with non-English directory names. Some non-English European directory names (for example, one that contained an r with a caron, ř, as in the musician's name Dvořák) would display but would not open when I tried to browse them.

None of these problems happen with Windows Explorer, so I presume the problem is somewhere in Salamander's Unicode support.
troubled

Oh, sorry

Post by troubled »

I just searched altap.cz for "unicode" and found out that Salamander 2.5 does not support Unicode. My fault, I should have searched first.

This thread should actually be in feature requests.
omega
Posts: 196
Joined: 09 Dec 2005, 19:21

Post by omega »

It does not support Unicode YET :) , but it's already on the to-do list.
roman2
Posts: 106
Joined: 07 Aug 2006, 11:11

Post by roman2 »

I've been using Salamander with Cyrillic letters in file and directory names with no problems. I've just configured Windows to use Russian as a language for non-Unicode programs and changed Script to Cyrillic in the font settings in Salamander.
Jan Rysavy
ALTAP Staff
Posts: 5231
Joined: 08 Dec 2005, 06:34
Location: Novy Bor, Czech Republic

Post by Jan Rysavy »

Detailed information about Altap Salamander and Unicode: http://www.altap.cz/salam_en/help/salam ... upport.htm
Foobar2000User
Posts: 4
Joined: 01 Feb 2009, 21:14

Post by Foobar2000User »

roman2 wrote:I've been using Salamander with Cyrillic letters in file and directory names with no problems. I've just configured Windows to use Russian as a language for non-Unicode programs and changed Script to Cyrillic in the font settings in Salamander.
No solution for me. I am German, but I have been learning Russian for a while now. I have quite a bunch of Cyrillic-named files (e.g. MP3s or pictures from friends). I like Servant Salamander a lot and am considering buying the full version (I'm using the free version now, and from time to time I try the latest demo version). No Unicode is a deal-breaker for me. Even my MP3-tags are all utf-8/16. What I almost never use are the plugins (I know they put a lot of work into these); I can't stand browsing archives as if they were folders, and that's the first thing I switch off when reinstalling Windows. Also (maybe I am alone in this opinion), I prefer functionality over features. I need no thumbnails (always the file list), and I want Servant Salamander to be as fast as possible... I hope they are not about to turn this nice piece of software into "bloatware" (ACDSee, if anyone knows it, is a "nice" example of a once fast and functional program turned into slow bloatware).

Greetings,
Erebos
zarevak
Plugin Developer
Posts: 789
Joined: 04 Feb 2006, 16:49
Location: Prague, Czech Republic

Post by zarevak »

Interesting point of view ;) I personally hated the "zip archives as folders" implementation in Windows Explorer: it was slow, not very user-friendly, ... But I love how seamlessly it is integrated in Salamander: you just enter the archive like a regular folder, and you can copy files out of it using F5 or Ctrl+C and Ctrl+V. For supported archives you can even copy files into the archive! I rarely use any external archiver because of Salamander's good archive capabilities. (If you want to continue this discussion, let's create a new thread and leave this one for UNICODE.)

Back to UNICODE: because Salamander still supports Windows 95, it is not easy to add UNICODE support. You would need to either:
1) Build two versions of Salamander, ANSI and UNICODE. This means double the hassle with releases.
2) Handle the ANSI/UNICODE code paths manually inside Salamander, based on the system it is running on. It is very easy to make a mistake, and hard to maintain.
3) Use the Microsoft Layer for Unicode. AFAIK this lets Salamander be a pure UNICODE application, but requires users to install an additional library on their system. I've never used this library, so I don't know any details; it may currently be the best way to go.
4) Drop Windows 9x support. Altap is planning to drop Windows 9x support in the future, but I'm not sure when.

Unlike 64-bit support, all the supporting technologies (compilers) already support UNICODE, but it is not just a matter of recompiling Salamander with UNICODE. There are many language-related issues involved:
- sorting: English: C, ..., Ch, ..., G, H, I, ...; Czech: E, F, G, H, Ch, I, ... (Ch is a separate letter that sorts after H)
- uppercase/lowercase conversion: English: Upcase("i") = I; Turkish: Upcase("i") = İ
- combining vs. precomposed characters: a + U+0301 COMBINING ACUTE ACCENT = á
- different-looking characters with the same meaning: ss = ß
- complex scripts where characters change shape depending on neighboring characters (Arabic).
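To illustrate the sorting point, here is a simplified sketch (Python, chosen only for illustration; this is not how Salamander sorts) of a Czech-style sort key that reads "ch" as one letter placed between "h" and "i". The alphabet table and `czech_sort_key` are hypothetical; real Czech collation also has to handle diacritics and tie-breaking, which a collation library such as ICU does properly.

```python
# Hypothetical, simplified Czech alphabet: "ch" is a single letter between "h" and "i".
# Diacritics (á, č, ř, ...) are deliberately ignored to keep the sketch short.
CZECH_ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: i for i, letter in enumerate(CZECH_ORDER)}


def czech_sort_key(name: str) -> list[int]:
    """Turn a name into a list of letter ranks, reading "ch" as one letter."""
    name = name.lower()
    key = []
    i = 0
    while i < len(name):
        if name.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the simplified alphabet sort after all letters.
            key.append(RANK.get(name[i], len(CZECH_ORDER)))
            i += 1
    return key


names = ["hrad", "chata", "cesta", "ivan"]
print(sorted(names))                      # plain code point order (English-like)
print(sorted(names, key=czech_sort_key))  # Czech-like order: "chata" after "hrad"
```

With plain code point order "chata" lands right after "cesta"; with the Czech-style key it moves after "hrad", which is exactly the difference described in the list above.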

Czech thread about UNICODE details
roman2
Posts: 106
Joined: 07 Aug 2006, 11:11

Post by roman2 »

Wow! I didn't know there were so many issues with Unicode support. How do other software vendors handle this? Do they all deal with this or do they use a library to help them?

Does Salamander bring enough money from Win 9x users to justify continuing to develop new versions for them?

It shouldn't be an issue for anyone to download and install a library from MS.
zarevak
Plugin Developer
Posts: 789
Joined: 04 Feb 2006, 16:49
Location: Prague, Czech Republic

Post by zarevak »

Most software relies on just what Windows offers and lives with its quirks and issues ;) I may have exaggerated the problems a bit; it depends on whether you want just enough UNICODE "support" or good UNICODE support.

In the Czech thread there are a few posts about problems with Vista's handling of the equalities "fi" = "ﬁ" (one character, the ligature) and "ss" = "ß". In my tests, having the "ﬁ" character in a text document completely breaks searching in Notepad (copy the sample from the linked post to Notepad and try to search for "F" or "fi" (two characters): WinXP ignores "ﬁ", Vista breaks, Win7 works correctly). I'm not sure how deep the problem is or whether it is possible to work around it, but Salamander has to provide a consistent file search that does not depend on the contents of the file.

Using combining and precomposed characters is also an issue for file searching. If you want to use a string search algorithm, you have to tweak it to support characters with different binary lengths.
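One common way around this (a sketch under my own assumptions, not a description of what Salamander does) is to bring both the text and the search string to the same Unicode normalization form before searching. `find_normalized` below is a hypothetical helper built on Python's standard `unicodedata.normalize`:

```python
import unicodedata


def find_normalized(haystack: str, needle: str) -> int:
    """Find needle in haystack after NFC-normalizing both; return -1 if absent.

    Note: the returned index refers to the normalized haystack, not the original.
    """
    return unicodedata.normalize("NFC", haystack).find(
        unicodedata.normalize("NFC", needle)
    )


precomposed = "caf\u00e9"  # "é" as one code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                 # the raw strings differ
print(decomposed.find(precomposed))              # a naive search misses the match
print(find_normalized(decomposed, precomposed))  # normalized search finds it at 0
```

NFC only unifies canonically equivalent sequences; treating the "ﬁ" ligature as "fi" would need the compatibility form NFKC, which also folds other distinctions a file search may want to keep.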

The issue I was trying to point out with upcasing is that one character can have multiple (or no) uppercase/lowercase counterparts, so UpperCase(LowerCase(char)) != char.
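A small demonstration of UpperCase(LowerCase(char)) != char, using Python's built-in `str.lower()`/`str.upper()`, which apply Unicode's default, locale-independent case mappings (so the Turkish rule from the list above is not applied here; that needs a locale-aware library such as ICU). The helper name `upper_of_lower` is just for illustration:

```python
def upper_of_lower(s: str) -> str:
    """UpperCase(LowerCase(s)) using Python's default Unicode case mappings."""
    return s.lower().upper()


# "A", "ß" (sharp s), "ﬁ" (ligature), "İ" (capital I with dot above)
for ch in ["A", "\u00df", "\ufb01", "\u0130"]:
    result = upper_of_lower(ch)
    print(f"U+{ord(ch):04X} -> {result!r} (length {len(result)}), same as input: {result == ch}")
```

Only "A" survives the round trip. "ß" becomes "SS", the ligature becomes "FI", and "İ" lowercases to "i" plus U+0307 COMBINING DOT ABOVE, so each result is two characters long where the input was one.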

There is also the issue of different UNICODE encodings:
- UTF-8, with character lengths of 1 to 4 bytes. Not every byte sequence is a valid UTF-8 string; software must be able to handle malformed input and resynchronize.
- UTF-16, with character lengths of 2 or 4 bytes. This is the basic encoding people refer to as UNICODE, and it is what Windows NT/2000/XP/... uses.
- UCS-2, with character lengths of exactly 2 bytes. This is a subset of the full UNICODE character set and it is used on NTFS for storing UNICODE filenames.
- UTF-32/UCS-4, with character lengths of exactly 4 bytes.
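To make the length differences concrete, a short Python sketch that encodes single code points with the standard codecs and then decodes a malformed UTF-8 byte string leniently. `encoded_lengths` is a hypothetical helper; `errors="replace"` is Python's way of substituting U+FFFD REPLACEMENT CHARACTER for bad input and continuing with the next valid sequence.

```python
def encoded_lengths(ch: str) -> dict[str, int]:
    """Bytes needed to store one code point in each encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# "A", "é", "€", and U+1F600 (an emoji outside the Basic Multilingual Plane)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# 0xFF can never occur in valid UTF-8. A strict decoder raises an error;
# a lenient one replaces the bad byte and resynchronizes on the next character.
broken = b"ab\xffcd"
lenient = broken.decode("utf-8", errors="replace")
print(repr(lenient))
```

U+1F600 is where UTF-16 needs 4 bytes (a surrogate pair), which a fixed 2-byte UCS-2 reading cannot represent.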

And I forgot one language-related issue: RTL languages. Salamander has to handle Arabic filenames and display them properly.
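As a first step, a file manager has to notice that a name contains right-to-left text at all. A minimal sketch in Python, assuming that the bidirectional class from the standard `unicodedata.bidirectional` is a good enough signal; `has_rtl` is a hypothetical helper, and actually laying the name out correctly requires the full Unicode Bidirectional Algorithm (e.g. via the OS or ICU):

```python
import unicodedata


def has_rtl(name: str) -> bool:
    """True if any character has a strong right-to-left bidi class.

    "R" covers scripts such as Hebrew; "AL" covers Arabic letters.
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in name)


print(has_rtl("report.txt"))                # Latin only
print(has_rtl("\u0645\u0644\u0641.txt"))    # Arabic word for "file" plus ".txt"
```

A name like the second one mixes an RTL base name with an LTR extension, which is exactly the case where naive left-to-right drawing goes wrong.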

BTW: There is a 5th possible solution for UNICODE support: use some other custom UNICODE library that would be distributed with Salamander.
SelfMan
Posts: 1143
Joined: 05 Apr 2006, 20:51

Post by SelfMan »

zarevak wrote:There is 5th possible solution for the UNICODE support - use some other custom UNICODE library which would get distributed with Salamander.
You mean something like this? http://icu-project.org/download/3.6.html
zarevak
Plugin Developer
Posts: 789
Joined: 04 Feb 2006, 16:49
Location: Prague, Czech Republic

Post by zarevak »

zarevak wrote:There is also the issue of different UNICODE encodings:
- UTF-8, with character lengths of 1 to 4 bytes. Not every byte sequence is a valid UTF-8 string; software must be able to handle malformed input and resynchronize.
- UTF-16, with character lengths of 2 or 4 bytes. This is the basic encoding people refer to as UNICODE, and it is what Windows NT/2000/XP/... uses.
- UCS-2, with character lengths of exactly 2 bytes. This is a subset of the full UNICODE character set and it is used on NTFS for storing UNICODE filenames.
- UTF-32/UCS-4, with character lengths of exactly 4 bytes.
I'm sorry for misleading you: there is an error in the quoted paragraph. NTFS in current Windows versions CAN store all UNICODE characters (in a UTF-16-like encoding), but it doesn't check whether the filename encoding is valid (the same problem I described with UTF-8). This means some filenames cannot be displayed properly, because their encoding is malformed or uses characters "from the future". (This is one of the reasons UTF-16 filename validity is not checked: older Windows versions know only older UNICODE versions, but they still have to support newer NTFS volumes, e.g. volumes created in Windows 7 have to work in old Windows 2000.) Sources: Wikipedia: NTFS, and Jan Rysavy's links in his post: How are the file names encoded? and NTFS and Unicode?
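Because NTFS does not validate names, a program reading them should expect ill-formed UTF-16, most typically unpaired surrogates. A sketch in Python of how such code units could be detected; it assumes the name is available as a list of 16-bit code units (what a Windows `WCHAR` buffer holds), and both helper names are hypothetical:

```python
def to_units(s: str) -> list[int]:
    """Encode a (well-formed) string as a list of UTF-16 code units."""
    b = s.encode("utf-16-le")
    return [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]


def find_unpaired_surrogates(units: list[int]) -> list[int]:
    """Return the indexes of code units that are unpaired surrogates."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # valid pair: skip both units
                continue
            bad.append(i)  # high surrogate not followed by a low one
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate with no high one before it
            bad.append(i)
        i += 1
    return bad


print(find_unpaired_surrogates(to_units("a\U0001F600")))  # a valid pair
print(find_unpaired_surrogates([0x61, 0xD83D, 0x62]))     # broken pair
```

A name flagged this way can still be opened by passing the raw code units back to the file API unchanged; it is only converting it to another encoding (or displaying it) that fails. Characters that are well-formed but unassigned in the running system's Unicode version are a separate case and are not detected by this check.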
SelfMan wrote:You mean something like this? http://icu-project.org/download/3.6.html
Maybe. I'm just a plugin developer, and I have no more say in Salamander development than you do. There is at least one Czech post by Jan Rysavy about the likelihood of using an independent UNICODE library. Czech-reading visitors may want to read the Czech UNICODE thread I've linked a few posts from.
Jan Patera
Plugin Developer
Posts: 707
Joined: 08 Dec 2005, 14:33
Location: Prague, Czech Republic

RE: MP3 tags (Unicode Support in Salamander 2.5 Final)

Post by Jan Patera »

Foobar2000User wrote:Even my MP3-tags are all utf-8/16.
Support for UTF8 in MP3 tags has been added and support for UTF16 in MP3 and UTF8 in OGG has been improved for the next release of Altap Salamander (2.52b2).
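For background on what "UTF-8/UTF-16 in MP3 tags" means: in ID3v2 text frames, the first byte of the frame payload selects the encoding (0 = ISO-8859-1, 1 = UTF-16 with BOM; ID3v2.4 adds 2 = UTF-16BE without BOM and 3 = UTF-8). A minimal Python sketch, not Salamander's implementation, that decodes such a payload; `decode_id3_text` is hypothetical, skips frame headers, and handles only a single string (ID3v2.4 allows several null-separated ones):

```python
# Encoding byte -> Python codec name ("utf-16" consumes the BOM itself).
ID3_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def decode_id3_text(payload: bytes) -> str:
    """Decode the payload of an ID3v2 text frame (e.g. TIT2) into a string."""
    if not payload or payload[0] not in ID3_ENCODINGS:
        raise ValueError("missing or unknown ID3v2 text encoding byte")
    text = payload[1:].decode(ID3_ENCODINGS[payload[0]])
    return text.rstrip("\x00")  # drop an optional null terminator


print(decode_id3_text(bytes([3]) + "Дворжак".encode("utf-8")))
print(decode_id3_text(bytes([0]) + b"Dvorak\x00"))
```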