"MAL" == M
writes:
    MAL> Stephen J. Turnbull wrote:

    >> The Japanese "memopado" (Notepad) uses UTF-8 signatures; it
    >> even adds them to existing UTF-8 files lacking them.

    MAL> Is that a MS application ? AFAIK, notepad, wordpad and MS
    MAL> Office always use UTF-16-LE + BOM when saving text as "Unicode
    MAL> text".

Yes, it is an MS application.  I'll have to borrow somebody's box to
check, but IIRC UTF-8 is the native "text" encoding for Japanese now.
(Japanized applications generally behave differently from everything
else, as there are so many "standards" for encoding Japanese.)

    M> The UTF-16 stream codecs implement this logic.

    M> The UTF-16 encode and decode functions will however always
    M> strip the BOM mark from the beginning of a string.

    M> If the application doesn't want this stripping to happen, it
    M> should use the UTF-16-LE or -BE codec resp.

That sounds like it would work fine almost all the time.  If it
doesn't it's straightforward to work around, and certainly would be
more convenient for the non-standards-geek programmer.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
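
For reference, here is a minimal sketch of the stripping behaviour
quoted above, as I understand it from the codecs shipped with current
CPython 3.  That is an assumption on my part; it is not necessarily
what the implementation looked like when this was being discussed.
The sample text "hi" and the variable names are purely illustrative,
and "utf-8-sig" is the standard-library codec that handles the UTF-8
signature Notepad writes.

```python
import codecs

text = "hi"

# Build little-endian UTF-16 with an explicit BOM by hand, so the
# example does not depend on the platform's native byte order.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The generic UTF-16 codec uses the BOM to pick the byte order and
# strips it from the decoded result.
stripped = data.decode("utf-16")

# The endian-specific codec does no BOM handling, so the BOM survives
# as a leading U+FEFF character.
kept = data.decode("utf-16-le")

# The UTF-8 signature is the same character encoded as UTF-8.
signed = codecs.BOM_UTF8 + text.encode("utf-8")
sig_stripped = signed.decode("utf-8-sig")  # signature removed
sig_kept = signed.decode("utf-8")          # signature kept as U+FEFF

print(repr(stripped), repr(kept), repr(sig_stripped), repr(sig_kept))
```

So an application that wants to see the BOM picks the -LE or -BE
codec, and everybody else gets it stripped without having to think
about it.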