"MAL" == M firstname.lastname@example.org writes:
MAL> Stephen J. Turnbull wrote:
>> The Japanese "memopado" (Notepad) uses UTF-8 signatures; it
>> even adds them to existing UTF-8 files lacking them.
MAL> Is that a MS application ? AFAIK, notepad, wordpad and MS
MAL> Office always use UTF-16-LE + BOM when saving text as "Unicode
MAL> text".
Yes, it is an MS application. I'll have to borrow somebody's box to check, but IIRC UTF-8 is the native "text" encoding for Japanese now. (Japanized applications generally behave differently from everything else, as there are so many "standards" for encoding Japanese.)
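For concreteness, here is a minimal sketch of what "adding a signature to an existing UTF-8 file lacking one" amounts to at the byte level. The helper name ensure_utf8_signature is hypothetical, and this is only an illustration in present-day Python, not a description of what Notepad actually does internally.

```python
import codecs

def ensure_utf8_signature(data: bytes) -> bytes:
    """Return data with the UTF-8 signature (EF BB BF) prepended if absent.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

plain = "メモ帳".encode("utf-8")        # UTF-8 bytes without a signature
signed = ensure_utf8_signature(plain)  # same bytes with the signature in front
```

Applying the helper a second time leaves the data unchanged, so the signature is never doubled.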
M> The UTF-16 stream codecs implement this logic.
M> The UTF-16 encode and decode functions will however always
M> strip the BOM mark from the beginning of a string.
M> If the application doesn't want this stripping to happen, it
M> should use the UTF-16-LE or -BE codec resp.
That sounds like it would work fine almost all the time. If it doesn't, it's straightforward to work around, and it would certainly be more convenient for the non-standards-geek programmer.
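The decoding side of the behaviour quoted above can be checked with a short sketch. It assumes the codec names "utf-16" and "utf-16-le" as spelled in current Python 3, which may not match the implementation under discussion in every detail, and it deliberately covers decoding only.

```python
import codecs

text = "abc"
# Little-endian UTF-16 data with an explicit BOM in front.
with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The generic codec reads the BOM to pick the byte order and strips it.
decoded_generic = with_bom.decode("utf-16")

# The endian-specific codec does no BOM handling, so the BOM survives
# as a leading U+FEFF character in the result.
decoded_le = with_bom.decode("utf-16-le")
```

An application that needs the BOM preserved would take the second route and deal with the leading U+FEFF itself.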