Thanks for the clarification, Steve. A couple of questions/thoughts:

The choices are:

* don't represent them at all (remove bytes API)

Would the bytes API be removed on *nix also?
 
* convert and drop characters not in the (legacy) active code page
* convert and fail on characters not in the (legacy) active code page

"Failure is not an option": these two both seem like plain old bad ideas.

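For concreteness, here's a minimal Python sketch of those two options. It assumes cp1252 stands in for the active code page (the real one depends on the system locale), and the file names are made up:

```python
# Assumption: cp1252 stands in for the (legacy) active code page;
# the actual ANSI code page depends on the system locale.
name = "café✓.txt"  # "✓" (U+2713) has no cp1252 mapping

# "Convert and drop": unmappable characters are silently discarded.
dropped = name.encode("cp1252", errors="ignore")
print(dropped)  # b'caf\xe9.txt'

# A different file name now maps to exactly the same bytes.
print("café.txt".encode("cp1252") == dropped)  # True

# "Convert and fail": strict encoding (the default) raises instead.
try:
    name.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True
print(failed)  # True
```

Dropping lets two distinct names collide; failing means such a name simply can't be represented.
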
* convert and fail on invalid surrogate pairs

Where would an invalid surrogate pair come from? Never from a file system API call, right?
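For reference, an invalid surrogate pair here means a UTF-16 code unit in the range U+D800 to U+DFFF that isn't part of a proper high/low pair. A small sketch of how strict conversion treats one; the helper name `converts_strictly` and the sample strings are hypothetical, just for illustration:

```python
# Made-up name containing a lone high surrogate (U+D800) that is not
# followed by a low surrogate, i.e. an invalid pair.
lone = "abc\ud800.txt"

# A valid pair: U+D83D U+DE00 together encode U+1F600.
valid = "\ud83d\ude00"

def converts_strictly(s: str) -> bool:
    """Hypothetical helper: True if s survives strict UTF-8 encoding."""
    # Round-trip through UTF-16-LE so valid pairs become single code
    # points; surrogatepass lets lone surrogates through unchanged.
    joined = s.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")
    try:
        joined.encode("utf-8")  # strict: rejects any remaining surrogate
        return True
    except UnicodeEncodeError:
        return False

print(converts_strictly(valid))  # True
print(converts_strictly(lone))   # False
```
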

* represent them as UTF-16-LE in bytes (with embedded '\0' everywhere)

Would this be doing any conversion at all, or just keeping whatever the Windows API takes/returns, i.e. exactly what is done on *nix?
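A quick sketch of what the byte strings would look like under that option, using an arbitrary example file name:

```python
# Arbitrary example file name.
name = "spam.txt"

# The fifth option as I read it: hand back the UTF-16-LE bytes as-is.
as_bytes = name.encode("utf-16-le")
print(as_bytes)  # b's\x00p\x00a\x00m\x00.\x00t\x00x\x00t\x00'

# Every ASCII character gains a trailing zero byte, so code that assumes
# bytes paths are ASCII-compatible or NUL-free would be surprised.
print(as_bytes.count(b"\x00"))  # 8

# Decoding recovers the original string exactly.
print(as_bytes.decode("utf-16-le") == name)  # True
```
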
 
The fifth option is the best for round-tripping within Windows APIs.

How is it better? Only in performance (i.e. no encoding/decoding required), or would it be more reliable as well?
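One way to look at the reliability side, as a rough sketch under assumptions: compare round-tripping a made-up name through a legacy code page (cp1252 standing in for the active one, with "?" substitution) versus through UTF-16-LE bytes:

```python
# Hypothetical name with a character outside cp1252 and a lone surrogate.
original = "résumé\u2713\ud800.txt"

# Path 1 (assumed): str -> legacy code page bytes -> str.
# cp1252 stands in for the active code page; "replace" substitutes "?".
via_cp = original.encode("cp1252", errors="replace").decode("cp1252")

# Path 2: str -> UTF-16-LE bytes -> str. surrogatepass keeps the lone
# surrogate, so every 16-bit code unit survives unchanged.
via_utf16 = original.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")

print(via_cp)                 # résumé??.txt
print(via_utf16 == original)  # True
```
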
 
-CHB


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov