Thanks for the clarity, Steve. A couple of questions/thoughts:

The choices are:
* don't represent them at all (remove bytes API)
Would the bytes API be removed on *nix also?
* convert and drop characters not in the (legacy) active code page
* convert and fail on characters not in the (legacy) active code page
"Failure is not an option" -- These two seem like a plain old bad idea.
* convert and fail on invalid surrogate pairs
Where would an invalid surrogate pair come from? Never from a file system API call, yes?
* represent them as UTF-16-LE in bytes (with embedded '\0' everywhere)
Would this be doing anything -- or just keeping whatever the Windows API takes/returns, i.e. exactly what is done on *nix?
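
To make sure I am reading the list correctly, here is a rough sketch of options two through five as I understand them. It uses Python's own codecs as a stand-in for whatever conversion Windows would actually do, which is an assumption on my part; the file names and the choice of cp1252 as the "active code page" are made up for illustration.

```python
# Illustrative only: Python's codecs stand in for a Windows path
# conversion; nothing here calls a real file system API.

name = "caf\u00e9_\u65e5\u672c.txt"  # hypothetical file name

# Options 2 and 3: convert to a legacy code page (cp1252 assumed).
dropped = name.encode("cp1252", errors="ignore")  # silently drops the kanji
try:
    name.encode("cp1252")  # strict conversion fails instead
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Option 4: a lone surrogate is not valid UTF-16, so a strict encode fails.
lone = "bad_\ud800.txt"  # hypothetical name with an unpaired surrogate
try:
    lone.encode("utf-16-le")
    surrogate_failed = False
except UnicodeEncodeError:
    surrogate_failed = True

# Option 5: keep the UTF-16-LE code units as bytes. Every ASCII
# character is followed by a '\0' byte.
raw = "a.txt".encode("utf-16-le")

# With the "surrogatepass" error handler the unpaired surrogate
# survives a bytes round trip unchanged.
raw_lone = lone.encode("utf-16-le", errors="surrogatepass")
round_tripped = raw_lone.decode("utf-16-le", errors="surrogatepass")
```

If that reading is right, only the last variant gets every name back out exactly as it went in, at the cost of bytes that look nothing like what *nix code expects.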
The fifth option is the best for round-tripping within Windows APIs.
How is it better? Only performance (i.e. no encoding/decoding required) -- or would it be more reliable as well?

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov