[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

James Y Knight foom at fuhm.net
Tue Sep 30 01:33:47 CEST 2008


On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote:
> An ugly hack, but more correct than UTF-8b or any similar attempt to
> do "unicode but not quite unicode"; either it's lossy, or it's not
> unicode.  There's no in between.

Promoting the use of ISO 8859-1 to decode mostly-UTF-8 data seems like
a very poor way forward. I don't see how you can claim it's more
correct. It's correct in no case except for pure ASCII on a UTF-8
system.
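
To make that concrete, here is a minimal sketch in Python 3 syntax. The
file names are made up for illustration; the only assumption is that
the bytes on disk are UTF-8.

```python
# Bytes as a UTF-8 filesystem would store them (example names are made up).
ascii_name = "notes.txt".encode("utf-8")
accented_name = "caf\u00e9.txt".encode("utf-8")  # "café.txt"

# Pure ASCII: ISO 8859-1 and UTF-8 decode to the same text.
ascii_matches = ascii_name.decode("iso-8859-1") == ascii_name.decode("utf-8")

# Non-ASCII: ISO 8859-1 maps each UTF-8 byte to its own character,
# so the two-byte sequence for U+00E9 becomes two unrelated characters.
as_latin1 = accented_name.decode("iso-8859-1")
as_utf8 = accented_name.decode("utf-8")

# The ISO 8859-1 result still encodes back to the original bytes;
# it is lossless, but the text it shows is not the real name.
round_trips = as_latin1.encode("iso-8859-1") == accented_name

print(ascii_matches, repr(as_latin1), repr(as_utf8), round_trips)
```

The bytes survive the round trip, but anything that treats the decoded
string as text (display, comparison, case folding) sees the wrong
characters.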

I still like the UTF-8b proposal, but if you want to push against
that, I don't see any sensible alternative but to move back towards a
bytestring API. Having two parallel APIs or a mixture of data types is
confusing, so just toss the Unicode APIs entirely. That'd be much,
much nicer than having everyone use ISO 8859-1, incorrectly, for their
platform encoding.
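
For comparison, a rough sketch of the UTF-8b idea. It assumes the
"surrogateescape" error handler that later Python 3 releases ship
(PEP 383), which follows the same approach; it is used here only as a
stand-in and is not something this thread had available. The file
names are made up.

```python
# A name containing a byte that is not valid UTF-8 (0xE9 is "é" in
# ISO 8859-1, but here it is an incomplete UTF-8 sequence).
bad_name = b"caf\xe9.txt"
good_name = "caf\u00e9.txt".encode("utf-8")

# Undecodable bytes are smuggled through as lone surrogates
# (0xE9 -> U+DCE9); everything else decodes as ordinary UTF-8.
escaped = bad_name.decode("utf-8", errors="surrogateescape")
normal = good_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = escaped.encode("utf-8", errors="surrogateescape")

print(repr(escaped), repr(normal), restored == bad_name)
```

Valid UTF-8 names come out as the correct text, and invalid ones still
round-trip, which is the property decoding as ISO 8859-1 cannot offer
for the valid case.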

On Windows, the platform-native Unicode strings could simply be
encoded into UTF-8 when entering Python-land, and decoded back to
Unicode when leaving Python-land, to keep the API consistently
bytestring-oriented on both platforms.
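
A minimal sketch of that boundary, in Python 3 syntax. The helper
names entering_python and leaving_python are hypothetical, a str
stands in for the native Windows string, and the name is assumed to be
well-formed Unicode.

```python
def entering_python(native_name: str) -> bytes:
    """Hypothetical helper: native Unicode name -> UTF-8 bytestring."""
    return native_name.encode("utf-8")


def leaving_python(name: bytes) -> str:
    """Hypothetical helper: UTF-8 bytestring -> native Unicode name."""
    return name.decode("utf-8")


native = "r\u00e9sum\u00e9.doc"      # what the platform API would hand over
inside = entering_python(native)    # what Python code would see
restored = leaving_python(inside)   # what goes back to the platform

print(inside, restored == native)
```

Code inside Python would then handle bytes on every platform, and the
conversion would live only at the edge where the Windows API is called.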

James


