PEP 383: Non-decodable Bytes in System Character Interfaces
On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman wrote:
The goal of Unicode users everywhere is to use Unicode for everything, no? After all, all "real" files should have Unicode-based names, and the only proper byte sequences that should exist are UTF-8 encoded Unicode. (Cheek to tongue: Get out of here!)
Humour aside :), the expectation that filenames are Unicode data simply doesn't agree with the reality of POSIX file systems. I think an approach similar to that adopted by glib [1] could work -- i.e. use the bytes API and provide some tools to assist application developers in converting filenames to and from Unicode strings (these tools are then where all the guesswork about what encoding to use can live).

[1] http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.htm...

Schiavo
Simon
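
To make the suggestion above concrete, here is a minimal sketch of what such helper tools might look like in Python. The function names (filename_to_display, filename_to_text, filename_from_text) are hypothetical and only loosely modelled on the glib idea; they are not an existing Python or glib API. The sketch assumes the caller either supplies an encoding or accepts sys.getfilesystemencoding() as the guess.

```python
import os
import sys
from typing import List, Optional


def _guess_encoding(encoding: Optional[str]) -> str:
    # All the guesswork about which encoding to use lives in one place.
    return encoding or sys.getfilesystemencoding()


def filename_to_display(name: bytes, encoding: Optional[str] = None) -> str:
    """Lossy conversion for showing a name to a user.

    Undecodable bytes become U+FFFD, so the result must not be
    used to reopen the file.
    """
    return name.decode(_guess_encoding(encoding), errors="replace")


def filename_to_text(name: bytes, encoding: Optional[str] = None) -> str:
    """Strict conversion; raises UnicodeDecodeError on undecodable bytes."""
    return name.decode(_guess_encoding(encoding), errors="strict")


def filename_from_text(text: str, encoding: Optional[str] = None) -> bytes:
    """Strict conversion back to the bytes handed to the OS."""
    return text.encode(_guess_encoding(encoding), errors="strict")


def list_names(directory: bytes = b".") -> List[bytes]:
    # Passing bytes to os.listdir() makes it return bytes names unchanged.
    return os.listdir(directory)


if __name__ == "__main__":
    for raw in list_names():
        print(filename_to_display(raw))
```

In this arrangement the application keeps the original bytes for every call that touches the file system and only uses the decoded string for display, so a wrong guess about the encoding affects what the user sees but not which file is opened.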
participants (22)
- "Martin v. Löwis"
- Aahz
- Antoine Pitrou
- Benjamin Peterson
- Cameron Simpson
- Glenn Linderman
- James Y Knight
- Jeroen Ruigrok van der Werven
- Michael Foord
- Michael Urman
- MRAB
- Oleg Broytmann
- Paul Moore
- R. David Murray
- Simon Cross
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Thomas Breuel
- Tony Nelson
- Toshio Kuratomi
- Tres Seaver