[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 28 06:26:36 CEST 2009

Michael Foord writes:

 > The problem you don't address, which is still the reality for most 
 > programmers (especially Mac OS X where filesystem encoding is UTF-8), is 
 > that programmers *are* going to treat filenames as strings.

 > The proposed PEP allows that to work for them - whatever platform their 
 > program runs on.

Sure, for values of "work" == "No exception will be raised in my
module, and some content will actually be returned."  It doesn't say
anything about what happens once those strings escape the immediate
context.  So it *encourages* those programmers to pass any problems
downstream, but only after discarding the resources needed to deal
with problems effectively.
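
To make the "escape the immediate context" point concrete, here is a
minimal sketch.  It assumes the PEP's scheme is reachable as the
"surrogateescape" error handler (the name it has in Python 3.1 and
later); the sample filename and both helper functions are made up for
illustration.

```python
# Sketch only: "surrogateescape" is the error handler name that
# Python 3.1+ uses for the PEP 383 scheme. The filename and the two
# helper functions below are hypothetical.

def decode_os_name(raw: bytes) -> str:
    """Decode as the PEP proposes: undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def survives_strict_encode(name: str) -> bool:
    """Return True if a downstream consumer can encode name as strict UTF-8."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


raw = b"caf\xe9.txt"         # Latin-1 bytes, not valid UTF-8
name = decode_os_name(raw)   # no exception here; we get a plain str

# The original bytes are recoverable only by a consumer that knows to
# encode with the same error handler.
round_trip = name.encode("utf-8", "surrogateescape")
```

Nothing about `name` tells downstream code that it needs the special
handler; a strict encode (a log file, a socket, a database driver)
raises, far from the module that did the decoding.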

It's not that hard to overcome that problem, but it does require a
slightly more complex API, and one that doesn't return a string but
rather a stringlike object annotated with the information about how it
was decoded.  Conversion to a string *should* be trivial; I just think
it should be invoked explicitly to make it clear where information is
being discarded.  Without an implicit conversion, the nature of the
data (i.e., context-dependent structure) is made explicit.  There's a
natural place to document the problem that context must be used to
interpret the data accurately, and even add more robust processing (in
a new PEP, of course!), etc.
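
As a rough sketch of what I mean, and no more than that: the class
name, its attributes, and to_str() below are all hypothetical, not
something in the PEP or the stdlib.  The object keeps the raw bytes and
the codec that was tried, and handing back a plain string is a visible
call rather than something that happens behind your back.

```python
class DecodedName:
    """Hypothetical string-like wrapper that records how it was decoded.

    Not part of PEP 383 or the standard library; a sketch of the idea only.
    """

    def __init__(self, raw: bytes, encoding: str) -> None:
        self.raw = raw              # original bytes, never discarded
        self.encoding = encoding    # codec that was assumed for decoding
        try:
            self._text = raw.decode(encoding)
            self.lossless = True
        except UnicodeDecodeError:
            # Fall back to a lossy rendering, but remember that it is lossy.
            self._text = raw.decode(encoding, "replace")
            self.lossless = False

    def to_str(self) -> str:
        """Explicit conversion: the point where the annotations are dropped."""
        return self._text

    def __repr__(self) -> str:
        return "DecodedName(%r, %r, lossless=%r)" % (
            self.raw, self.encoding, self.lossless)


good = DecodedName(b"report.txt", "utf-8")
bad = DecodedName(b"caf\xe9.txt", "utf-8")   # Latin-1 bytes, not UTF-8

# Mixing the wrapper with plain strings fails loudly instead of silently
# discarding the decoding information.
try:
    "dir/" + bad
    implicit_conversion_allowed = True
except TypeError:
    implicit_conversion_allowed = False
```

A caller who only wants something to display writes bad.to_str() and
moves on; a caller who cares can look at bad.lossless or bad.raw first.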

Then in the future this interface could be used as the basis of a more
robust API.  With good design (and luck) it might be subclassable or
extensible to a path object API, for example.  PEP 383 on the other
hand is a dead end as it stands.  AFAICS it gives the best possible
treatment of conversion of OS data to plain string, but we've already
got developers lining up to say "I can't use it". :-(
