[Python-Dev] PEP 383 update: utf8b is now the error handler

Wed May 6 21:18:03 CEST 2009

On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:

> Zooko Wilcox-O'Hearn <zooko <at> zooko.com> writes:
>>
>> I'm not thinking of API compatibility as much as data  
>> compatibility -- someone used Python 3.1 to write down some  
>> filenames, and now a few years later they are trying to use the  
>> latest and greatest Python release to read those filenames...
>
> Well, if the filenames are generated by Python (as opposed to read  
> from an existing directory on disk), they should be regular unicode  
> objects without any lone surrogates, so I don't see the  
> compatibility problem.

I meant that the application reads filenames from an existing  
directory on disk, saves those filenames, and then later, using a  
future version of Python, wants to read them and use them.

I'm not saying that I know this would be a problem.  I'm saying that  
I personally can't tell whether it would be a problem or not, and the  
extensive discussions so far have not convinced me that there is  
anyone who both understands PEP 383 and has considered this use case.

Many people who apparently understand encoding issues well have said  
something to the effect that there is no problem, but those people  
haven't yet managed to get through my thick skull how I would use PEP  
383 safely for this sort of use case -- the one where data generated  
by os.listdir() travels forward in time or the one where that data  
travels sideways to other systems, including Windows or other systems  
that validate incoming unicode.
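
To make the concern concrete, here is a minimal sketch of the case I have in mind. It assumes the handler is spelled "surrogateescape", which is the name released Python versions use for the PEP 383 handler (the subject line of this thread calls it utf8b). The byte string and the helper names is_portable and to_storable are made up for illustration; they are not part of any library or of the PEP.

```python
# Bytes as they might sit in a POSIX directory entry: Latin-1 "cafe"
# with an accented e, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Roughly what PEP 383 decoding produces when the filesystem encoding
# is UTF-8: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw_name.decode("utf-8", "surrogateescape")


def is_portable(s):
    """Hypothetical check: can s be handed to a strict unicode consumer?"""
    try:
        s.encode("utf-8")  # strict encoding rejects lone surrogates
        return True
    except UnicodeEncodeError:
        return False


def to_storable(s):
    """Hypothetical helper: recover the original bytes for storage."""
    return s.encode("utf-8", "surrogateescape")


print(ascii(name))         # the escaped form, safe to print anywhere
print(is_portable(name))   # a validating receiver would refuse this string
print(to_storable(name))   # the original bytes come back unchanged
```

The round trip through bytes works within one Python process. What I can't tell is what happens if the str itself, lone surrogate included, is what gets written down or sent to a system that validates incoming unicode.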

That's why I am a bit uncomfortable about PEP 383 being quickly  
implemented and deployed in Python 3.1.

By the way, much of the detailed discussion about what Tahoe requires  
and how that may or may not benefit from PEP 383 has now moved to the  
tahoe-dev mailing list:
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev

Regards,

Zooko