[Python-Dev] bytes / unicode

Michael Foord fuzzyman at voidspace.org.uk
Mon Jun 21 18:49:55 CEST 2010


On 21/06/2010 17:46, P.J. Eby wrote:
> At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
>> It may be that there are places where we need to rewrite standard
>> library algorithms to be bytes/str neutral (e.g. by using length one
>> slices instead of indexing). It may be that there are more APIs that
>> need to grow "encoding" keyword arguments that they then pass on to
>> the functions they call or use to convert str arguments to bytes (or
>> vice-versa). But without people trying to port affected libraries and
>> reporting bugs when they find issues, the situation isn't going to
>> improve.
>>
>> Now, if these bugs are already being reported against 3.1 and just
>> aren't getting fixed, that's a completely different story...
>
> The overall impression, though, is that this isn't really a step 
> forward. Now, bytes are the special case instead of unicode, but that 
> special case isn't actually handled any better by the stdlib - in 
> fact, it's arguably worse. And, the burden of addressing this seems to 
> have been shifted from the people who made the change, to the people 
> who are going to use it. But those people are not necessarily in a 
> position to tell you anything more than, "give me something that works 
> with bytes".
>
> What I can tell you is that before, since string constants in the 
> stdlib were ascii bytes, and transparently promoted to unicode, stdlib 
> behavior was *predictable* in the presence of special cases: you got 
> back either bytes or unicode, but either way, you could idempotently 
> upgrade the result to unicode, or just pass it on. APIs were "str 
> safe, unicode aware". If you passed in bytes, you weren't going to get 
> unicode without a warning, and if you passed in unicode, it'd work and 
> you'd get unicode back.
>
> Now, the APIs are neither safe nor aware -- if you pass bytes in, you 
> get unpredictable results back.
>
> Ironically, it almost *would* have been better if bytes simply didn't 
> work as strings at all, *ever*, but if you could wrap them with a 
> bstr() to *treat* them as text. You could still have restrictions on 
> combining them, as long as it was a restriction on the unicode you 
> mixed with them. That is, you could combine a bstr and a str if the 
> *str* was restricted to ASCII.
>
> If we had the Python 3 design discussions to do over again, I think I 
> would now have stuck with the position of not letting bytes be 
> string-compatible at all, and instead proposed an explicit bstr() 
> wrapper/adapter to use them as strings, that would (in that case) 
> force coercion in the direction of bytes rather than strings. (And 
> bstr need not have been a builtin - it could have been something you 
> import, to help discourage casual usage.)
>
> Might this approach lead to some people doing things wrong in the case 
> of porting? Sure. But there'd be little reason to use it in new code 
> that didn't have a real need for bytestring manipulation.
>
> It might've been a better balance between practicality and purity, in 
> that it keeps the language pure, while offering a practical way to 
> deal with things in bytes if you really need to. And, bytes wouldn't 
> silently succeed *some* of the time, leading to a trap. An easy 
> inconsistency is worse than a bit of uniform chicken-waving.
>
> Is it too late to make that tradeoff? Probably. Certainly it's not 
> practical to *implement* outside the language core, and removing 
> string methods would fux0r anybody whose currently-ported code relies 
> on bytes objects having string-like methods.
>
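
For anyone unsure what Nick's "length one slices instead of indexing"
means in practice, here is a small sketch. The helper names are made up
for illustration and are not taken from the stdlib. Indexing a bytes
object in Python 3 yields an int, while a length-one slice keeps the
type of its input, so the same code path can serve both str and bytes:

```python
def first_char(data):
    """Return the first element with the same type as the input."""
    # data[0] would be an int for bytes; data[:1] stays bytes (or str).
    return data[:1]


def strip_leading_slash(path):
    """Drop one leading '/' from either a str or a bytes path."""
    # Pick the separator literal that matches the argument's type.
    sep = b"/" if isinstance(path, bytes) else "/"
    # Comparing a length-one slice works for both types; path[0] == sep
    # would always be False for bytes, because an int never equals bytes.
    return path[1:] if path[:1] == sep else path
```

The isinstance check is the price of neutrality here: the literal still
has to be chosen per type, only the algorithm itself is shared.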

Why is your proposed bstr wrapper not practical to implement outside the 
core and use in your own libraries and frameworks?
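
To make the question concrete, here is a rough sketch of how such a
wrapper might start out in pure Python. Only the name bstr comes from
the proposal above; everything else is an assumption on my part. It
covers concatenation only, and mixing with a str is allowed only when
that str is pure ASCII, with the result coerced towards bytes:

```python
class bstr(bytes):
    """Hypothetical wrapper: bytes that combine with ASCII-only str."""

    @staticmethod
    def _coerce(other):
        # A str is accepted only if it is pure ASCII; encode() raises
        # UnicodeEncodeError otherwise, so nothing succeeds silently.
        if isinstance(other, str):
            return other.encode("ascii")
        if isinstance(other, (bytes, bytearray)):
            return bytes(other)
        return NotImplemented

    def __add__(self, other):
        coerced = self._coerce(other)
        if coerced is NotImplemented:
            return NotImplemented
        # Coercion goes in the direction of bytes, and the result stays bstr.
        return bstr(bytes(self) + coerced)

    def __radd__(self, other):
        coerced = self._coerce(other)
        if coerced is NotImplemented:
            return NotImplemented
        return bstr(coerced + bytes(self))
```

With this, bstr(b"GET ") + "/index" gives a bstr, while adding a
non-ASCII str raises UnicodeEncodeError. Methods such as join(),
replace() and %-formatting would each need the same treatment, and
stdlib functions that build their results from str literals would not
know to return a bstr, which may be part of the difficulty.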

Michael



-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




