[Python-Dev] email package status in 3.X

P.J. Eby pje at telecommunity.com
Mon Jun 21 16:51:25 CEST 2010


At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote:
>For the idea of avoiding excess copying of bytes through multiple
>encoding/decoding calls... isn't that meant to be handled at an
>architectural level (i.e. decode once on the way in, encode once on
>the way out)? Optimising the single-byte codec case by minimising data
>copying (possibly through creative use of PEP 3118) may be something
>that we want to look at eventually, but it strikes me as something of
>a premature optimisation at this point in time (i.e. the old adage
>"first get it working, then get it working fast").

The issue is, I'd like to have an idempotent incantation that I can 
use to make the inputs and outputs to stdlib functions behave in a 
type-safe manner with respect to bytes, in cases where bytes are 
really what I want operated on.

Note too that this is an argument for symmetry in wrapping the inputs 
and outputs, so that the code doesn't have to "know" what it's dealing with!

After all, right now, if a stdlib function might return bytes or
unicode depending on runtime conditions, I can't even hardcode an
.encode() call: it would fail whenever the return value is bytes.
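A minimal sketch of that failure, assuming a hypothetical function `read_value()` that stands in for a stdlib call whose return type (str or bytes) depends on a runtime flag:

```python
def read_value(as_bytes: bool):
    # Hypothetical stand-in for a stdlib function whose return type
    # (str or bytes) depends on a runtime condition.
    return b"data" if as_bytes else "data"


for flag in (False, True):
    result = read_value(flag)
    try:
        # A hardcoded .encode() works only when result is a str.
        print(flag, result.encode("ascii"))
    except AttributeError as exc:
        # In Python 3, bytes has no .encode() method, so this branch
        # is taken when the function happens to return bytes.
        print(flag, "failed:", exc)
```

The only way around it at the call site is to inspect the type first, which is exactly the per-call "asking" the rest of this message objects to.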

This basically goes against the "tell, don't ask" pattern, and the
Pythonically idempotent approach.  That is, Python builtins normally
hand you back the same thing if it's already what you want:
int(someInt) -> someInt, iter(someIter) -> someIter, etc.
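A minimal sketch of such an idempotent incantation, using hypothetical helpers `ensure_bytes()` and `ensure_str()` (not stdlib functions), modelled on the way `iter()` hands an iterator back unchanged. The UTF-8 default is an assumption; real code would use whatever encoding the API in question actually requires.

```python
from typing import Union


def ensure_bytes(value: Union[str, bytes], encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: return value as bytes, idempotently."""
    if isinstance(value, bytes):
        return value  # already bytes: hand back the same object, no copy
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def ensure_str(value: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Hypothetical helper: return value as str, idempotently."""
    if isinstance(value, str):
        return value  # already str: hand back the same object, no copy
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def mixed_api(flag: bool) -> Union[str, bytes]:
    # Hypothetical stand-in for a function that returns str or bytes.
    return "data" if flag else b"data"


# The same call wraps the output regardless of which type came back,
# so the caller never has to inspect the type itself.
assert ensure_bytes(mixed_api(True)) == ensure_bytes(mixed_api(False)) == b"data"

# Builtin precedent: an iterator passed to iter() comes back unchanged.
it = iter([1, 2])
assert iter(it) is it
```

Because a value that is already of the target type is returned as the same object, each helper adds only an isinstance() check and never copies. bytearray and other buffer types are deliberately rejected in this sketch.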

Since this incantation may need to be used often, and in places that
are not known to me in advance, I would like it to not impose new
overhead in unexpected places.  (This is the same argument usually
brought against changing the 'list' type in ways that would turn
certain O(1) operations into O(log something) ones.)

It's more about predictability, and having One *Obvious* Way To Do 
It, as opposed to "several ways, which you need to think carefully 
about and restructure your entire architecture around if 
necessary".  One obvious way means I can focus on the mechanical 
effort of porting *first*, without having to think.

So, the performance issue isn't really about performance *per se*, so 
much as about the "mental UI" of the language.  You could just as 
easily lie and tell me that your bstr implementation is O(1), and I 
would probably be happy and never notice, because the issue was never 
really about performance as such, but about having to *think* about 
it.  (i.e., breaking flow.)

Really, the entire issue can presumably be dealt with by some series 
of incantations - it's just code after all.  But having to sit and 
think about *every* situation where I'm dealing with bytes/unicode
distinctions seems like torture compared to being able to say,
"okay, so when dealing with this sort of API and this sort of data, 
this is the One Obvious Way to do the conversions."

It's One Obvious Way that I want, but some people seem to be arguing 
that the One Obvious Way is to Think Carefully About It Every Time -- 
and that seems to violate the "Obvious" part, IMO.  ;-)