[Python-Dev] email package status in 3.X
P.J. Eby
pje at telecommunity.com
Mon Jun 21 16:51:25 CEST 2010
At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote:
>For the idea of avoiding excess copying of bytes through multiple
>encoding/decoding calls... isn't that meant to be handled at an
>architectural level (i.e. decode once on the way in, encode once on
>the way out)? Optimising the single-byte codec case by minimising data
>copying (possibly through creative use of PEP 3118) may be something
>that we want to look at eventually, but it strikes me as something of
>a premature optimisation at this point in time (i.e. the old adage
>"first get it working, then get it working fast").
The issue is, I'd like to have an idempotent incantation that I can
use to make the inputs and outputs to stdlib functions behave in a
type-safe manner with respect to bytes, in cases where bytes are
really what I want operated on.
Note too that this is an argument for symmetry in wrapping the inputs
and outputs, so that the code doesn't have to "know" what it's dealing with!
After all, right now, if a stdlib function might return bytes or
unicode depending on runtime conditions, I can't even hardcode an
.encode() call -- it would fail if the return value is already bytes.
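
To make the failure concrete, here is a minimal Python 3 sketch. The function `read_header` is hypothetical, a stand-in for any call whose return type depends on runtime conditions; it is not a real stdlib API.

```python
def read_header(raw):
    """Hypothetical stand-in for an API that returns bytes or str
    depending on runtime conditions (here, a simple flag)."""
    return b"text/plain" if raw else "text/plain"


def to_wire(value):
    # Hardcoded conversion: fine for str, but bytes has no .encode().
    return value.encode("ascii")


print(to_wire(read_header(raw=False)))   # str input works

try:
    to_wire(read_header(raw=True))       # bytes input does not
except AttributeError as exc:
    print("failed:", exc)
```

The caller is forced to ask "which type did I get?" before every conversion, which is exactly the check an idempotent incantation would make unnecessary.
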
This basically goes against the "tell, don't ask" pattern, and the
Pythonically idempotent approach. That is, Python builtins normally
hand back the same thing if it's already what you want --
int(someInt) -> someInt, iter(someIter) -> someIter, etc.
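
The kind of incantation meant here might look roughly like the sketch below. The name `as_bytes` and the latin-1 default are assumptions made for illustration, not an existing stdlib function; the point is only that bytes pass through unchanged and uncopied, the way iter() hands an iterator straight back.

```python
def as_bytes(value, encoding="latin-1"):
    """Hypothetical helper: return bytes unchanged, encode str.

    Idempotent: calling it on its own result returns that same object,
    so callers never need to check what they were handed.
    """
    if isinstance(value, bytes):
        return value                 # already what we want: no copy
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


it = iter([1, 2, 3])
assert iter(it) is it                # builtin precedent: same object back

data = b"Content-Type"
assert as_bytes(data) is data        # bytes in, the very same bytes out
assert as_bytes("Content-Type") == data
```

Because the bytes branch returns its argument directly, wrapping a value that is already bytes costs one type check and nothing more.
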
Since this incantation may need to be used often, and in places that
are not known to me in advance, I would like it to not impose new
overhead in unexpected places. (This is the usual argument brought
against making changes to the 'list' type that would change certain
operations from O(1) to O(log something).)
It's more about predictability, and having One *Obvious* Way To Do
It, as opposed to "several ways, which you need to think carefully
about and restructure your entire architecture around if
necessary". One obvious way means I can focus on the mechanical
effort of porting *first*, without having to think.
So, the performance issue isn't really about performance *per se*, so
much as about the "mental UI" of the language. You could just as
easily lie and tell me that your bstr implementation is O(1), and I
would probably be happy and never notice, because the issue was never
really about performance as such, but about having to *think* about
it. (i.e., breaking flow.)
Really, the entire issue can presumably be dealt with by some series
of incantations - it's just code after all. But having to sit and
think about *every* situation where I'm dealing with bytes/unicode
distinctions seems like torture compared to being able to say,
"okay, so when dealing with this sort of API and this sort of data,
this is the One Obvious Way to do the conversions."
It's One Obvious Way that I want, but some people seem to be arguing
that the One Obvious Way is to Think Carefully About It Every Time --
and that seems to violate the "Obvious" part, IMO. ;-)