unicode() vs. s.decode()
Jason Tackaberry
tack at urandom.ca
Thu Aug 6 10:15:41 EDT 2009
On Thu, 2009-08-06 at 01:31 +0000, John Machin wrote:
> Faster by an enormous margin; attributing this to the cost of attribute lookup
> seems implausible.
Ok, fair point. I don't think the time difference fully registered when
I composed that message.
Testing a global access (LOAD_GLOBAL) versus an attribute access on a
global object (LOAD_GLOBAL + LOAD_ATTR) shows that the latter is about
40% slower than the former. So that certainly doesn't account for the
difference.
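A rough sketch of that kind of comparison follows. The names value, holder, read_global and read_attr are made up for illustration, and the opcode check relies on dis.get_instructions, so it assumes a Python 3 interpreter rather than the 2.x one discussed in this thread. Timings vary by version and machine, and the function-call overhead included here dilutes the ratio, so the 40% figure is not expected to reproduce exactly.

```python
import dis
import timeit


class Holder(object):
    """Hypothetical container; only used to have an attribute to look up."""
    pass


# Module-level names, so the functions below must use LOAD_GLOBAL to reach them.
value = 1
holder = Holder()
holder.value = 1


def read_global():
    return value          # global access only


def read_attr():
    return holder.value   # global access followed by an attribute access


def opnames(func):
    """Return the list of bytecode instruction names for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


def compare(number=1000000):
    """Time both functions; returns (seconds_global, seconds_attr)."""
    t_global = timeit.timeit(read_global, number=number)
    t_attr = timeit.timeit(read_attr, number=number)
    return t_global, t_attr


if __name__ == "__main__":
    print(opnames(read_global))
    print(opnames(read_attr))
    t_global, t_attr = compare()
    print("global: %.3fs  global+attr: %.3fs" % (t_global, t_attr))
```

Whatever the exact ratio turns out to be, an overhead of this size is far too small to explain an enormous gap.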
> Suggested further avenues of investigation:
>
> (1) Try the timing again with "cp1252" and "utf8" and "utf_8"
>
> (2) grep "utf-8" <Python2.X_source_code>/Objects/unicodeobject.c
Very pedagogical of you. :) Indeed, it looks like the bigger player in
the performance difference is that the code path for unicode(s, enc)
short-circuits the codec registry for common encodings (including
'utf-8' specifically), whereas s.decode('utf-8') necessarily consults
the codec registry.
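For anyone who wants to repeat the timing across the suggested encoding spellings, here is a minimal sketch. The helper names (to_text, via_constructor, via_method, time_both) and the sample data are invented for this example. It falls back to str(bytes, enc) when unicode is not defined so that it also runs on Python 3, where the internals differ and the gap described here may not show up at all.

```python
import timeit

try:
    to_text = unicode   # Python 2: the constructor discussed in this thread
except NameError:
    to_text = str       # Python 3 stand-in: str(bytes, enc) also decodes

# ASCII-only sample bytes, so every encoding below decodes them the same way.
data = b"hello world " * 10

ENCODINGS = ("utf-8", "utf8", "utf_8", "cp1252")


def via_constructor(enc="utf-8"):
    return to_text(data, enc)


def via_method(enc="utf-8"):
    return data.decode(enc)


def time_both(enc, number=100000):
    """Return (constructor_seconds, method_seconds) for one encoding name."""
    t_ctor = timeit.timeit(lambda: via_constructor(enc), number=number)
    t_meth = timeit.timeit(lambda: via_method(enc), number=number)
    return t_ctor, t_meth


if __name__ == "__main__":
    for enc in ENCODINGS:
        t_ctor, t_meth = time_both(enc)
        print("%-7s constructor: %.3fs  decode: %.3fs" % (enc, t_ctor, t_meth))
```

If the short-circuit is what matters, the 'utf-8' row should look different from the spellings that are not special-cased.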
Cheers,
Jason.