[Python-Dev] Python 3.0.1 (io-in-c)

Antoine Pitrou solipsis at pitrou.net
Wed Jan 28 17:23:19 CET 2009

Paul Moore <p.f.moore <at> gmail.com> writes:
> > As I pointed out, utf-8, utf-16 and latin1 decoders have already been
> > optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly
> > fast (1GB/s). The dataset for iobench isn't pure ASCII though, and that's
> > why it's not as fast.
> Ah, thanks. Although you said your data was 95% ASCII, and you're
> getting decode speeds of 250MB/s. That's 75% slowdown for 5% of the
> data! Surely that's not right???

If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's
quite obvious why it is so :-) There is a (very) fast path for chunks of pure
ASCII data, and a (fast, but not blazingly fast) fallback for non-ASCII data.
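
To illustrate the idea, here is a rough Python sketch of a decoder with an
ASCII fast path and a per-sequence fallback. It is not the C code from
unicodeobject.c; the function name decode_utf8_sketch is made up for this
example, and the fallback simply hands each multi-byte sequence to the
built-in decoder so that validation stays correct.

```python
def decode_utf8_sketch(data: bytes) -> str:
    """Toy UTF-8 decoder: bulk-copy ASCII runs, handle the rest one sequence at a time."""
    pieces = []
    i = 0
    n = len(data)
    while i < n:
        # Fast path: scan to the end of a run of pure ASCII bytes (< 0x80)
        # and convert the whole run in one operation.
        j = i
        while j < n and data[j] < 0x80:
            j += 1
        if j > i:
            pieces.append(data[i:j].decode("ascii"))
            i = j
            continue

        # Fallback: work out the sequence length from the lead byte.
        lead = data[i]
        if lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        elif lead >> 3 == 0b11110:
            length = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")

        # Decode exactly one multi-byte sequence; the built-in decoder
        # rejects truncated, overlong, or otherwise malformed sequences.
        pieces.append(data[i:i + length].decode("utf-8"))
        i += length
    return "".join(pieces)
```

Every non-ASCII character interrupts an ASCII run, so a few such characters
scattered through the input can cost far more than their share of the bytes.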

Please don't think of it as a slowdown... It's still much faster than 2.x, which
manages 130MB/s on the same data.
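
If you want to compare the two cases on your own machine, something along
these lines should do. It is only a sketch: the helper names and the synthetic
"mostly ASCII" payload (19 ASCII characters followed by one accented character)
are assumptions for illustration, not the iobench dataset, so the numbers will
differ from the ones quoted above.

```python
import time


def make_mostly_ascii(n_bytes: int) -> bytes:
    """Build a UTF-8 payload where 1 character in 20 is non-ASCII (hypothetical test data)."""
    unit = ("a" * 19 + "\u00e9").encode("utf-8")  # 19 ASCII bytes + 2 bytes for U+00E9
    return unit * (n_bytes // len(unit))


def decode_throughput_mb_s(payload: bytes, repeats: int = 20) -> float:
    """Return the measured UTF-8 decode throughput for payload, in MB/s."""
    start = time.perf_counter()
    for _ in range(repeats):
        payload.decode("utf-8")
    elapsed = time.perf_counter() - start
    return len(payload) * repeats / elapsed / 1e6


if __name__ == "__main__":
    size = 1_000_000
    pure = b"a" * size
    mixed = make_mostly_ascii(size)
    print("pure ASCII  : %.0f MB/s" % decode_throughput_mb_s(pure))
    print("mostly ASCII: %.0f MB/s" % decode_throughput_mb_s(mixed))
```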



