[Python-ideas] duck typing for io write methods

Sun Jun 16 03:01:30 CEST 2013

2013/6/15 Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de>:
> So if I understand the PEP correctly, then, theoretically, text mode file IO
> objects could be implemented to declare that all they'll ever need is 1 byte
> strings (if the encoding is ASCII-compatible)? Then converting incoming
> bytes from a file would also be reduced to copying and would eliminate much
> of the speed difference between 'r' and 'rb' modes?
> Is that done already, or are there problems with such an approach?

Many functions in the Python core now have a "fast path" for pure ASCII
data, and sometimes for latin1 data. This is possible because a Unicode
string now has a flag indicating whether it contains only ASCII
characters.
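
A rough way to observe the compact representation from Python, assuming
CPython 3.3 or later (sys.getsizeof() is implementation-specific, and the
helper name bytes_per_char is only for illustration):

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by comparing two strings of the same kind.

    Subtracting the sizes of two strings built from the same character
    cancels out the fixed object header.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

samples = [
    ("ASCII", "a"),
    ("latin1", "\xe9"),
    ("BMP", "\u20ac"),
    ("non-BMP", "\U0001f600"),
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such a build, ASCII and latin1 strings should report 1 byte per
character, BMP strings 2, and non-BMP strings 4.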

The optimization you suggest is not implemented because FileIO.read()
returns a bytes object, and there is no way to convert a bytes object
to a Unicode object without copying the content. It cannot be
implemented because a bytes string or a Unicode string is stored in a
single memory block: the object header and the content are in the same
block. I guess that the headers of bytes and Unicode strings have
different sizes. Also, bytes and str are both immutable.
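
A small sketch of the conversion, assuming CPython. The sizes printed for
the empty objects only approximate the header size (they include a
trailing NUL terminator), and the exact numbers vary between versions:

```python
import sys

data = b"hello world"        # the kind of object FileIO.read() returns
text = data.decode("ascii")  # builds a new str object and copies the bytes

# Size of the empty objects, as a rough stand-in for the header size.
bytes_header = sys.getsizeof(b"")
str_header = sys.getsizeof("")

print(type(data).__name__, type(text).__name__)
print("empty bytes:", bytes_header, "empty str:", str_header)
```

The decoded str is a separate object with its own memory block, so the
original bytes object cannot simply be relabelled as text.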

I don't think that converting bytes to str is the bottleneck when you
read a long text file... (Reading data from disk is known to be
*slow*.)
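
One way to check this on your own machine is to time both modes on the
same file. This is only a sketch: the helper name time_read and the file
contents are made up for illustration, and the timings depend heavily on
the OS and hardware.

```python
import os
import tempfile
import time

def time_read(path, mode, **kwargs):
    """Read the whole file in the given mode; return (seconds, data)."""
    start = time.perf_counter()
    with open(path, mode, **kwargs) as f:
        data = f.read()
    return time.perf_counter() - start, data

# Write a 16 MB pure-ASCII test file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some ASCII text\n" * 1_000_000)
    path = tmp.name

try:
    binary_time, raw = time_read(path, "rb")
    # newline="" disables newline translation, so only decoding is added.
    text_time, text = time_read(path, "r", encoding="ascii", newline="")
finally:
    os.remove(path)

print(f"rb: {binary_time:.4f}s  r: {text_time:.4f}s")
```

Note that a file which was just written is probably served from the OS
cache, so this mostly shows the decoding overhead; a file that really has
to come from disk would add its I/O cost to both numbers.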

Victor