[Python-Dev] bytes / unicode

Wed Jun 23 01:23:40 CEST 2010

On Tue, Jun 22, 2010 at 11:17 AM, Guido van Rossum <guido at python.org> wrote:

> (2) Data sources.
>
> These can be functions that produce new data from non-string data,
> e.g. str(<int>), read it from a named file, etc. An example is read()
> vs. write(): it's easy to create a (hypothetical) polymorphic stream
> object that accepts both f.write('booh') and f.write(b'booh'); but you
> need some other hack to make read() return something that matches a
> desired return type. I don't have a generic suggestion for a solution;
> for streams in particular, the existing distinction between binary and
> text streams works, of course, but there are other situations where
> this doesn't generalize (I think some XML interfaces have this
> awkwardness in their API for converting a tree to a string).
>
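
The read()/write() asymmetry quoted above can be made concrete with a small
sketch.  PolyStream and its as_text flag are hypothetical names, not an
existing API, and the sketch assumes UTF-8 with bytes stored internally.
write() can dispatch on the type of its argument; read() has no argument to
dispatch on, so the caller has to say which type it wants.

```python
class PolyStream:
    """Hypothetical in-memory stream; not part of any real library."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self._buf = bytearray()   # data is always stored as bytes
        self._pos = 0             # read position within the buffer

    def write(self, data):
        # Input side: polymorphism is easy, dispatch on the argument type.
        if isinstance(data, str):
            data = data.encode(self.encoding)
        self._buf += data
        return len(data)          # number of bytes stored

    def read(self, as_text=False):
        # Output side: nothing to dispatch on, so the desired return type
        # has to be passed in explicitly (the "other hack").
        chunk = bytes(self._buf[self._pos:])
        self._pos = len(self._buf)
        return chunk.decode(self.encoding) if as_text else chunk


f = PolyStream()
f.write('booh')
f.write(b'booh')
raw = f.read()                    # bytes, because as_text defaults to False
```

The explicit flag is only one possible workaround; the existing split between
binary and text streams makes the same choice once, when the stream is opened.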

This reminds me of the optimization ElementTree and lxml made in Python 2
(I'm not sure what they do in Python 3), where they use str when a string is
pure ASCII to avoid the memory and performance overhead of unicode.  lxml, at
least, also has to deal with the divide between libxml2's internal string
representation and the Python representation.  This is a place where
bytes+encoding might have some benefit.  XML is a case where you might load a
lot of data but only touch a little of it, and the amount of data is
frequently large enough that the efficiencies matter.
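
As a rough sketch of the bytes+encoding idea: keep the raw bytes together
with their encoding, and decode only when someone actually asks for text.
LazyText is a hypothetical name used for illustration; it is not something
lxml or ElementTree provides, and this is not how either library is
implemented.

```python
class LazyText:
    """Hypothetical bytes+encoding value that decodes on first access."""

    __slots__ = ("raw", "encoding", "_text")

    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw            # undecoded bytes as loaded from the parser
        self.encoding = encoding
        self._text = None         # cached str, filled in lazily

    def __str__(self):
        # Pay the decoding cost only for values that are actually touched.
        if self._text is None:
            self._text = self.raw.decode(self.encoding)
        return self._text

    @property
    def decoded(self):
        # True once the value has been converted to str at least once.
        return self._text is not None


# Load many values, touch only one of them.
nodes = [LazyText(b"caf\xc3\xa9"), LazyText(b"plain"), LazyText(b"more")]
first = str(nodes[0])
touched = [n.decoded for n in nodes]
```

Only the value that was converted with str() has been decoded; the others
still hold just their original bytes.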

-- 
Ian Bicking  |  http://blog.ianbicking.org