[Python-Dev] Improve open() to support reading files starting with a Unicode BOM

Walter Dörwald walter at livinglogic.de
Mon Jan 11 14:21:15 CET 2010


On 11.01.10 13:45, Lennart Regebro wrote:

> On Mon, Jan 11, 2010 at 13:29, Walter Dörwald <walter at livinglogic.de> wrote:
>> However, if this autodetection feature is useful in other cases (no
>> matter how it's activated), it should be a codec, because as part of the
>> open() function it isn't reusable.
> 
> But an autodetect feature is not a codec. Sure, it should be reusable,
> but making it a codec seems like a weird hack to me.

I think we already had this discussion two years ago in the context of
XML decoding ;):

http://mail.python.org/pipermail/python-dev/2007-November/075138.html
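The two designs under discussion can be compared with small sketches. A standalone helper in the spirit of Lennart's objection only reports which codec a leading BOM implies and leaves decoding to the caller. The name `sniff_bom` and its `(encoding, bom_length)` return shape are hypothetical, not an existing stdlib API; the sketch relies only on the `codecs.BOM_*` constants and uses Python 3 rather than the Python 2 of this thread.

```python
import codecs
from typing import Optional, Tuple

# Longest BOM first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so testing the shorter one first would misdetect it.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) for a leading BOM, or (None, 0) if none."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


# Detection and decoding are two separate steps for the caller.
raw = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")
encoding, skip = sniff_bom(raw)
text = raw[skip:].decode(encoding or "utf-8")  # assumption: UTF-8 when no BOM
print(encoding, text)  # utf-16-le hello
```

Because explicit codecs such as `utf-16-le` do not strip a BOM themselves, the caller has to skip the reported BOM length before decoding; that is the bookkeeping a detecting codec would hide.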

> And how would you reuse it if it were a codec? A reusable autodetect
> feature would be usable for detecting which codec the data is in. An
> autodetect codec would not be useful for that, as it would simply decode.

I have implemented an XML codec (as part of XIST:
http://pypi.python.org/pypi/ll-xist) that can do this:

>>> from ll import xml_codec
>>> import codecs
>>> c = codecs.getincrementaldecoder("xml")()
>>> c.encoding
>>> c.decode("<?xml")
u''
>>> c.encoding
>>> c.decode(" version='1.0'")
u''
>>> c.encoding
>>> c.decode(" encoding='iso-8859-1'?>")
u"<?xml version='1.0' encoding='iso-8859-1'?>"
>>> c.encoding
'iso-8859-1'
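The same approach carries over to the BOM detection this thread is about. The following is a minimal Python 3 sketch under stated assumptions, not the XIST code: the codec name `bom_sniff`, the class `BOMSniffingDecoder`, the UTF-8 fallback and the plain UTF-8 encoder are all hypothetical choices. Like the `xml` decoder in the session above, it buffers input until it can decide, returns `''` in the meantime, and exposes its decision as `.encoding` (which is `None` until then). Registering it with `codecs.register()` makes it reachable through the normal codec registry.

```python
import codecs

# Longest BOM first, so UTF-32-LE (FF FE 00 00) is not taken for UTF-16-LE (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


class BOMSniffingDecoder(codecs.IncrementalDecoder):
    """Picks its encoding from a leading BOM and reports it as .encoding.

    .encoding stays None until enough bytes have arrived to decide.
    """

    def __init__(self, errors="strict", default="utf-8"):
        super().__init__(errors)
        self.default = default  # assumption: fall back to UTF-8 without a BOM
        self.reset()

    def reset(self):
        self.encoding = None
        self._pending = b""  # bytes held back until the BOM question is settled
        self._decoder = None

    def decode(self, input, final=False):
        if self._decoder is None:
            self._pending += bytes(input)
            # The buffer might still be the start of a (longer) BOM: wait for more.
            if not final and any(
                len(self._pending) < len(bom) and bom.startswith(self._pending)
                for bom, _ in _BOMS
            ):
                return ""
            for bom, name in _BOMS:
                if self._pending.startswith(bom):
                    self.encoding, skip = name, len(bom)
                    break
            else:  # no BOM found
                self.encoding, skip = self.default, 0
            self._decoder = codecs.getincrementaldecoder(self.encoding)(self.errors)
            input, self._pending = self._pending[skip:], b""
        return self._decoder.decode(input, final)


def _search(name):
    # Hypothetical codec name; lookups pass it lower-cased (and, since
    # Python 3.9, with hyphens and spaces turned into underscores).
    if name != "bom_sniff":
        return None
    return codecs.CodecInfo(
        encode=codecs.utf_8_encode,  # assumption: encode as plain UTF-8, no BOM
        decode=lambda data, errors="strict": (
            BOMSniffingDecoder(errors).decode(data, final=True), len(data)),
        incrementaldecoder=BOMSniffingDecoder,
        name="bom_sniff",
    )


codecs.register(_search)

c = codecs.getincrementaldecoder("bom_sniff")()
print(c.encoding)                          # None
print(repr(c.decode(b"\xff\xfe")))         # '' (could still be a UTF-32-LE BOM)
print(c.decode("hi".encode("utf-16-le")))  # hi
print(c.encoding)                          # utf-16-le
```

`getstate()`/`setstate()` are left at the `IncrementalDecoder` defaults, so this sketch does not support reliable `tell()`/`seek()` on a text stream; a production codec would need to encode its buffered state there.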

Cheers,
   Walter


