Re: [Python-Dev] Improve open() to support reading files starting with a Unicode BOM
If Python should support BOM when reading text files, it should also be able to *write* such files. An encoding="BOM" argument wouldn't help here, because it does not specify which encoding to actually use: UTF-8, UTF-16-LE, or what? That would be a point against encoding="BOM" and in favour of an additional keyword argument, "use_bom" or whatever, with the following values:

None: default (old) behaviour: don't handle BOMs at all.

True:
  reading: expect a BOM (raise an exception if it's missing). The encoding argument must be None or must match the encoding implied by the BOM.
  writing: write a BOM. The encoding argument must be one of the UTF encodings.

False:
  reading: if a BOM is present, use it to determine the file encoding; the encoding argument must be None or must match the encoding implied by the BOM. (*) Otherwise, use the encoding argument to determine the encoding.
  writing: do not write a BOM. Use the encoding argument.

(*) This is a question of taste. I think some people would prefer a fourth value, "AUTO", instead, or would swap the behaviour of None and False.

Henning

P.S. To make things worse, I have sometimes seen XML files with a UTF-8 BOM but an XML encoding declaration of "iso-8859-1". For such files, whatever you guess will be wrong anyway...
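The proposed use_bom values can be sketched as a pair of helper functions. This is a hypothetical illustration, not something open() provides: read_text and write_text are invented names, they read and write whole files as bytes, and they fall back to locale.getpreferredencoding(False) when no encoding is given. Raising ValueError on a contradictory encoding, and refusing use_bom=False with a codec such as "utf-16" that always writes its own BOM, are assumptions where the proposal leaves the details open.

```python
import codecs
import locale

# Leading BOMs, checked longest first so that a UTF-32-LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16-LE BOM (FF FE). A UTF-16-LE file that starts
# with U+0000 is genuinely ambiguous and is read as UTF-32-LE here.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
# Byte-order-specific codecs never emit a BOM, so we prepend one ourselves.
_BOM_FOR = {enc: bom for bom, enc in _BOMS}
# Generic codecs that already emit a BOM when encoding.
_SELF_BOM = {"utf-16", "utf-32", "utf-8-sig"}


def _sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, enc in _BOMS:
        if data.startswith(bom):
            return enc, len(bom)
    return None, 0


def _codec_matches(given, found):
    """True if `given` names the BOM's codec or its generic family (e.g. utf-16)."""
    given = codecs.lookup(given).name
    return found == given or found.startswith(given + "-")


def read_text(path, encoding=None, use_bom=None):
    """Hypothetical reader following the proposed use_bom semantics."""
    with open(path, "rb") as f:
        data = f.read()
    if use_bom is None:
        # Old behaviour: a BOM gets no special treatment.
        return data.decode(encoding or locale.getpreferredencoding(False))
    found, skip = _sniff_bom(data)
    if found is None:
        if use_bom:
            raise ValueError("use_bom=True, but the file has no BOM")
        return data.decode(encoding or locale.getpreferredencoding(False))
    if encoding is not None and not _codec_matches(encoding, found):
        raise ValueError(f"encoding {encoding!r} contradicts the {found} BOM")
    return data[skip:].decode(found)


def write_text(path, text, encoding=None, use_bom=None):
    """Hypothetical writer following the proposed use_bom semantics."""
    encoding = encoding or locale.getpreferredencoding(False)
    name = codecs.lookup(encoding).name
    prefix = b""
    if use_bom is True and name not in _SELF_BOM:
        if name not in _BOM_FOR:
            raise ValueError(f"use_bom=True needs a UTF encoding, not {name!r}")
        prefix = _BOM_FOR[name]
    elif use_bom is False and name in _SELF_BOM:
        # Assumption: refuse rather than silently emit the codec's own BOM.
        raise ValueError(f"{name!r} always writes a BOM; pick a byte order")
    with open(path, "wb") as f:
        f.write(prefix + text.encode(encoding))
```

Treating a generic "utf-16" argument as matching a UTF-16-LE or UTF-16-BE BOM is one reading of "must match"; a stricter version would compare the canonical codec names exactly.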
On Sun, Jan 10, 2010 at 12:10, Henning von Bargen wrote:
> If Python should support BOM when reading text files, it should also be able to *write* such files.
That's what I thought too. It turns out the UTF-16 codec does write such a mark. You also have the BOM constants in the codecs module, so you can write the UTF-16-LE BOM and then use the utf-16-le encoding if you want to be sure you write UTF-16-LE, and the same with BE, of course.

I still think not using BOMs when determining the file format can be seen as a bug, though, so I don't think the API needs to change at all.

--
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
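Writing a BOM this way can be sketched as follows: write the BOM constant from the codecs module yourself, then encode with the byte-order-specific codec, which does not add a BOM of its own. The helper name write_utf16_with_bom is hypothetical; the codecs constants and codec names are standard Python.

```python
import codecs


def write_utf16_with_bom(path, text, byteorder="little"):
    """Write text as UTF-16 with an explicit BOM in the requested byte order."""
    if byteorder == "little":
        bom, encoding = codecs.BOM_UTF16_LE, "utf-16-le"
    elif byteorder == "big":
        bom, encoding = codecs.BOM_UTF16_BE, "utf-16-be"
    else:
        raise ValueError("byteorder must be 'little' or 'big'")
    with open(path, "wb") as f:
        f.write(bom)                    # BOM constant from the codecs module
        f.write(text.encode(encoding))  # the -le/-be codecs add no BOM of their own


# By contrast, the generic "utf-16" codec writes a BOM by itself, in the
# platform's native byte order (codecs.BOM_UTF16).
print("hi".encode("utf-16").startswith(codecs.BOM_UTF16))  # True
```

Files written either way can be read back with the generic "utf-16" codec, which consumes the BOM and uses it to pick the byte order.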