[Python-Dev] Quick sum up about open() + BOM

Lennart Regebro regebro at gmail.com
Sat Jan 9 06:48:36 CET 2010


It seems to me that the following is the only flow that makes sense
for the typical case of opening a file for reading:

if encoding is not None:
    use encoding
elif file has BOM:
    use BOM
else:
    use system default

Hence an encoding='BOM' option isn't needed there. I've tried to come
up with use cases where this doesn't work, but I can't. :)
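A minimal runnable sketch of that reading flow, assuming the caller already has the first few bytes of the file. `choose_read_encoding` is a hypothetical helper, not part of `open()`; the BOM constants come from the standard `codecs` module, and `locale.getpreferredencoding(False)` stands in for "system default".

```python
import codecs
import locale

# BOM byte sequences mapped to a codec that consumes the BOM when decoding.
# UTF-32 entries come first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so the longer match must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def choose_read_encoding(head, encoding=None):
    """Hypothetical helper: pick a read encoding from the file's first bytes."""
    if encoding is not None:
        return encoding  # an explicit encoding always wins
    for bom, codec_name in _BOMS:
        if head.startswith(bom):
            return codec_name  # the file has a BOM: let it decide
    return locale.getpreferredencoding(False)  # fall back to the system default


if __name__ == "__main__":
    raw = "h\u00e9llo".encode("utf-16")  # Python's utf-16 codec prepends a BOM
    print(raw.decode(choose_read_encoding(raw[:4])))  # prints "héllo"
```

Returning `utf-8-sig`, `utf-16` or `utf-32` rather than an explicit-endian codec name means the decoder itself strips the BOM, so it doesn't end up as a stray U+FEFF at the start of the text.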

BUT

When writing, though, things are not so easy. Apparently some
encodings require a BOM to be written, others don't require it but
allow it, and some have no byte order mark at all. So there you need
to be able to choose whether to write the BOM or not. That calls for
either a new parameter (you can't use encoding='BOM', since you need
to specify the encoding as well) or a new method.
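For comparison, Python 3's standard codecs already expose these cases, but through distinct codec names rather than a parameter. A small check (`starts_with_bom` and `can_encode_bom` are just illustrative helpers):

```python
import codecs

_ALL_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)


def starts_with_bom(encoding, text="abc"):
    """True if encoding `text` with this codec emits a leading BOM."""
    return text.encode(encoding).startswith(_ALL_BOMS)


def can_encode_bom(encoding):
    """True if the codec can represent U+FEFF (the BOM character) at all."""
    try:
        "\ufeff".encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ("utf-16", "utf-32", "utf-16-le", "utf-8", "utf-8-sig", "latin-1"):
    print(enc, "writes BOM:", starts_with_bom(enc),
          "can encode BOM:", can_encode_bom(enc))
```

So `utf-16` and `utf-32` always write a BOM, `utf-16-le` and plain `utf-8` don't (though UTF-8 allows one, which is exactly what `utf-8-sig` writes), and Latin-1 has no BOM at all: it cannot even encode U+FEFF.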

I would suggest a BOM parameter, and maybe a method as well.

BOM=None|True|False

Where "None" means a sane default behaviour, that is, write a BOM if
the encoding requires it;
"True" means write a BOM if the encoding *supports* it;
"False" means don't write a BOM even if the encoding requires it
(because I know what I'm doing).

# ENCODINGS_ALLOWING_BOM / ENCODINGS_REQUIRING_BOM: the two encoding
# sets, still to be decided
if 'w' in mode:  # but not 'r' or 'a'
    if BOM is True and encoding in ENCODINGS_ALLOWING_BOM:
        write_bom = True
    elif BOM is False:
        write_bom = False
    elif BOM is None and encoding in ENCODINGS_REQUIRING_BOM:
        write_bom = True
    else:
        write_bom = False
else:
    write_bom = False
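A self-contained sketch of how the proposed parameter might sit on top of today's `open()`. Everything here is hypothetical: `open_with_bom`, `should_write_bom` and the two encoding sets are illustrative names and guesses, since the proposal leaves the sets unspecified. It treats `utf-16`/`utf-32` as "requiring" a BOM because Python's codecs for them always emit one, and writes U+FEFF by hand for the UTF-8 and explicit-endian codecs that merely allow it.

```python
import codecs
import sys

# HYPOTHETICAL sets: the proposal leaves them unspecified. Python's "utf-16"
# and "utf-32" codecs always emit a BOM, so they stand in for "requires";
# "utf-8-sig" always writes its own BOM and is deliberately left out of both.
ENCODINGS_REQUIRING_BOM = {"utf-16", "utf-32"}
ENCODINGS_ALLOWING_BOM = ENCODINGS_REQUIRING_BOM | {
    "utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be",
}


def should_write_bom(mode, encoding, bom=None):
    """Proposed BOM=None|True|False decision for a canonical codec name."""
    if "w" not in mode:  # reading or appending: never write a BOM
        return False
    if bom is True:
        return encoding in ENCODINGS_ALLOWING_BOM
    if bom is False:
        return False  # "because I know what I'm doing"
    return encoding in ENCODINGS_REQUIRING_BOM  # None: the sane default


def open_with_bom(path, mode, encoding, bom=None):
    """Hypothetical open() wrapper honouring the proposed BOM parameter."""
    name = codecs.lookup(encoding).name  # canonical name, e.g. "UTF8" -> "utf-8"
    write_bom = should_write_bom(mode, name, bom)
    if "w" in mode and name in ENCODINGS_REQUIRING_BOM and not write_bom:
        # bom=False: switch to the native-endian codec, which emits no BOM
        name += "-le" if sys.byteorder == "little" else "-be"
    f = open(path, mode, encoding=name)
    if write_bom and name not in ENCODINGS_REQUIRING_BOM:
        # the stream encodes U+FEFF into this codec's BOM bytes
        f.write("\ufeff")
    return f
```

With BOM=False on `utf-16`, the sketch falls back to the native-endian codec, which is one way to honour "I know what I'm doing"; whoever reads such a file then has to be told the byte order some other way.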

For reading, this parameter could either be a no-op or possibly change
the behavior somehow, if a use case where that makes sense can be
imagined.

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python-incompatibility.googlecode.com/
+33 661 58 14 64


