[IronPython] reading utf-8 files

Dave Fugate dfugate at microsoft.com
Tue Jul 20 18:20:38 CEST 2010


In this case, it looks like the test files aren't really utf-8-sig.  That is, under CPython:
C:\Users\dfugate\Desktop>C:\Python27\python.exe
Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> import nt
>>> dir = nt.getcwd()
>>> i = 'a - 2 lines.txt'
>>> file = codecs.open(dir + "\\" + i, "r", "utf_8_sig")
>>> for line in file: print line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>

>>> file.close()
>>> i = 'a - 3 lines.txt'
>>> file = codecs.open(dir + "\\" + i, "r", "utf_8_sig")
>>> for line in file: print line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>
>>> file.close()

The bug here is that IronPython can process 'a - 2 lines.txt'.
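
Note that both tracebacks above are raised from the cp437 encode step, which appears to be the console output path used by print. One way to keep that step out of the picture is to inspect the raw bytes and decode them without printing the result. The following is a minimal sketch, not part of the original session: the helper name inspect_utf8_sig is hypothetical, and it assumes only the standard codecs module.

```python
import codecs


def inspect_utf8_sig(path):
    """Return (has_bom, text_or_None, error_or_None) for the file at path.

    Hypothetical helper for illustration; it never prints decoded text,
    so the console encoding cannot interfere with the result.
    """
    # Read raw bytes so that no implicit decoding takes place.
    with open(path, "rb") as handle:
        data = handle.read()

    # A utf-8-sig file begins with the three-byte UTF-8 BOM.
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # utf_8_sig drops a leading BOM if present, then decodes as UTF-8.
        text = data.decode("utf_8_sig")
    except UnicodeDecodeError as exc:
        return has_bom, None, exc
    return has_bom, text, None
```

If the third element of the result is None, the bytes decoded cleanly, and any remaining failure would come from writing the text to the console rather than from reading the file. Under Python 2.7, printing repr(text) instead of text shows non-ASCII characters as escapes.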

Dave

From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of abdalla ramadan
Sent: Friday, July 09, 2010 11:46 AM
To: users at lists.ironpython.com
Subject: [IronPython] reading utf-8 files

Hello,

I am trying to read UTF-8 files (written with Notepad, so they have a BOM) using the following code:

file = codecs.open(dir+ '\\' + i,"r",'utf_8_sig')
for line in file:
    print "line"

I attached two files. The file a - 3 lines.txt gives the exception below, and print "line" is never executed, not even once:

Unhandled Exception: System.Text.EncoderFallbackException: failed to decode bytes at index 65

The file a - 2 lines.txt, however, is read without problems.

I tried several different texts but could not find a rule for which files throw this exception. I tested other files with 3 lines that did not throw the exception.
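
One way to look for such a rule is to ask the decoder where it stops. The following is a minimal sketch under stated assumptions: the function name locate_decode_failure and the context parameter are hypothetical, and the file is decoded as plain utf-8 so that the offset counts from the first byte of the file, BOM included. Whether the "index 65" in the IronPython message is measured from the same origin is not established here.

```python
def locate_decode_failure(path, context=8):
    """Return (offset, nearby_bytes) for the first invalid UTF-8 sequence.

    Returns None when the whole file decodes cleanly. Hypothetical helper
    for illustration only.
    """
    # Read raw bytes so the offset refers to positions in the file itself.
    with open(path, "rb") as handle:
        data = handle.read()

    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the offending bytes in data.
        lo = max(exc.start - context, 0)
        return exc.start, data[lo:exc.end + context]
    return None
```

Comparing the returned bytes across a file that fails and one that does not may show what the failing files have in common.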

Thanks very much in advance