UnicodeDecodeError? Argh! Nothing works! I'm tired and hurting and...
Terry Reedy
tjreedy at udel.edu
Mon Nov 23 20:48:40 EST 2009
Alf P. Steinbach wrote:
> import os
> import fileinput
>
> def write( s ): print( s, end = "" )
For str arguments, I believe this is the same as
write = sys.stdout.write
(after import sys), though I do not see that you ever use it.
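
A small sketch of that comparison, assuming Python 3. The helper name capture is made up for illustration. It also shows the one difference I am sure of: print() converts its argument to str, while a text stream's write() does not.

```python
import contextlib
import io
import sys

def write(s):
    print(s, end="")

def capture(func, arg):
    """Run func(arg) with stdout redirected and return what was written."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        func(arg)
    return buf.getvalue()

# Look up sys.stdout at call time so the redirection is honoured.
def stdout_write(s):
    sys.stdout.write(s)

via_print = capture(write, "hello\n")
via_stdout = capture(stdout_write, "hello\n")

# print() accepts any object; write() on a text stream wants str.
printed_type = capture(write, bytes)
try:
    capture(stdout_write, bytes)
    write_accepts_non_str = True
except TypeError:
    write_accepts_non_str = False
```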
>
> msg_id = 0
> f = open( "nul", "w" )
> for line in fileinput.input( mode = "rb" ):
I presume you are expecting the line to be undecoded bytes, as with
open(f,'rb'). To be sure, add write(type(line)).
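
A minimal sketch of that check, using a plain open() rather than fileinput. The sample bytes and the helper name first_line_type are invented for illustration. Note that if the line really is bytes, the prefix passed to startswith() must be bytes as well.

```python
import os
import tempfile

def first_line_type(path):
    """Return the type of the first line read from path in binary mode."""
    with open(path, "rb") as f:
        for line in f:
            return type(line)
    return None

# Made-up sample data standing in for a mailbox file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"From - Mon Nov 23\nbody \x8f\n")
    demo_path = tmp.name

line_type = first_line_type(demo_path)
os.remove(demo_path)

# bytes.startswith() rejects a str prefix in Python 3.
try:
    b"From - Mon Nov 23\n".startswith("From - ")
    str_prefix_ok = True
except TypeError:
    str_prefix_ok = False

bytes_prefix_ok = b"From - Mon Nov 23\n".startswith(b"From - ")
print(line_type)
```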
>     if line.startswith( "From - " ):
>         msg_id += 1;
>         f.close()
>         print( msg_id )
>         f = open( "msg_{0:0>6}.txt".format( msg_id ), "w+" )
I do not understand why you are writing, since you just wanted to look.
In any case, you open the output file in text mode ("w+"), and a
text-mode file will not accept bytes.
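
To illustrate the text-mode point, here is a hedged sketch, assuming Python 3. The file names and sample bytes are made up. Writing bytes to a file opened with "w" raises TypeError, while "wb" stores them unchanged.

```python
import os
import shutil
import tempfile

raw = b"body \x8f line\n"   # made-up data containing the byte from the traceback

tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "text.txt")
bin_path = os.path.join(tmp_dir, "bin.txt")

# Text mode: write() expects str, so bytes are rejected.
text_mode_rejects_bytes = False
with open(text_path, "w") as f:
    try:
        f.write(raw)
    except TypeError:
        text_mode_rejects_bytes = True

# Binary mode: the bytes go out exactly as they came in.
with open(bin_path, "wb") as f:
    f.write(raw)
with open(bin_path, "rb") as f:
    round_trip = f.read()

shutil.rmtree(tmp_dir)
```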
>     else:
>         f.write( line )
> f.close()
> </code>
>
>
> <last few lines of output>
> 955
> 956
> 957
> 958
> Traceback (most recent call last):
>   File "C:\test\tbfix\splitmails.py", line 11, in <module>
>     for line in fileinput.input( mode = "rb" ):
>   File "C:\Program Files\cpython\python31\lib\fileinput.py", line 254, in __next__
>     line = self.readline()
>   File "C:\Program Files\cpython\python31\lib\fileinput.py", line 349, in readline
>     self._buffer = self._file.readlines(self._bufsize)
>   File "C:\Program Files\cpython\python31\lib\encodings\cp1252.py", line 23, in decode
>     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2188: character maps to <undefined>
The traceback shows that, despite mode = "rb", it goes ahead and tries
to decode to str anyway. Maybe there is a bug, though maybe the
text-mode open in the loop somehow changes fileinput, especially if you
write to something it has open. So I would not report a bug until I had
tried reading without writing.
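
A minimal sketch of such a read-only test, assuming Python 3 and using a plain binary open() in place of fileinput. The function name count_messages and the sample data are invented for illustration. It only counts the "From - " lines and writes nothing.

```python
import os
import tempfile

def count_messages(path, marker=b"From - "):
    """Count lines starting with marker, reading raw bytes and writing nothing."""
    count = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(marker):
                count += 1
    return count

# Made-up sample data containing the byte from the traceback.
sample = b"From - Mon Nov 23\nbody \x8f\nFrom - Tue Nov 24\nmore\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

found = count_messages(sample_path)
os.remove(sample_path)

# cp1252 has no character for 0x8f, so decoding the same data fails.
try:
    sample.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(found, decode_failed)
```

If the binary read gets through the whole file, the bytes themselves are readable, and the decode step in the traceback is what needs explaining.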
tjr