UnicodeDecodeError? Argh! Nothing works! I'm tired and hurting and...

Alf P. Steinbach alfps at start.no
Mon Nov 23 17:37:58 EST 2009


* Alf P. Steinbach:
> 
> <code>
> import os
> import fileinput
> 
> def write( s ): print( s, end = "" )
> 
> msg_id = 0
> f = open( "nul", "w" )
> for line in fileinput.input( mode = "rb" ):
>     if line.startswith( "From - " ):
>         msg_id += 1;
>         f.close()
>         print( msg_id )
>         f = open( "msg_{0:0>6}.txt".format( msg_id ), "w+" )
>     else:
>         f.write( line )
> f.close()
> </code>
> 
> 
> <last few lines of output>
> 955
> 956
> 957
> 958
> Traceback (most recent call last):
>   File "C:\test\tbfix\splitmails.py", line 11, in <module>
>     for line in fileinput.input( mode = "rb" ):
>   File "C:\Program Files\cpython\python31\lib\fileinput.py", line 254, in __next__
>     line = self.readline()
>   File "C:\Program Files\cpython\python31\lib\fileinput.py", line 349, in readline
>     self._buffer = self._file.readlines(self._bufsize)
>   File "C:\Program Files\cpython\python31\lib\encodings\cp1252.py", line 23, in decode
>     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2188: character maps to <undefined>
> </last few lines of output>
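
For what it's worth, the traceback suggests the input was still being decoded as text with cp1252 (the codec named in the last frame), and that codec has no character assigned to byte 0x8f. A minimal sketch reproducing just that decoding step, independent of fileinput; the sample bytes are made up:

```python
# Hypothetical sample line containing the byte from the traceback.
raw = b"Subject: caf\x8f\r\n"

bad_byte = None
try:
    raw.decode( "cp1252" )
    decoded_ok = True
except UnicodeDecodeError as e:
    decoded_ok = False
    bad_byte = raw[e.start]     # the offending byte, as an int

# Latin-1 maps every byte to a code point, so it never fails,
# but for splitting a mailbox there is no need to decode at all.
as_latin1 = raw.decode( "latin-1" )
```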

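The following worked. It reads stdin as raw bytes and writes the message files in binary mode, so the cp1252 codec never gets involved: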


<code>
import os
import sys

msg_id = 0
# Lines before the first "From - " separator go to the null device.
# Binary mode, because the lines written to it are bytes.
f = open( os.devnull, "wb" )
stdin_bytes = sys.stdin.detach()  # binary
while True:
    line = stdin_bytes.readline()
    if len( line ) == 0:
        break   # end of input
    elif line.startswith( b"From - " ):
        # Separator line: close the current message file, start the next.
        msg_id += 1
        f.close()
        print( msg_id )
        f = open( "msg_{0:0>6}.txt".format( msg_id ), "wb+" )
    else:
        f.write( line )
f.close()
</code>
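
The same loop can be packaged so that it is testable without touching stdin or the file system. This is a sketch only: `split_messages` is a name made up for illustration, and it assumes every message starts with a line beginning `From - `, as in the script above.

```python
import io

def split_messages( stream ):
    """Yield each message as a list of raw byte lines (separator lines dropped)."""
    current = None                      # None until the first separator is seen
    for line in stream:                 # binary streams iterate line by line
        if line.startswith( b"From - " ):
            if current is not None:
                yield current
            current = []
        elif current is not None:
            current.append( line )
    if current is not None:
        yield current

# Made-up sample data, including the byte that cp1252 could not decode.
sample = io.BytesIO( b"From - Mon\nA \x8f\nFrom - Tue\nB\n" )
messages = list( split_messages( sample ) )
```

With real input one would presumably pass `sys.stdin.buffer` (or the result of `sys.stdin.detach()`) and write each list to a file opened with "wb".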


Cheers,

- Alf


