Newbie problem with codecs

derek / nul abuseonly at sgrail.org
Fri Aug 22 04:06:10 EDT 2003


On Fri, 22 Aug 2003 07:41:34 GMT, "Andrew Dalke" <adalke at mindspring.com> wrote:

>derek / nul
>> My code so far
>   ...
>> t = unicode(eng_file, "utf-16-le")
>> print t
>> -----------------------------------------------------
>>
>> The print fails (as expected) with a non-printing char '\ufeff', which is
>> of course the BOM.
>> Is there a nice way to strip off the BOM?
>
>How does it fail?

  File "apply_physics.py", line 21, in ?
    print t
  File "C:\Program Files\Python\lib\encodings\cp850.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>
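
For anyone following along, the same failure can probably be reproduced without a console at all. This is only a sketch, written for a current Python 3 rather than the interpreter in the traceback; the sample string is made up, and cp850 is used only because it is the codec named in the traceback.

```python
# Hypothetical sample text: a BOM character followed by ordinary ASCII.
text = "\ufeffphysics"

# cp850 has no mapping for U+FEFF, so a strict encode raises, which is
# what print ends up doing when the console code page is cp850.
try:
    text.encode("cp850")
    failed = False
except UnicodeEncodeError:
    failed = True

# With errors="replace" the unmappable character is substituted instead.
safe = text.encode("cp850", errors="replace")
```

So the traceback is about the console encoding, not about the decode step. Replacing the character only hides the BOM; it does not remove it from the string.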

>It may be because print tries to convert the
>data as appropriate for your IDE or terminal, and fails.  Eg, the
>default expects ASCII.  See
>
>http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.102.htp
>
>As a guess, since you're on MS Windows, your terminal might
>expect mbcs.  Try
>
>print t.encode('mbcs')
>
>If you really want to strip it off, do t[2:] (or [4:]?), to get the
>string after the first 2/4 characters (the BOM) in the string.  But
>I doubt that's the correct solution.
>
>                    Andrew
>                    dalke at dalkescientific.com
>
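
On the slicing suggestion: once the bytes are decoded, the BOM is a single character (U+FEFF), so the slice would be t[1:] rather than t[2:] or t[4:]. A rough sketch of two ways to drop it follows, again assuming a current Python 3; the byte string is invented sample data standing in for the contents of eng_file.

```python
import codecs

# Hypothetical stand-in for the file contents: UTF-16-LE BOM plus some text.
raw = codecs.BOM_UTF16_LE + "physics".encode("utf-16-le")

# Decoding with an explicit byte order keeps the BOM as one character.
t = raw.decode("utf-16-le")

# Option 1: drop the single leading U+FEFF if it is present.
stripped = t[1:] if t.startswith("\ufeff") else t

# Option 2: the generic "utf-16" codec reads the BOM to pick the byte
# order and does not include it in the decoded text.
auto = raw.decode("utf-16")
```

Option 2 only helps when the data really does start with a BOM; without one, the "utf-16" codec falls back to a platform-dependent byte order, so the explicit codec plus the check in option 1 is the more predictable choice.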




