Newbie problem with codecs
derek / nul
abuseonly at sgrail.org
Fri Aug 22 04:06:10 EDT 2003
On Fri, 22 Aug 2003 07:41:34 GMT, "Andrew Dalke" <adalke at mindspring.com> wrote:
>derek / nul
>> My code so far
> ...
>> t = unicode(eng_file, "utf-16-le")
>> print t
>> -----------------------------------------------------
>>
>> The print fails (as expected) with a non printing char '\ufeff' which is of
>> course the BOM.
>> Is there a nice way to strip off the BOM?
>
>How does it fail?
  File "apply_physics.py", line 21, in ?
    print t
  File "C:\Program Files\Python\lib\encodings\cp850.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>
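The traceback suggests the console codec (cp850) simply has no mapping for U+FEFF. A minimal sketch that reproduces this, written in Python 3 syntax (the script in this thread runs on Python 2.3); the sample string is made up for illustration, and errors="replace" is only one possible way to get output without the exception:

```python
# Python 3 sketch; "text" is a hypothetical stand-in for the decoded file contents.
text = "\ufeffhello"  # decoded text that still carries the BOM as its first character

try:
    text.encode("cp850")
    failed = False
except UnicodeEncodeError:
    failed = True  # U+FEFF has no mapping in code page 850

# errors="replace" substitutes "?" for any character the codec cannot encode
replaced = text.encode("cp850", errors="replace")
print(failed, replaced)
```

This only hides the BOM behind a "?"; it does not remove it from the string.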
>It may be because print tries to convert the
>data as appropriate for your IDE or terminal, and fails. Eg, the
>default expects ASCII. See
>
>http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.102.htp
>
>As a guess, since you're on MS Windows, your terminal might
>expect mbcs. Try
>
>print t.encode('mbcs')
>
>If you really want to strip it off, do t[2:] (or [4:]?), to get the
>string after the first 2/4 characters (the BOM) in the string. But
>I doubt that's the correct solution.
>
> Andrew
> dalke at dalkescientific.com
>
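On the t[2:] (or [4:]?) question: once the bytes are decoded, the BOM should be a single character (U+FEFF) rather than 2 or 4, so dropping one character looks like enough. A minimal sketch, again in Python 3 syntax rather than the Python 2.3 used in this thread; strip_bom and the sample bytes are hypothetical names and data for illustration only:

```python
import codecs

def strip_bom(text):
    """Return text without a leading U+FEFF, if one is present."""
    if text.startswith("\ufeff"):
        return text[1:]
    return text

# Simulated file contents: the UTF-16-LE BOM followed by "hello"
raw = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")

decoded = raw.decode("utf-16-le")  # explicit endianness: the BOM is kept as one character
clean = strip_bom(decoded)

# The generic "utf-16" codec reads the BOM to pick the byte order and drops it
auto = raw.decode("utf-16")

print(repr(decoded[0]), clean, auto)
```

If the file always starts with a BOM, decoding with "utf-16" instead of "utf-16-le" may be the simpler route, since no slicing is needed.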