Newbie problem with codecs

Andrew Dalke adalke at
Fri Aug 22 09:41:34 CEST 2003

derek / nul
> My code so far
> t = unicode(eng_file, "utf-16-le")
> print t
> -----------------------------------------------------
> The print fails (as expected) with a non-printing char  '\ufeff'  which is
> of course the BOM.
> Is there a nice way to strip off the BOM?

How does it fail?  It may be because print tries to convert the
data as appropriate for your IDE or terminal, and fails.  E.g., the
default encoding is ASCII, which cannot represent u'\ufeff'.  See

As a guess, since you're on MS Windows, your terminal might
expect mbcs.  Try

print t.encode('mbcs')
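
To show what the explicit encode step does, here is a minimal sketch.
It is written in Python 3 syntax, which this thread predates, and
encode_for_terminal is a hypothetical helper name, not anything from
the standard library.  It assumes you would rather see a '?' than get
an exception for characters the output encoding cannot represent.

```python
import sys

def encode_for_terminal(text, encoding=None):
    """Encode text for output, replacing characters the target cannot represent."""
    # Hypothetical helper: fall back to ASCII when the stream reports no encoding.
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" substitutes '?' for unencodable characters instead of raising.
    return text.encode(encoding, "replace")

# Sample decoded text that still carries the BOM character (an assumption
# standing in for the contents of eng_file).
t = "\ufeffhello"
ascii_bytes = encode_for_terminal(t, "ascii")
```

On MS Windows you could pass 'mbcs' as the encoding instead; that
codec is only available there.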

If you really want to strip it off, do t[1:] to get the string after
the BOM.  Once decoded, the BOM is the single character u'\ufeff'; it
is 2 bytes only in the undecoded UTF-16 data.  But I doubt that's the
correct solution.
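
For what it's worth, here is a small sketch of the stripping step,
again in Python 3 syntax.  The raw bytes are a made-up sample, not
your file.  It also shows an alternative that may suit you better:
the plain 'utf-16' codec reads the BOM to pick the byte order and
does not return it in the result.

```python
import codecs

# Hypothetical sample: a small UTF-16-LE payload preceded by a BOM.
raw = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")

# Decoding as "utf-16-le" keeps the BOM as one character, U+FEFF,
# so strip it only when it is actually present.
t = raw.decode("utf-16-le")
if t.startswith("\ufeff"):
    t = t[1:]

# Decoding as "utf-16" consumes the BOM while detecting the byte order.
u = raw.decode("utf-16")
```

Checking for the BOM before slicing avoids chopping a real character
off data that happens to have no BOM.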

                    dalke at
