Newbie problem with codecs

derek / nul abuseonly at sgrail.org
Sat Aug 23 02:53:36 CEST 2003


On Fri, 22 Aug 2003 17:10:40 GMT, "Mike Brown" <mike-nospam at skew.org> wrote:

>"derek / nul" <abuseonly at sgrail.org> wrote in message
>> The print fails (as expected) with a non printing char '\ufeff' which is
>> of course the BOM.
>> Is there a nice way to strip off the BOM?
>
>"derek / nul" <abuseonly at sgrail.org> wrote:
>> I need a pointer to converting utf-16-le to text
>
>If there is a BOM, then it is not UTF-16LE; it is UTF-16.

The following excerpt is from http://www.egenix.com/files/python/unicode-proposal.txt

It describes how utf-16 relates to utf-16-le and utf-16-be:




Standard Codecs:
----------------

Standard codecs should live inside an encodings/ package directory in the
Standard Python Code Library. The __init__.py file of that directory should
include a Codec Lookup compatible search function implementing a lazy module
based codec lookup.

Python should provide a few standard codecs for the most relevant
encodings, e.g. 

  'utf-8':              8-bit variable length encoding
  'utf-16':             16-bit variable length encoding (little/big endian)
  'utf-16-le':          utf-16 but explicitly little endian
  'utf-16-be':          utf-16 but explicitly big endian
  'ascii':              7-bit ASCII codepage
  'iso-8859-1':         ISO 8859-1 (Latin 1) codepage
  'unicode-escape':     See Unicode Constructors for a definition
  'raw-unicode-escape': See Unicode Constructors for a definition
  'native':             Dump of the Internal Format used by Python

Common aliases should also be provided per default, e.g.  'latin-1'
for 'iso-8859-1'.

Note: 'utf-16' should be implemented by using and requiring byte order
marks (BOM) for file input/output.

All other encodings such as the CJK ones to support Asian scripts
should be implemented in separate packages which do not get included
in the core Python distribution and are not a part of this proposal. 
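
To make the distinction in the excerpt concrete, here is a minimal sketch,
assuming a current Python 3 interpreter rather than the Python of 2003. The
sample string and the variable names are made up for illustration; only the
codec names and codecs.BOM_UTF16_LE come from Python itself.

```python
import codecs

text = "hi"  # hypothetical sample text

# The explicit-endian codecs write no BOM; only the byte order differs.
le = text.encode("utf-16-le")   # b'h\x00i\x00'
be = text.encode("utf-16-be")   # b'\x00h\x00i'

# Simulate file contents: a little-endian BOM followed by the text.
data = codecs.BOM_UTF16_LE + le

# 'utf-16' uses the BOM to pick the byte order and then drops it.
via_utf16 = data.decode("utf-16")

# 'utf-16-le' treats the BOM as an ordinary character, so U+FEFF survives.
via_le = data.decode("utf-16-le")

# If the text was already decoded as utf-16-le, remove one leading BOM by hand.
stripped = via_le[1:] if via_le.startswith("\ufeff") else via_le

print(repr(via_utf16), repr(via_le), repr(stripped))
```

So when the data starts with a BOM, decoding it as 'utf-16' instead of
'utf-16-le' is probably the simplest way to get rid of the '\ufeff'.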




