[Python-checkins] CVS: python/dist/src/Misc unicode.txt,3.7,3.8

Thu, 8 Jun 2000 10:51:35 -0700

Update of /cvsroot/python/python/dist/src/Misc
In directory slayer.i.sourceforge.net:/tmp/cvs-serv8496/Misc

Modified Files:
	unicode.txt 
Log Message:
Marc-Andre Lemburg <mal@lemburg.com>:
Updated to version 1.5. Includes typo fixes by Andrew Kuchling
and a new section on the default encoding.

Index: unicode.txt
===================================================================
RCS file: /cvsroot/python/python/dist/src/Misc/unicode.txt,v
retrieving revision 3.7
retrieving revision 3.8
diff -C2 -r3.7 -r3.8
*** unicode.txt	2000/05/09 19:58:19	3.7
--- unicode.txt	2000/06/08 17:51:33	3.8
***************
*** 20,28 ****
  The latest version of this document is always available at:

!         http://starship.skyport.net/~lemburg/unicode-proposal.txt

  Older versions are available as:

!         http://starship.skyport.net/~lemburg/unicode-proposal-X.X.txt

--- 20,28 ----
  The latest version of this document is always available at:

!         http://starship.python.net/~lemburg/unicode-proposal.txt

  Older versions are available as:

!         http://starship.python.net/~lemburg/unicode-proposal-X.X.txt

***************
*** 102,106 ****
  needed, but if you include Latin-1 characters not defined in ASCII, it
  may well be worthwhile including a hint since people in other
! countries will want to be able to read you source strings too.

--- 102,106 ----
  needed, but if you include Latin-1 characters not defined in ASCII, it
  may well be worthwhile including a hint since people in other
! countries will want to be able to read your source strings too.

***************
*** 170,174 ****

  In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
! should be coerced to Unicode before applying the test. Errors occuring
  during coercion (e.g. None in u'abc') should not be masked.

--- 170,174 ----

  In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
! should be coerced to Unicode before applying the test. Errors occurring
  during coercion (e.g. None in u'abc') should not be masked.

***************
*** 185,189 ****

  All string methods should delegate the call to an equivalent Unicode
! object method call by converting all envolved strings to Unicode and
  then applying the arguments to the Unicode method of the same name,
  e.g.
--- 185,189 ----

  All string methods should delegate the call to an equivalent Unicode
! object method call by converting all involved strings to Unicode and
  then applying the arguments to the Unicode method of the same name,
  e.g.
***************
*** 200,204 ****
  -----------

! UnicodeError is defined in the exceptions module as subclass of
  ValueError. It is available at the C level via PyExc_UnicodeError.
  All exceptions related to Unicode encoding/decoding should be
--- 200,204 ----
  -----------

! UnicodeError is defined in the exceptions module as a subclass of
  ValueError. It is available at the C level via PyExc_UnicodeError.
  All exceptions related to Unicode encoding/decoding should be
***************
*** 269,273 ****

    'utf-8':              8-bit variable length encoding
!   'utf-16':             16-bit variable length encoding (litte/big endian)
    'utf-16-le':          utf-16 but explicitly little endian
    'utf-16-be':          utf-16 but explicitly big endian
--- 269,273 ----

    'utf-8':              8-bit variable length encoding
!   'utf-16':             16-bit variable length encoding (little/big endian)
    'utf-16-le':          utf-16 but explicitly little endian
    'utf-16-be':          utf-16 but explicitly big endian
***************
*** 285,289 ****

  All other encodings such as the CJK ones to support Asian scripts
! should be implemented in seperate packages which do not get included
  in the core Python distribution and are not a part of this proposal.

--- 285,289 ----

  All other encodings such as the CJK ones to support Asian scripts
! should be implemented in separate packages which do not get included
  in the core Python distribution and are not a part of this proposal.

***************
*** 325,329 ****
      def encode(self,input,errors='strict'):

!         """ Encodes the object intput and returns a tuple (output
              object, length consumed).

--- 325,329 ----
      def encode(self,input,errors='strict'):

!         """ Encodes the object input and returns a tuple (output
              object, length consumed).

***************
*** 332,336 ****

              The method may not store state in the Codec instance. Use
!             SteamCodec for codecs which have to keep state in order to
              make encoding/decoding efficient.

--- 332,336 ----

              The method may not store state in the Codec instance. Use
!             StreamCodec for codecs which have to keep state in order to
              make encoding/decoding efficient.

***************
*** 351,355 ****

              The method may not store state in the Codec instance. Use
!             SteamCodec for codecs which have to keep state in order to
              make encoding/decoding efficient.

--- 351,355 ----

              The method may not store state in the Codec instance. Use
!             StreamCodec for codecs which have to keep state in order to
              make encoding/decoding efficient.

***************
*** 491,495 ****
              .readline() method -- there is currently no support for
              line breaking using the codec decoder due to lack of line
!             buffering. Sublcasses should however, if possible, try to
              implement this method using their own knowledge of line
              breaking.
--- 491,495 ----
              .readline() method -- there is currently no support for
              line breaking using the codec decoder due to lack of line
!             buffering. Subclasses should however, if possible, try to
              implement this method using their own knowledge of line
              breaking.
***************
*** 528,532 ****

              Note that no stream repositioning should take place.
!             This method is primarely intended to be able to recover
              from decoding errors.

--- 528,532 ----

              Note that no stream repositioning should take place.
!             This method is primarily intended to be able to recover
              from decoding errors.

***************
*** 554,558 ****
  It is not required by the Unicode implementation to use these base
  classes, only the interfaces must match; this allows writing Codecs as
! extensions types.

  As guideline, large mapping tables should be implemented using static
--- 554,558 ----
  It is not required by the Unicode implementation to use these base
  classes, only the interfaces must match; this allows writing Codecs as
! extension types.

  As guideline, large mapping tables should be implemented using static
***************
*** 629,634 ****

  Support for these is left to user land Codecs and not explicitly
! intergrated into the core. Note that due to the Internal Format being
! implemented, only the area between \uE000 and \uF8FF is useable for
  private encodings.

--- 629,634 ----

  Support for these is left to user land Codecs and not explicitly
! integrated into the core. Note that due to the Internal Format being
! implemented, only the area between \uE000 and \uF8FF is usable for
  private encodings.

***************
*** 650,654 ****

  It is the Codec's responsibility to ensure that the data they pass to
! the Unicode object constructor repects this assumption. The
  constructor does not check the data for Unicode compliance or use of
  surrogates.
--- 650,654 ----

  It is the Codec's responsibility to ensure that the data they pass to
! the Unicode object constructor respects this assumption. The
  constructor does not check the data for Unicode compliance or use of
  surrogates.
***************
*** 657,661 ****
  set of all UTF-16 addressable characters (around 1M characters).

! The Unicode API should provide inteface routines from <PythonUnicode>
  to the compiler's wchar_t which can be 16 or 32 bit depending on the
  compiler/libc/platform being used.
--- 657,661 ----
  set of all UTF-16 addressable characters (around 1M characters).

! The Unicode API should provide interface routines from <PythonUnicode>
  to the compiler's wchar_t which can be 16 or 32 bit depending on the
  compiler/libc/platform being used.
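
For readers skimming the hunks above, a few of the behaviours the proposal text describes can be checked interactively. The following is only a minimal sketch, assuming a current Python 3 interpreter and its standard codecs module rather than the 2000-era implementation this revision documents. The variable names are illustrative and do not come from the proposal.

```python
import codecs

# Stateless codec functions return a tuple (output object, length consumed).
# Assumption: modern Python 3, where the encoder takes str and returns bytes.
encoded, chars_consumed = codecs.getencoder("utf-8")("h\u00e9llo")
decoded, bytes_consumed = codecs.getdecoder("utf-8")(encoded)

# The explicitly little/big endian UTF-16 variants emit no byte order mark.
utf16_le = "a".encode("utf-16-le")
utf16_be = "a".encode("utf-16-be")

# Encoding/decoding failures surface as UnicodeError, a ValueError subclass.
try:
    b"\xff".decode("utf-8")
    decode_failed = False
except UnicodeError as exc:
    decode_failed = isinstance(exc, ValueError)
```

The length consumed is counted in units of the input: characters for the encoder, bytes for the decoder. That is why the two counts differ for the same text.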